Dear Qiimers,
I am new to using this amazing tool. My current goal is to remove host genes from my data (as shown here: Evaluating and controlling data quality with q2-quality-control — QIIME 2 2024.2.0 documentation). My main issue (or rather my misconception) is that the quality-control exclude-seqs function requires the --i-query-sequences parameter to be a FeatureData[Sequence]. I am confused as to how to create this specific type of artifact in my workflow. I understand from this post, https://forum.qiime2.org/t/create-featuredata-sequence/4913, that the denoise step can achieve this, but I feel it makes more sense to remove host genes before denoising. Please let me know if this assessment is incorrect.
Here is my current code. I would appreciate it if any expert qiimers could identify where I am going wrong.
#set working directory
import os
script_directory = "/Home/"
os.chdir(script_directory)
#import (my import data is paired so my columns are sample-id, forward-full-filepath, reverse-full-filepath)
!qiime tools import \
--type 'SampleData[PairedEndSequencesWithQuality]' \
--input-path qiime_input/manifest2.txt \
--output-path qiime_input/paired_end_demux.qza \
--input-format PairedEndFastqManifestPhred33V2
#perform initial QC
!qiime demux summarize \
--i-data qiime_input/paired_end_demux.qza \
--o-visualization qiime_input/paired_end_demux.qzv
#use merge-pairs to merge paired ends
!qiime vsearch merge-pairs \
--i-demultiplexed-seqs qiime_input/pe_demux.qza \
--o-merged-sequences qiime_input/merged_pe_demux.qza \
--o-unmerged-sequences qiime_input/unmerged_pe_demux.qza
#create host gene sequence file
!qiime tools import \
--input-path qiime_input/Gallus_gallus.bGalGal1.mat.broiler.GRCg7b.dna.toplevel.fa \
--output-path qiime_input/ref_sequences.qza \
--type 'FeatureData[Sequence]'
#remove host genes from qiime seqs
!qiime quality-control exclude-seqs \
--i-query-sequences qiime_input/merged_pe_demux.qza \
--i-reference-sequences qiime_input/ref_sequences.qza \
--p-method blast \
--p-perc-identity 0.97 \
--p-perc-query-aligned 0.97 \
--o-sequence-hits qiime_input/hits97.qza \
--o-sequence-misses qiime_input/misses97.qza
I consistently get the error below, which makes me think I am getting something very wrong in this pre-processing pipeline. How can I get to the FeatureData[Sequence] that is required for host removal using qiime2?
There was a problem with the command:
(1/1) Invalid value for '--i-query-sequences': Expected an artifact of at
least type FeatureData[Sequence]. An artifact of type
SampleData[PairedEndSequencesWithQuality] was provided.
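For reference, here is a minimal sketch of how the semantic type stored in each .qza file could be checked from the same notebook. It assumes a .qza is a zip archive whose top-level `<uuid>/metadata.yaml` contains a `type:` line; `peek_type` is a hypothetical helper name of my own, not part of QIIME 2. The built-in `qiime tools peek` command should report the same information.

```python
import zipfile


def peek_type(qza_path):
    """Return the semantic type recorded in a QIIME 2 artifact.

    Assumption: the archive holds <uuid>/metadata.yaml with a 'type:' line.
    """
    with zipfile.ZipFile(qza_path) as archive:
        for name in archive.namelist():
            parts = name.split("/")
            # Only the top-level <uuid>/metadata.yaml describes the artifact
            # itself; deeper metadata.yaml files belong to provenance.
            if len(parts) == 2 and parts[1] == "metadata.yaml":
                text = archive.read(name).decode("utf-8")
                for line in text.splitlines():
                    if line.startswith("type:"):
                        # Drop the key and any surrounding quotes.
                        return line.split(":", 1)[1].strip().strip("'\"")
    raise ValueError(f"No top-level metadata.yaml with a type in {qza_path}")
```

Calling `peek_type("qiime_input/merged_pe_demux.qza")` on each file passed to exclude-seqs and comparing the result with the FeatureData[Sequence] named in the error would show which artifact has the unexpected type.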
Many thanks for any help,
Krutik