Good evening everyone,
I have recently started working with QIIME 2, and so far I find it pretty easy to use, but I have run into a particular problem. Starting from FASTQ files, I imported them, ran DADA2, and then performed taxonomic assignment using the Greengenes database. To check how many taxa were recognized, I generated a barplot .qzv and opened it on the QIIME 2 View web page, where I noticed two problems: 1) some subjects do not have even one taxon; 2) other samples have very few taxa recognized. On the same dataset I ran a QIIME 1 closed-reference analysis and got completely different results, with many species recognized. So I think I made a mistake somewhere in my QIIME 2 analysis, but I don't understand where. Can someone help me understand what I am doing wrong? I am working with MiSeq V3-V4 data, and this is my code:
source activate qiime2-2018.11
## Prepare the raw data
qiime tools import \
--type 'SampleData[PairedEndSequencesWithQuality]' \
--input-path Run6/Raw \
--input-format CasavaOneEightSingleLanePerSampleDirFmt \
--output-path Run6/demux-paired-end.qza
qiime demux summarize \
--i-data Run6/demux-paired-end.qza \
qiime dada2 denoise-paired \
--i-demultiplexed-seqs Run6/demux-paired-end.qza \
--p-trim-left-f 0 \
--p-trim-left-r 0 \
--p-trunc-len-f 220 \
--p-trunc-len-r 200 \
--o-representative-sequences Run6/rep-seqs.qza \
--o-table Run6/table.qza \
--output-dir Dada2Out2 # creates a new directory
#Taxonomy with Greengenes
wget -O "gg-13-8-99-515-806-nb-classifier.qza" "https://data.qiime2.org/2018.2/common/gg-13-8-99-515-806-nb-classifier.qza"
qiime feature-classifier classify-sklearn \
--i-classifier gg-13-8-99-515-806-nb-classifier.qza \
--i-reads Run5/rep-seqs.qza \
--o-classification Run5/taxonomy.qza
qiime metadata tabulate \
--m-input-file Run5/taxonomy.qza \
qiime taxa barplot \
--i-table Run5/table.qza \
--i-taxonomy Run5/taxonomy.qza \
--m-metadata-file Run5/MappingfileRun5.txt \
Poor taxonomic classification is almost always a simple user error: the wrong classifier is used for the query sequences. In your case I think I know the problem:
By any chance did you try to classify with one of the pre-trained V4 classifiers? This does not cover V3-V4 and so will perform poorly. You will need to use either the full-length 16S rRNA gene pre-trained classifiers or train your own classifier.
If you are sure you used an appropriate classifier (e.g., full-length 16S), then you probably have a lot of non-target DNA (e.g., host DNA) in your sequences. You can run NCBI BLAST on an unclassified sequence to confirm that this is the issue.
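To get a single sequence ready for BLAST, something like the following may help. It is only a sketch: get_sequence is a hypothetical helper (not part of QIIME 2), and the paths in the usage comment assume the file names used earlier in this thread.

```shell
# Hypothetical helper: print one FASTA record, selected by feature ID.
# $1 = feature ID (as shown in the taxonomy table), $2 = FASTA file
get_sequence() {
  awk -v id="$1" '
    /^>/ { keep = (substr($1, 2) == id) }  # header line: does the ID match?
    keep { print }                         # print header and sequence lines
  ' "$2"
}

# Usage (paths are assumptions based on the commands above):
#   qiime tools export --input-path Run5/rep-seqs.qza --output-path exported-rep-seqs
#   get_sequence <feature-id> exported-rep-seqs/dna-sequences.fasta > query.fasta
```

The resulting query.fasta can then be pasted into the NCBI BLAST web form.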
Thanks a lot for your help!
As you can see, I used “gg-13-8-99-515-806-nb-classifier.qza” because I found it in an online tutorial.
Where can I get a better classifier, like the one you were suggesting?
Can you please give me more details on how to get it?
Thanks a lot
I see that now — that’s the problem!
We host several pre-trained classifiers here. You can download the full-length classifier there.
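As a rough sketch, classifying with the full-length classifier could look like this. The classifier file name and URL are my assumption for the 2018.11 release (please check them against the data resources page), and the output name taxonomy-full-length.qza is only illustrative. A pre-trained classifier should generally come from the same release as the QIIME 2 environment you are running.

```shell
# Assumed file name and URL of the full-length Greengenes 13_8 99% classifier
# for the 2018.11 release; verify both before use.
CLASSIFIER=gg-13-8-99-nb-classifier.qza
CLASSIFIER_URL="https://data.qiime2.org/2018.11/common/$CLASSIFIER"

classify_full_length() {
  # $1 = representative sequences (.qza), $2 = output taxonomy (.qza)
  # Download the classifier only if it is not already present.
  [ -f "$CLASSIFIER" ] || wget -O "$CLASSIFIER" "$CLASSIFIER_URL" || return 1
  qiime feature-classifier classify-sklearn \
    --i-classifier "$CLASSIFIER" \
    --i-reads "$1" \
    --o-classification "$2"
}

# Run only where QIIME 2 and the input artifact are available.
if command -v qiime >/dev/null 2>&1 && [ -f Run5/rep-seqs.qza ]; then
  classify_full_length Run5/rep-seqs.qza Run5/taxonomy-full-length.qza
fi
```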
I tried the Greengenes 13_8 99% OTUs full-length sequences classifier, but to be honest I get more or less the same result. Is it the right database?
Still thank you for your help
The full-length sequences should work for V3-V4. Perhaps you could share your barplot.qzv here so I can see what you mean by "more or less the same".
You could also try training your own V3-V4 classifier — it is time consuming but tailored to your specific amplicon so should give the best results.
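Training a region-specific classifier roughly follows the steps sketched below: import the reference sequences and taxonomy, trim the references to the amplified region with your primers, then fit a naive Bayes classifier. The primer pair here (341F/805R) and the Greengenes file names are assumptions for illustration only; substitute the primers actually used in your library prep and your own reference files.

```shell
# Hypothetical V3-V4 primer pair (341F/805R); replace with the primers
# actually used to generate the amplicons.
F_PRIMER=CCTACGGGNGGCWGCAG
R_PRIMER=GACTACHVGGGTATCTAATCC

train_v3v4_classifier() {
  # $1 = reference FASTA, $2 = reference taxonomy TSV (no header line)
  qiime tools import \
    --type 'FeatureData[Sequence]' \
    --input-path "$1" \
    --output-path ref-seqs.qza &&
  qiime tools import \
    --type 'FeatureData[Taxonomy]' \
    --input-format HeaderlessTSVTaxonomyFormat \
    --input-path "$2" \
    --output-path ref-taxonomy.qza &&
  # Trim the reference sequences to the region spanned by the primers.
  qiime feature-classifier extract-reads \
    --i-sequences ref-seqs.qza \
    --p-f-primer "$F_PRIMER" \
    --p-r-primer "$R_PRIMER" \
    --o-reads ref-seqs-v3v4.qza &&
  # Fit the naive Bayes classifier on the trimmed references.
  qiime feature-classifier fit-classifier-naive-bayes \
    --i-reference-reads ref-seqs-v3v4.qza \
    --i-reference-taxonomy ref-taxonomy.qza \
    --o-classifier gg-13-8-99-v3v4-classifier.qza
}

# Run only where QIIME 2 and the (assumed) Greengenes files are available.
if command -v qiime >/dev/null 2>&1 && [ -f 99_otus.fasta ] && [ -f 99_otu_taxonomy.txt ]; then
  train_v3v4_classifier 99_otus.fasta 99_otu_taxonomy.txt
fi
```

The resulting classifier can then be passed to classify-sklearn through --i-classifier in place of the pre-trained one.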
Of course I can share it.
Those white spaces are not due to poor classification at all. Those samples evidently do not contain any sequences, and hence have no taxa or relative abundances. Those samples probably had low sequence counts, and all sequences were filtered out at some earlier step. You can use qiime feature-table summarize on your feature table to confirm, but at the end of the day this does not look like a problem with classification.
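For example, a check along these lines should show the per-sample sequence counts. This is a sketch: the paths assume the file names used earlier in the thread, table.qzv and the exported-table directory are illustrative names, and sample_totals is a hypothetical helper that sums each sample column of the TSV written by biom convert.

```shell
# Hypothetical helper: per-sample totals from a `biom convert --to-tsv` file.
# $1 = tab-separated feature table with a "#OTU ID" header line
sample_totals() {
  awk -F '\t' '
    /^#OTU ID/ { for (i = 2; i <= NF; i++) name[i] = $i; n = NF; next }
    /^#/       { next }                       # skip other comment lines
               { for (i = 2; i <= NF; i++) total[i] += $i }
    END        { for (i = 2; i <= n; i++) printf "%s\t%d\n", name[i], total[i] }
  ' "$1"
}

# Run only where QIIME 2 and the feature table are available.
if command -v qiime >/dev/null 2>&1 && [ -f Run5/table.qza ]; then
  # Interactive summary with per-sample frequencies.
  qiime feature-table summarize \
    --i-table Run5/table.qza \
    --m-sample-metadata-file Run5/MappingfileRun5.txt \
    --o-visualization Run5/table.qzv
  # Plain-text alternative: export the table and sum each sample column.
  qiime tools export --input-path Run5/table.qza --output-path exported-table
  biom convert -i exported-table/feature-table.biom \
    -o exported-table/feature-table.tsv --to-tsv
  sample_totals exported-table/feature-table.tsv
fi
```

Samples with a total of 0 are the ones that appear blank in the barplot. The denoising statistics that DADA2 wrote to its output directory may also show at which step those reads were lost.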
OK thanks a lot @Nicholas_Bokulich for your precious support