I started working with QIIME 2 recently, and so far I have found it pretty easy to use, but I have run into a problem. Starting from FASTQ files, I imported the data, ran DADA2, and then assigned taxonomy using the Greengenes database. To check how many taxa were recognized, I generated a barplot .qzv and opened it in QIIME 2 View, where I noticed two problems: 1) some samples do not have even one taxon; 2) other samples have very few taxa recognized. On the same dataset I ran QIIME 1 closed-reference OTU picking and got completely different results, with many species recognized. So I think I made a mistake somewhere in my QIIME 2 analysis, but I don't understand where. Can someone help me understand where I went wrong? I am working with MiSeq V3-V4 data and this is my code:
Hi @michele_quail,
Poor taxonomic classification is almost always a simple user error: the wrong classifier is used for the query sequences. In your case I think I know the problem:
By any chance did you try to classify with one of the pre-trained V4 classifiers? This does not cover V3-V4 and so will perform poorly. You will need to use either the full-length 16S rRNA gene pre-trained classifiers or train your own classifier.
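As a quick sanity check, here is a minimal Python sketch of the "does the classifier cover my amplicon?" question. It assumes the pre-trained classifier encodes its primer positions in the file name (as in names like gg-13-8-99-515-806-nb-classifier.qza) and that your V3-V4 amplicon spans roughly positions 341 to 805; both the helper names and those coordinates are assumptions for illustration, so substitute the positions of your own primers.

```python
import re

def classifier_region(filename):
    """Return (start, end) primer positions parsed from a classifier
    file name, or None if the name carries no region (e.g. full-length)."""
    match = re.search(r"-(\d{3,4})-(\d{3,4})-", filename)
    return (int(match.group(1)), int(match.group(2))) if match else None

def covers(classifier, amplicon):
    """True if the classifier's region fully contains the amplicon region."""
    return classifier[0] <= amplicon[0] and amplicon[1] <= classifier[1]

# Assumed V3-V4 coordinates (341F/805R); replace with your own primers.
V3_V4 = (341, 805)

region = classifier_region("gg-13-8-99-515-806-nb-classifier.qza")
print(region, covers(region, V3_V4))  # prints: (515, 806) False
```

A region-specific classifier that starts at 515 cannot have seen the V3 part of your reads, which is why a full-length or custom-trained classifier is the safer choice here.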
If you are sure you used an appropriate classifier (e.g., full-length 16S), then you probably have a lot of non-target DNA (e.g., host DNA) in your sequences. You can use NCBI BLAST of an unclassified sequence to confirm that's the issue.
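To pick sequences for a BLAST check, something like the following sketch may help. It assumes you have exported your taxonomy and representative sequences with `qiime tools export`, which should give you a tab-separated taxonomy file with "Feature ID" and "Taxon" columns and a FASTA file of sequences. The feature IDs and sequences below are made-up placeholders, and the "unclassified" rule (labelled Unassigned, or resolved only to kingdom) is just one reasonable heuristic.

```python
import csv
import io

def unclassified_ids(taxonomy_tsv):
    """Feature IDs labelled 'Unassigned' or resolved only to a single rank."""
    reader = csv.DictReader(io.StringIO(taxonomy_tsv), delimiter="\t")
    ids = []
    for row in reader:
        ranks = [r.strip() for r in row["Taxon"].split(";") if r.strip()]
        if row["Taxon"] == "Unassigned" or len(ranks) <= 1:
            ids.append(row["Feature ID"])
    return ids

def read_fasta(text):
    """Parse FASTA text into {sequence id: sequence}."""
    seqs, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = ""
        elif name and line:
            seqs[name] += line
    return seqs

# Placeholder data standing in for the exported files (hypothetical IDs).
taxonomy = (
    "Feature ID\tTaxon\tConfidence\n"
    "f1\tk__Bacteria; p__Firmicutes\t0.99\n"
    "f2\tUnassigned\t0.40\n"
    "f3\tk__Bacteria\t0.75\n"
)
fasta = ">f1\nACGT\n>f2\nTTGA\nCCAA\n>f3\nGGCC\n"

seqs = read_fasta(fasta)
for fid in unclassified_ids(taxonomy):
    # Paste these records into NCBI BLAST to see what they really are.
    print(f">{fid}\n{seqs[fid]}")
```

If the top BLAST hits for these sequences are host or other non-16S DNA, the classifier is behaving correctly and the issue is non-target amplification.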
Thanks a lot for your help!
As you can see, I used "gg-13-8-99-515-806-nb-classifier.qza" because I found it in an online tutorial.
Where can I get a better classifier, like the one you suggested?
Could you please give me more details on how to get it?
Thanks a lot
Hi @michele_quail,
The full-length sequences should work for V3-V4... perhaps you should share your barplot.qzv here so I can see what you mean by "more or less the same".
You could also try training your own V3-V4 classifier. It is time-consuming, but it is tailored to your specific amplicon and so should give the best results.
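For reference, training your own classifier comes down to two commands: `qiime feature-classifier extract-reads` to trim the reference sequences to your amplicon, then `qiime feature-classifier fit-classifier-naive-bayes`. Here is a rough sketch that just assembles and prints them so you can review them before running anything. The primer sequences are the commonly used 341F/805R pair and are only an assumption about your protocol, and all file names are placeholders for your own imported reference artifacts.

```python
import shlex

# Assumed V3-V4 primers (341F / 805R); replace with the ones you used.
F_PRIMER = "CCTACGGGNGGCWGCAG"
R_PRIMER = "GACTACHVGGGTATCTAATCC"

def training_commands(ref_seqs, ref_taxonomy, f_primer, r_primer):
    """Build the two QIIME 2 commands for training a region-specific classifier."""
    extract = [
        "qiime", "feature-classifier", "extract-reads",
        "--i-sequences", ref_seqs,
        "--p-f-primer", f_primer,
        "--p-r-primer", r_primer,
        "--o-reads", "ref-seqs-v3v4.qza",  # placeholder output name
    ]
    fit = [
        "qiime", "feature-classifier", "fit-classifier-naive-bayes",
        "--i-reference-reads", "ref-seqs-v3v4.qza",
        "--i-reference-taxonomy", ref_taxonomy,
        "--o-classifier", "classifier-v3v4.qza",  # placeholder output name
    ]
    return [extract, fit]

# Placeholder input names for the imported Greengenes reference artifacts.
commands = training_commands("ref-seqs.qza", "ref-taxonomy.qza", F_PRIMER, R_PRIMER)
for cmd in commands:
    print(shlex.join(cmd))
```

Run the printed commands inside your QIIME 2 environment, then pass the resulting classifier to `qiime feature-classifier classify-sklearn` in place of the V4 one.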
Hi @michele_quail,
Those white spaces are not due to poor classification at all. Those samples evidently do not contain any sequences, and hence have no taxa or relative abundances. Those samples probably had low sequence counts, and all sequences were filtered out at some earlier step. You can use qiime feature-table summarize on your feature table to confirm — but at the end of the day this does not look like a problem with classification.
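If you prefer to check the per-sample counts outside the visualization, here is a small sketch that does so. It assumes you have exported the feature table and converted it to tab-separated text (for example with `qiime tools export` followed by `biom convert --to-tsv`), which should give a header line starting with "#OTU ID" followed by one row per feature. The table contents below are made-up placeholders.

```python
import csv
import io

def sample_totals(tsv_text):
    """Sum feature counts per sample from a BIOM-style TSV table."""
    rows = [r for r in csv.reader(io.StringIO(tsv_text), delimiter="\t") if r]
    header = next(r for r in rows if r[0].startswith("#OTU ID"))
    samples = header[1:]
    totals = dict.fromkeys(samples, 0.0)
    for row in rows:
        if row[0].startswith("#"):
            continue  # skip comment and header lines
        for sample, value in zip(samples, row[1:]):
            totals[sample] += float(value)
    return totals

# Placeholder table standing in for the converted feature table.
table = (
    "# Constructed from biom file\n"
    "#OTU ID\tS1\tS2\tS3\n"
    "featA\t120.0\t0.0\t3.0\n"
    "featB\t80.0\t0.0\t0.0\n"
)

totals = sample_totals(table)
empty = [sample for sample, total in totals.items() if total == 0]
print(empty)  # prints: ['S2']
```

Samples that end up in this list, or that have only a handful of reads, are the ones that will show up blank or nearly blank in the barplot, regardless of which classifier you use.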