Train your classifier - taxonomy

Nicholas_Bokulich · March 14, 2018, 12:52pm

See this post. Depending on the length of your sequences, the primers you are using, etc., classification deeper than order level might be unreliable. On 250 bp 16S rRNA gene sequences, we can usually get genus level and some species with a good degree of reliability. The naive Bayes classifier in QIIME 2 is designed to be cautious and reliable, to avoid false-positive errors, but you can lower the confidence parameter to make it less cautious.
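To make the effect of the confidence parameter concrete, here is a toy sketch in Python. It is not the actual QIIME 2 implementation; it only illustrates the general idea that a lineage is cut off at the first rank whose confidence falls below the threshold. The function name `truncate_by_confidence`, the example lineage, and the per-rank confidence values are all made up for illustration, and the 0.7 default is an assumption chosen to mirror what I understand the default of `--p-confidence` in `classify-sklearn` to be.

```python
def truncate_by_confidence(ranks, confidences, threshold=0.7):
    """Keep ranks up to (not including) the first one below the threshold.

    Toy illustration only; hypothetical helper, not a QIIME 2 function.
    """
    kept = []
    for rank, conf in zip(ranks, confidences):
        if conf < threshold:
            break  # everything deeper than this rank is discarded
        kept.append(rank)
    return kept


# Made-up example lineage and made-up per-rank confidence values.
lineage = [
    "k__Bacteria", "p__Firmicutes", "c__Bacilli", "o__Lactobacillales",
    "f__Lactobacillaceae", "g__Lactobacillus", "s__reuteri",
]
confidence = [1.0, 0.99, 0.98, 0.95, 0.90, 0.75, 0.55]

# A stricter threshold stops at a shallower rank; a looser one goes deeper.
for threshold in (0.8, 0.7, 0.5):
    kept = truncate_by_confidence(lineage, confidence, threshold)
    print(threshold, "; ".join(kept))
```

Lowering the threshold gives you deeper assignments, but the extra ranks are exactly the ones the classifier was least sure about, so treat them with caution.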

If you have:

1. only kingdom-level classification, it is usually a result of human error (some users have reported this, and usually they had used the wrong database). That does not sound like your problem.
2. a mixture of shallow (kingdom) and deep (species) level classification, it is usually an issue with either contaminant DNA present in the samples or some sequences being unusually short. See here for ways to diagnose this.
3. mostly order-level classification, it is probably characteristic of:
   a. the marker gene/primers
   b. the database
   c. the length of your sequences
   d. all of the above.
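To work out which of these cases applies, it helps to tally how deep each sequence was classified. Below is a minimal Python sketch, assuming your taxonomy assignments are semicolon-delimited strings with rank prefixes such as `k__` or `D_0__` (as in the `Taxon` column of an exported taxonomy table; that column name and format are assumptions about your data). The helpers `classification_depth` and `depth_summary` are hypothetical names written for this post, not part of QIIME 2, and the example strings are invented.

```python
from collections import Counter

RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def classification_depth(taxon):
    """Count the informative ranks in a semicolon-delimited taxonomy string."""
    depth = 0
    for label in taxon.split(";"):
        label = label.strip()
        # Drop a rank prefix such as "g__" or "D_5__" if one is present.
        name = label.split("__", 1)[1] if "__" in label else label
        if not name or name.lower() == "unassigned":
            break  # empty rank (e.g. "g__") or no assignment: stop counting
        depth += 1
    return depth


def depth_summary(taxa):
    """Tally how many sequences stop at each rank."""
    counts = Counter()
    for taxon in taxa:
        depth = min(classification_depth(taxon), len(RANKS))
        counts[RANKS[depth - 1] if depth else "unclassified"] += 1
    return counts


# Invented example assignments, one per sequence.
example_taxa = [
    "k__Bacteria",
    "k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales",
    "k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; "
    "f__Lactobacillaceae; g__Lactobacillus; s__reuteri",
    "k__Bacteria; p__Firmicutes; c__; o__",
    "Unassigned",
]

for rank, n in depth_summary(example_taxa).items():
    print(rank, n)
```

If nearly everything lands on kingdom, suspect the database; if the counts pile up at both kingdom and species, look at contaminants and sequence lengths; if most sequences stop at order, the marker gene, database, or read length is the more likely limit.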

So, please give us more details. What primers, database, and length of sequences are you using?

How are you currently training your classifier? That could be the issue here. Yes, you should train your classifier following the tutorial (but with the appropriate database) or use one of our pre-trained classifiers for 16S rRNA gene data.

Without the right conditions, taxonomy results can often be disappointing. We all want species, but sometimes our data can only yield so much information. Again, 16S rRNA gene data with 250 bp reads can usually get genus level reliably and some species, but other marker genes and shorter read lengths can often be much less satisfying. Go through the steps above and let's see if it's possible to improve this in your data.