Accuracy of taxonomic assignment

I assigned taxonomy to my query sequences using a pre-trained Naive Bayes (NB) classifier (Greengenes database) and would like to know how accurate this classification is. Is there any method to verify the accuracy? As far as I know, we need data of known composition, right?

Also, if I would like to compare the performance of the sklearn classifier and consensus-blast, what verification method should I use?

In this paper from members of the QIIME team, they “evaluated and optimized several commonly used classification methods” using “19 mock communities and error-free sequence simulations, including classification of simulated “novel” marker-gene sequences.”
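To make the "known composition" idea concrete, here is a minimal Python sketch of scoring classifier output against expected lineages at a chosen taxonomic rank. Everything in it is an assumption for illustration: the function names are hypothetical, the toy lineages are shortened to three ranks, and the precision/recall definitions are a simplification (an assignment that stops short of the rank counts as a false negative, a wrong assignment as a false positive), not necessarily the definitions used in the paper.

```python
def parse_lineage(lineage):
    """Split a semicolon-delimited lineage, stopping at the first empty rank."""
    ranks = []
    for rank in lineage.split(";"):
        rank = rank.strip()
        # Greengenes-style empty ranks look like "g__"; treat them as unassigned.
        if not rank or rank.endswith("__") or rank == "Unassigned":
            break
        ranks.append(rank)
    return ranks


def score_at_level(expected, observed, level):
    """Score observed lineages against expected ones at a given rank depth.

    expected, observed: dicts mapping sequence ID -> lineage string.
    level: number of ranks to compare (3 is the deepest rank in the toy data).
    Assumes every expected lineage is complete down to `level`.
    """
    tp = fp = fn = 0
    for seq_id, expected_lineage in expected.items():
        exp = parse_lineage(expected_lineage)[:level]
        obs = parse_lineage(observed.get(seq_id, ""))[:level]
        if len(obs) < level:
            fn += 1  # unassigned or underclassified at this rank
        elif obs == exp:
            tp += 1  # correct down to this rank
        else:
            fp += 1  # assigned to this rank, but wrong
    total = tp + fp + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / total if total else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f_measure": f_measure}


# Toy data (hypothetical): four sequences with known lineages.
expected = {
    "seq1": "k__Bacteria; f__Lactobacillaceae; g__Lactobacillus",
    "seq2": "k__Bacteria; f__Enterobacteriaceae; g__Escherichia",
    "seq3": "k__Bacteria; f__Bacteroidaceae; g__Bacteroides",
    "seq4": "k__Bacteria; f__Staphylococcaceae; g__Staphylococcus",
}
observed = {
    "seq1": "k__Bacteria; f__Lactobacillaceae; g__Lactobacillus",  # correct
    "seq2": "k__Bacteria; f__Enterobacteriaceae; g__",             # stops at family
    "seq3": "k__Bacteria; f__Prevotellaceae; g__Prevotella",       # wrong family
    "seq4": "k__Bacteria; f__Staphylococcaceae; g__Staphylococcus",  # correct
}

genus_scores = score_at_level(expected, observed, level=3)
family_scores = score_at_level(expected, observed, level=2)
print("genus:", genus_scores)
print("family:", family_scores)
```

Running the same scoring on the output of two classifiers (for example sklearn and consensus-blast) for the same mock community would give a like-for-like comparison, provided both are scored at the same rank.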

In this paper, the authors build a new database and implement a classic LCA classification. In the process they describe how they perform cross-validation testing, which is basically what you are trying to do here.
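The cross-validation idea can be sketched in a few lines of Python: hold out part of a reference set with known lineages, train on the rest, classify the held-out sequences, and count how many lineages come back correct. This is only an illustration under stated assumptions: the sequences are made up, `train_nearest_kmer` and `train_majority` are hypothetical stand-ins for real classifiers, and the folds are plain random splits (real evaluations often stratify by taxonomy, which this sketch does not do).

```python
import random
from collections import Counter


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(reference, train_fn, k=3, seed=0):
    """Return the fraction of held-out sequences whose lineage is recovered.

    reference: list of (sequence, lineage) pairs.
    train_fn: takes a list of training pairs, returns classify(sequence) -> lineage.
    """
    correct = 0
    for fold in k_fold_indices(len(reference), k, seed):
        held_out = set(fold)
        train = [pair for i, pair in enumerate(reference) if i not in held_out]
        classify = train_fn(train)
        correct += sum(classify(reference[i][0]) == reference[i][1] for i in fold)
    return correct / len(reference)


def kmers(seq, size=4):
    """Set of all overlapping substrings of length `size`."""
    return {seq[i:i + size] for i in range(len(seq) - size + 1)}


def train_nearest_kmer(train):
    """Hypothetical toy classifier: lineage of the reference sharing most k-mers."""
    index = [(kmers(seq), lineage) for seq, lineage in train]

    def classify(seq):
        query = kmers(seq)
        return max(index, key=lambda entry: len(query & entry[0]))[1]

    return classify


def train_majority(train):
    """Hypothetical baseline: always predict the most common training lineage."""
    most_common = Counter(lineage for _, lineage in train).most_common(1)[0][0]
    return lambda seq: most_common


# Made-up reference set: three variants each of two invented taxa.
reference = [
    ("ACGTACGTACGTACGTACGT", "taxon_A"),
    ("ACGTACGTACCTACGTACGT", "taxon_A"),
    ("ACGTACGAACGTACGTACGT", "taxon_A"),
    ("TTGGCCAATTGGCCAATTGG", "taxon_B"),
    ("TTGGCCAATTGCCCAATTGG", "taxon_B"),
    ("TTGGCGAATTGGCCAATTGG", "taxon_B"),
]

accuracy_kmer = cross_validate(reference, train_nearest_kmer, k=3)
accuracy_majority = cross_validate(reference, train_majority, k=3)
print("nearest k-mer accuracy:", accuracy_kmer)
print("majority baseline accuracy:", accuracy_majority)
```

Because both classifiers are evaluated on identical folds (same seed), the two accuracies are directly comparable, which is the property you want when comparing methods.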

Finally, this paper does a very critical reassessment (:face_with_symbols_over_mouth: :face_with_raised_eyebrow:) of classification inside existing databases, and concludes that the “annotation error rate in these databases is ~17%.” :scream_cat:

Scary if true. I’m glad you are testing your taxonomy accuracy.


Hi @Benedict! If I have understood the request correctly, I think what you’re looking for is the q2-quality-control plugin. Please take a look at this tutorial:

Taxonomy is messy, which is why it’s super important that the species names we use are consistent with those used by other researchers. We want to make sure that we are all talking about the same microbes, and the quality control guide that Matt mentioned is a great way to do that. :microbe:

16S V4 reads are usually about 250 bp long. These short reads can reveal Earth’s multiscale microbial diversity, but they are not great for getting really specific about taxonomy.

I think we should foreground the composition of the microbial community and its context within the environment :earth_africa: :earth_asia: :earth_americas:, instead of talking about specific microbes.

If you are on the hunt for specific microbes, other methods can give better taxonomic resolution.


