Greengenes Classifier Info


Hi everyone,

I’ve successfully classified my features with the Greengenes 13_8_99 classifier. Looking at the output in taxa-bar-plots.qzv, I can see that fewer than 3 features per sample (out of 9,711 features) were “unassigned.” However, when I look at the taxonomy.qzv file, I can see that almost half of my assignments (4,032 out of 9,711) had a confidence below 0.99.

I assumed that the 99% classifier would not assign a feature if it was not 99% confident, but it looks like I’m wrong. Is there a different cutoff value for confidence of classifications?


(Justine) #2

Hi @slatterm,

The 99% in the classifier name describes the database that was used, not the assignment confidence. The Greengenes reference database was built with OTU-based clustering, and the 99% refers to clustering at 99% sequence identity. You could also theoretically build a gg_13_8_97 classifier that used the 97% rep set, but that wouldn’t say anything about the confidence of the taxonomic assignment either.

If you’re concerned about classification accuracy and you’re working in a well-defined environment, you might consider something like q2-clawback as an option to improve your classifications.
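On the cutoff question itself: as far as I know, the confidence threshold is a separate setting of the classification step (the `--p-confidence` parameter of `qiime feature-classifier classify-sklearn`, which I believe defaults to 0.7), so it is worth checking the help text for your QIIME 2 version. If you want to see how your assignments are distributed around a given threshold, here is a minimal sketch in Python. It assumes you have exported the taxonomy to a tab-separated file with `Feature ID`, `Taxon`, and `Confidence` columns. The function name `confidence_summary` and the sample rows are made up for illustration, not taken from your data.

```python
import csv
import io


def confidence_summary(tsv_text, threshold=0.99):
    """Return (below, total): features with Confidence < threshold, and all features.

    Hypothetical helper; assumes a tab-separated table with a 'Confidence' column.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    below = 0
    total = 0
    for row in reader:
        try:
            conf = float(row["Confidence"])
        except (KeyError, ValueError, TypeError):
            # Skip rows without a numeric confidence (e.g. extra header lines).
            continue
        total += 1
        if conf < threshold:
            below += 1
    return below, total


# Made-up example rows, only to show the expected layout.
sample = (
    "Feature ID\tTaxon\tConfidence\n"
    "feat1\tk__Bacteria; p__Firmicutes\t0.999\n"
    "feat2\tk__Bacteria; p__Bacteroidetes\t0.82\n"
    "feat3\tUnassigned\t0.45\n"
)

below, total = confidence_summary(sample)
print(f"{below} of {total} features have confidence below 0.99")
```

To run it on a real file, read the file contents into a string and pass that to the function instead of `sample`. Changing `threshold` lets you compare the counts at, say, 0.7 and 0.99.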



That makes sense. Thank you Justine!
