Why does Greengenes assign more features at the class and order ranks than Silva?

Nicholas_Bokulich · March 4, 2020, 10:25pm

It is important to note that, unless you are classifying samples with known composition (e.g., mock communities), more features classified to species level does not necessarily mean better, because you don't know whether those classifications are correct.
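With a mock community you can check this directly, because the expected species for each feature is known. The sketch below uses hypothetical feature IDs, placeholder species labels, and two made-up classifier outputs; `None` means the feature was not resolved to species. It compares the species-level classification rate with the precision of those calls:

```python
from typing import Dict, Optional, Tuple


def species_level_summary(
    predicted: Dict[str, Optional[str]], expected: Dict[str, str]
) -> Tuple[float, float]:
    """Return (fraction of features called to species, precision of those calls)."""
    classified = [f for f, sp in predicted.items() if sp is not None]
    correct = [f for f in classified if predicted[f] == expected[f]]
    rate = len(classified) / len(predicted)
    precision = len(correct) / len(classified) if classified else 0.0
    return rate, precision


# Hypothetical mock community: known species for each feature.
expected = {"f1": "sp1", "f2": "sp2", "f3": "sp3", "f4": "sp4"}

# Hypothetical classifier A: calls every feature to species, but two are wrong.
classifier_a = {"f1": "sp1", "f2": "sp1", "f3": "sp3", "f4": "sp1"}
# Hypothetical classifier B: calls fewer features to species, all correctly.
classifier_b = {"f1": "sp1", "f2": None, "f3": "sp3", "f4": None}

print(species_level_summary(classifier_a, expected))  # (1.0, 0.5)
print(species_level_summary(classifier_b, expected))  # (0.5, 1.0)
```

Classifier A "wins" on the number of species-level assignments but is wrong half the time; without a known composition you could not tell these two apart.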

I agree with @Mehrbod_Estaki regarding the relative strengths and weaknesses of working with larger, more diverse, or more recently updated databases. This ties into my point about whether species-level classifications are correct. To build on @Mehrbod_Estaki's example, imagine the true species is A. glycaniphila. GG would classify this to species level (as A. muciniphila) because, in GG, there is no ambiguity within the genus, but SILVA would probably classify it only to genus level if it cannot distinguish A. muciniphila from A. glycaniphila (e.g., because they have identical sequences for the marker-gene fragment that you sequenced). Which would you prefer: a correct genus-level classification or an incorrect species-level classification?
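To make the Akkermansia example concrete, here is a minimal sketch of a consensus (lowest-common-ancestor) assignment: a query is exact-matched against a reference, and only the ranks on which all matching references agree are reported. This is a simplified illustration, not the algorithm used by any particular QIIME 2 classifier; the fragment string is a hypothetical placeholder, and the lineages are truncated to four ranks for brevity.

```python
from typing import List, Tuple

# Toy lineages truncated to four ranks (illustration only).
RANKS = ("domain", "phylum", "genus", "species")
Lineage = Tuple[str, ...]


def lca(lineages: List[Lineage]) -> Lineage:
    """Return the longest lineage prefix shared by every hit."""
    shared: List[str] = []
    for labels in zip(*lineages):
        if all(label == labels[0] for label in labels):
            shared.append(labels[0])
        else:
            break  # first disagreement: stop at the parent rank
    return tuple(shared)


def classify(query: str, reference: List[Tuple[str, Lineage]]) -> Lineage:
    """Exact-match the query, then report the consensus of all matching references."""
    hits = [lineage for seq, lineage in reference if seq == query]
    return lca(hits) if hits else ()


FRAGMENT = "ACGTACGTAA"  # hypothetical marker-gene fragment, identical in both species
muciniphila = ("Bacteria", "Verrucomicrobia", "Akkermansia", "muciniphila")
glycaniphila = ("Bacteria", "Verrucomicrobia", "Akkermansia", "glycaniphila")

gg_like = [(FRAGMENT, muciniphila)]                                # only one Akkermansia species
silva_like = [(FRAGMENT, muciniphila), (FRAGMENT, glycaniphila)]  # both species present

print(dict(zip(RANKS, classify(FRAGMENT, gg_like))))     # resolves to species: muciniphila
print(dict(zip(RANKS, classify(FRAGMENT, silva_like))))  # stops at genus: Akkermansia
```

If the query actually came from A. glycaniphila, the GG-like reference yields a confident but wrong species call, while the SILVA-like reference yields a correct genus-level call.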

You should not remove those. They are GG's way of handling taxonomically ambiguous clades; see the original publication for a full description, and see this topic for more details: