Greengenes Taxonomic Naming Schema: Letters and Numbers

Hi @wasade,
Thank you for the new Greengenes database; it helps us a lot. When I analyze the bacterial community at different taxonomic ranks, I run into a problem: I have to delete the "_A" or "_number" suffixes again and again to make sure taxa are treated as the same phylum (or other rank). Is there a method to solve this "problem"?
Best wishes,


Hi @Yu_Ren,

Those taxa are not monophyletic, and it would be misleading to remove the suffixes. You can find more detail about the use of the suffixes in the GTDB FAQ.

GTDB uses _A, _B, etc. to denote polyphyly. Within Greengenes2, some taxa are still not supported as monophyletic even with the GTDB suffixes, and that is where the _number component comes from.

The big reason this is important is to ensure the taxonomy corresponds with the phylogeny. If the suffixes are removed, the taxonomy will no longer reflect the phylogeny.
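
If the goal is only to display or search by the unsuffixed name, one option is to parse the suffixes into separate fields instead of deleting them, and to keep grouping on the full label. Here is a minimal sketch of that idea. It assumes labels look like a rank prefix, a base name, an optional GTDB letter suffix, and an optional numeric suffix; `split_label` and the example labels are hypothetical and for illustration only, not part of Greengenes2 or QIIME 2. Species labels (which contain a space) are not handled.

```python
import re
from collections import Counter

# Assumed label shape: rank prefix, base name, optional GTDB letter suffix,
# optional Greengenes2 numeric suffix, e.g. "g__Blautia_A_141781".
LABEL_RE = re.compile(
    r"^(?P<rank>[a-z]__)?"      # rank prefix such as "p__" or "g__"
    r"(?P<base>.+?)"            # base name, e.g. "Firmicutes"
    r"(?:_(?P<gtdb>[A-Z]+))?"   # GTDB polyphyly suffix, e.g. "A"
    r"(?:_(?P<num>\d+))?$"      # numeric suffix, e.g. "141781"
)

def split_label(label):
    """Split a taxon label into its parts without discarding any of them."""
    label = label.strip()
    m = LABEL_RE.match(label)
    if m is None:
        raise ValueError(f"unrecognized label: {label!r}")
    return {"full": label, **m.groupdict()}

# Illustrative labels only; group on the FULL label so the counts still
# follow the phylogeny, and use the base name purely for display.
labels = ["p__Firmicutes_A", "p__Firmicutes_D_45692", "p__Firmicutes_A"]
counts = Counter(labels)
for full, n in counts.items():
    parts = split_label(full)
    print(parts["base"], full, n)
```

Because the full label remains the grouping key, two groups that share a base name stay separate, which is consistent with the point above that they are not supported as a single monophyletic group.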