Get phylogenetic ranks from a FeatureData[Taxonomy]?

Hi,

I wanted to address a bug in q2-micom (https://github.com/micom-dev/q2-micom/issues/1) but ran into difficulties determining the taxonomic ranks in general FeatureData[Taxonomy] artifacts. Depending on the classifier and the data set, an artifact may contain anywhere between 1 and 14 ranks (with the SILVA classifier), and I need to extract the family and genus from those. Most data sets have 6 or 7 ranks (kingdom, phylum, class, order, family, genus[, species]), and that case is pretty straightforward. Does the artifact carry information about the interpretation of the identified ranks? For instance, something like D_0 = kingdom for the SILVA classifier?

The short answer is no: any special rank information/formatting is left to the reference databases themselves.

FeatureData[Taxonomy] is really just a special class of feature metadata — it contains taxonomic labels but no real intelligence about what ranks are present or any other special information.

Fortunately, databases like GTDB, Greengenes, and the QIIME-formatted SILVA database come with rank information baked in (e.g., the D_* labels in SILVA). So my advice is to rely on that info for those databases. @SoilRotifer released a revised SILVA reference database with improved 7-level taxonomies for eukaryotes elsewhere on this forum, so that may help you in your quandary (e.g., use this instead of the crazy, unevenly ranked full SILVA database) :crazy_face:
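
To make "rely on that info" concrete, here is a minimal sketch for databases that use one-letter rank prefixes (Greengenes-style `k__`, `p__`, ..., or `d__` for the domain in GTDB). The prefix-to-rank mapping and the function name `parse_prefixed_taxonomy` are assumptions for illustration, not part of QIIME 2 or q2-micom, and the sketch does not cover SILVA's depth-style `D_*` labels.

```python
import re

# Assumed mapping from one-letter prefixes to rank names (illustrative only).
PREFIX_TO_RANK = {
    "k": "kingdom",
    "d": "domain",
    "p": "phylum",
    "c": "class",
    "o": "order",
    "f": "family",
    "g": "genus",
    "s": "species",
}


def parse_prefixed_taxonomy(taxon):
    """Return {rank: label} for a string like 'k__Bacteria; p__Firmicutes'.

    Entries without a recognized prefix, or with an empty label
    (e.g. 's__'), are skipped.
    """
    ranks = {}
    for part in taxon.split(";"):
        match = re.match(r"^([a-z])__(.*)$", part.strip())
        if match is None:
            continue
        prefix, label = match.groups()
        rank = PREFIX_TO_RANK.get(prefix)
        if rank is not None and label:
            ranks[rank] = label
    return ranks


example = (
    "k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; "
    "f__Lactobacillaceae; g__Lactobacillus; s__"
)
parsed = parse_prefixed_taxonomy(example)
print(parsed.get("family"), parsed.get("genus"))
```

Looking ranks up by prefix rather than by position means a missing or unclassified level does not shift family and genus into the wrong slot.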

I hope that helps!


If I understand SILVA correctly, the D_* prefix only indicates the depth in the tree, not the meaning of the rank per se. So D_6 may be genus for bacteria but a subclade for eukaryotes, which makes it hard to map even with the SILVA tree at hand. The tip about @SoilRotifer’s db, however, is a great fix here and would definitely resolve my issue. I will point users to those for now. For reference, the thread is Silva 138 for Qiime2.
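
Because depth alone does not identify a rank, a positional mapping is only safe when no taxonomy string goes deeper than the 7 levels assumed here. A rough sketch of such a guard follows; the helper names and the 7-rank list are hypothetical, and the input is assumed to be semicolon-delimited taxonomy strings taken from the artifact's Taxon column.

```python
from collections import Counter

# Assumed rank order for an even 7-level taxonomy (illustrative only).
SEVEN_RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def split_ranks(taxon):
    """Split a semicolon-delimited taxonomy string, ignoring a trailing ';'."""
    trimmed = taxon.strip().rstrip(";").strip()
    if not trimmed:
        return []
    return [part.strip() for part in trimmed.split(";")]


def rank_depths(taxa):
    """Count how many taxonomy strings have each depth."""
    return Counter(len(split_ranks(taxon)) for taxon in taxa)


def positional_ranks(taxon):
    """Map positions to rank names; refuse strings deeper than 7 levels."""
    parts = split_ranks(taxon)
    if len(parts) > len(SEVEN_RANKS):
        raise ValueError(
            f"{len(parts)} levels found; positions cannot be mapped to ranks"
        )
    return dict(zip(SEVEN_RANKS, parts))


taxa = [
    "Bacteria;Firmicutes;Bacilli;Lactobacillales;Lactobacillaceae;Lactobacillus",
    "Bacteria;Proteobacteria",
]
print(rank_depths(taxa))
print(positional_ranks(taxa[0]).get("family"))
```

If `rank_depths` reports anything above 7, as with the full SILVA taxonomy, the table would need an even-ranked reference before family and genus can be read off by position.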
