Why all taxonomic levels on BarPlot have 'D' letter?

SoilRotifer · October 2, 2019, 6:14pm

I can likely provide some insight here, as I am one of the contributors who helped format the SILVA database for QIIME. The D_X__ convention was chosen to be as unique and "safe" a text string as possible, given the many bizarre taxonomy annotations within the SILVA reference database. That is, it was meant as a quick fix that let us search and parse these taxonomy strings.

The 'D' was a way of annotating the taxonomic "Depth". At the time some of the code was written, we realized that the taxonomy provided for eukaryotes had neither a consistent, fixed depth of ranks nor a rank consistently associated with a given depth. That is, some taxa have 13 taxonomic ranks, others 7, etc. So all of the taxonomy strings were padded out to ~14-15 ranks, making it easier to coerce them into tools like RDP classifier or scikit-learn. That is, we initially had to satisfy the requirement that every taxonomy string contain the same number of ranks.
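To make the padding idea concrete, here is a minimal sketch. It is not the actual formatting code: the function name `pad_taxonomy`, the choice to fill missing depths with an empty `D_N__` label, and the default depth are all assumptions for illustration.

```python
def pad_taxonomy(taxonomy: str, depth: int = 14, sep: str = ";") -> str:
    """Label each rank by its depth and pad the string to a fixed depth.

    Hypothetical helper: missing depths are filled with an empty
    "D_N__" label, which is an assumption, not the real SILVA/QIIME behavior.
    """
    # Split the lineage and drop empty pieces (e.g. from a trailing separator).
    names = [n.strip() for n in taxonomy.split(sep) if n.strip()]
    if len(names) > depth:
        raise ValueError(f"lineage has {len(names)} ranks, more than depth={depth}")
    # Annotate each existing rank with its position in the lineage.
    labeled = [f"D_{i}__{name}" for i, name in enumerate(names)]
    # Pad the remaining depths so every string has the same number of ranks.
    labeled += [f"D_{i}__" for i in range(len(names), depth)]
    return sep.join(labeled)


print(pad_taxonomy("Bacteria;Firmicutes", depth=4))
```

With this scheme every output string has exactly `depth` fields, regardless of how many ranks the input lineage had.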

As a further example, level D_4__ for one eukaryote may refer to a "Family", whereas that same level for another may refer to a "Super Family", etc. Thus, we avoided the standard rank annotation style of Greengenes. Hopefully, this makes some sort of sense.

However, if you are using the SILVA 7-rank taxonomy files, and are only concerned with Archaea & Bacteria, then you can relabel D_0__ through D_6__ as Domain / Kingdom through Species without much trouble. Again, the issue had more to do with the wonkiness of the Eukaryote taxonomy.
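For the 7-rank case, the relabeling can be sketched roughly as follows. The function name `relabel` and the Greengenes-style prefixes (`k__` through `s__`) are my assumptions for illustration; adjust the mapping to whatever labels you prefer, and only use it where D_0__ through D_6__ really do correspond to those ranks (i.e. Archaea & Bacteria).

```python
import re

# Assumed mapping from depth index to a Greengenes-style prefix.
# Index 0 is Domain / Kingdom, index 6 is Species.
DEPTH_TO_PREFIX = ["k__", "p__", "c__", "o__", "f__", "g__", "s__"]

_DEPTH_RE = re.compile(r"^D_(\d+)__")


def relabel(taxonomy: str) -> str:
    """Replace D_N__ prefixes with rank prefixes in a 7-rank taxonomy string."""
    out = []
    for rank in taxonomy.split(";"):
        rank = rank.strip()
        if not rank:
            continue  # skip empty pieces, e.g. from a trailing ';'
        m = _DEPTH_RE.match(rank)
        if m is None:
            out.append(rank)  # leave anything without a D_N__ prefix untouched
            continue
        depth = int(m.group(1))
        if depth >= len(DEPTH_TO_PREFIX):
            raise ValueError(f"depth {depth} is outside the 7-rank scheme: {rank!r}")
        out.append(DEPTH_TO_PREFIX[depth] + rank[m.end():])
    return "; ".join(out)


print(relabel("D_0__Bacteria;D_1__Firmicutes;D_2__Bacilli"))
```

The explicit error on depths above 6 is deliberate: it stops you from silently relabeling one of the deeper, padded taxonomy files, where a given depth does not map to a fixed rank.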

I have since found a way, I think, to obtain "Greengenes-like" taxonomy strings, and I've uploaded some quite crude prototype code here. Note: this has not been thoroughly tested and vetted yet!

In brief, I realized that the SILVA folks maintain a taxonomy tree that can be used to easily map, and extract, only those ranks we'd like to retain, bypassing all of the intermediate ranks such as "Sub-order", etc.
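The general idea can be sketched like this. To be clear, this is not the prototype code mentioned above: the function name, the rank-to-prefix table, and the toy lookup dictionary are all hypothetical. I am assuming you have already loaded a lookup that maps each lineage path to its rank name.

```python
# Ranks to retain, in output order, with assumed Greengenes-like prefixes.
WANTED = {
    "domain": "d__",
    "phylum": "p__",
    "class": "c__",
    "order": "o__",
    "family": "f__",
    "genus": "g__",
}


def greengenes_like(lineage: str, path_to_rank: dict) -> str:
    """Keep only the wanted ranks of a lineage, skipping intermediate ones.

    `path_to_rank` is a hypothetical lookup from a full lineage path
    (each name followed by ';') to the rank name of its last taxon.
    """
    names = [n.strip() for n in lineage.split(";") if n.strip()]
    found = {}
    for i, name in enumerate(names):
        # Build the path from the root down to this taxon and look up its rank.
        path = ";".join(names[: i + 1]) + ";"
        rank = path_to_rank.get(path)
        if rank in WANTED:
            found[rank] = name
    # Emit every wanted rank, leaving the name empty where none was found.
    return "; ".join(prefix + found.get(rank, "") for rank, prefix in WANTED.items())


# Toy lookup with made-up taxon names, purely for illustration.
toy_ranks = {
    "DomainX;": "domain",
    "DomainX;PhylumY;": "phylum",
    "DomainX;PhylumY;SubphylumZ;": "subphylum",
    "DomainX;PhylumY;SubphylumZ;ClassW;": "class",
}
print(greengenes_like("DomainX;PhylumY;SubphylumZ;ClassW", toy_ranks))
```

Because the rank comes from the tree rather than from position in the string, the intermediate "subphylum" entry is simply dropped and the output always has the same fixed set of ranks.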

We hope to leverage this approach to re-annotate the SILVA taxonomy strings in the future; we are still discussing and working out how best to do this. If anyone would like to contribute to updating and/or testing an updated SILVA database (using this potential solution), please let us know.

There is your history lesson for the day.