How to interpret SILVA taxonomic classifications

morinajc · February 9, 2023, 4:00pm

Hi All,

My question is how to interpret taxonomic names, specifically the difference between the suffixes _unclassified and _ge. For instance, I have two taxa in my data, Babeliales_ge and Babeliales_unclassified, both at the genus level. However, I do not know how to interpret these two taxa beyond the fact that the highest resolution of classification is Babeliales. What determines whether a taxon gets the _ge suffix rather than the _unclassified suffix?

Thank you!

SoilRotifer · February 9, 2023, 5:14pm

Hi @morinajc, welcome to the QIIME 2 forum!

These suffixes do not look like the result of classification, but rather like curated labels from the database itself. That is, the researchers who deposited the sequences into a given database (e.g. GenBank) likely did not know the full identity of the microbe they sequenced, so they labeled the taxon as 'unclassified', 'unknown', or with some other arbitrary label. Other repositories, such as SILVA, that import this information may then choose to keep these annotations or curate them differently.

So, in this case, the researchers who generated and deposited the data into GenBank (or elsewhere) probably knew the microbe belonged somewhere within the Babeliales, but could not resolve it any further.
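To make this interpretation concrete, here is a minimal Python sketch that treats a genus label ending in a placeholder suffix as "no name below the parent taxon". The suffix list (`_unclassified`, `_ge`) and the helper name `resolved_taxon` are assumptions for illustration only; they are not part of QIIME 2, mothur, or SILVA, and other databases may use different placeholder conventions.

```python
from typing import Tuple

# Hypothetical list of placeholder suffixes; adjust for your own reference database.
PLACEHOLDER_SUFFIXES = ("_unclassified", "_ge")


def resolved_taxon(label: str) -> Tuple[str, bool]:
    """Return (base_name, is_placeholder) for a single taxon label.

    A label such as 'Babeliales_ge' is read as "nothing named below Babeliales",
    so the base name is the deepest taxon that was actually resolved.
    """
    for suffix in PLACEHOLDER_SUFFIXES:
        if label.endswith(suffix):
            # Strip the suffix to recover the parent taxon name.
            return label[: -len(suffix)], True
    # No placeholder suffix: the label is taken to be a real taxon name.
    return label, False


if __name__ == "__main__":
    for label in ["Babeliales_ge", "Babeliales_unclassified", "Babeliales"]:
        base, placeholder = resolved_taxon(label)
        print(f"{label} -> {base} (placeholder: {placeholder})")
```

Under this reading, both labels in the question collapse to the same answer: the deepest resolved taxon is Babeliales.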

I am unable to find Babeliales_ge or Babeliales_unclassified in the SILVA 138.1 NR99 database. I do see the following entries in a RESCRIPt-formatted SILVA database prepared using this approach:

d__Bacteria; p__Dependentiae; c__Babeliae; o__Babeliales; f__Babeliales; g__Babeliales; s__uncultured_bacterium
d__Bacteria; p__Dependentiae; c__Babeliae; o__Babeliales; f__Vermiphilaceae; g__Vermiphilaceae; s__uncultured_bacterium
etc.
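If you want to check programmatically how far a RESCRIPt-style string like the ones above actually resolves, a minimal Python sketch could look like this. It assumes that a rank which simply repeats its parent's name (e.g. `f__Babeliales` under `o__Babeliales`) adds no information, and it treats a small, hypothetical set of labels such as `uncultured_bacterium` as uninformative. The function names are illustrative only, not a RESCRIPt or QIIME 2 API.

```python
from typing import List, Optional, Tuple

# Hypothetical set of labels treated as carrying no taxonomic information.
UNINFORMATIVE = {"uncultured_bacterium", "uncultured", "unidentified", "metagenome"}


def parse_taxonomy(taxonomy: str) -> List[Tuple[str, str]]:
    """Split 'd__Bacteria; p__Dependentiae; ...' into [('d', 'Bacteria'), ...]."""
    ranks = []
    for field in taxonomy.split(";"):
        field = field.strip()
        if not field:
            continue
        prefix, _, name = field.partition("__")
        ranks.append((prefix, name))
    return ranks


def deepest_informative_rank(taxonomy: str) -> Optional[Tuple[str, str]]:
    """Return the deepest (rank_prefix, name) that adds information.

    A rank is skipped if its name is empty, repeats the name of the rank
    directly above it, or appears in the UNINFORMATIVE set.
    """
    best = None
    parent_name = None
    for prefix, name in parse_taxonomy(taxonomy):
        if name and name != parent_name and name not in UNINFORMATIVE:
            best = (prefix, name)
        parent_name = name
    return best


if __name__ == "__main__":
    examples = [
        "d__Bacteria; p__Dependentiae; c__Babeliae; o__Babeliales; "
        "f__Babeliales; g__Babeliales; s__uncultured_bacterium",
        "d__Bacteria; p__Dependentiae; c__Babeliae; o__Babeliales; "
        "f__Vermiphilaceae; g__Vermiphilaceae; s__uncultured_bacterium",
    ]
    for taxonomy in examples:
        print(deepest_informative_rank(taxonomy))
```

For the two strings above, this reports the order Babeliales and the family Vermiphilaceae, respectively, as the deepest informative ranks.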

Was the database curated prior to use? Can you provide an example of the full taxonomy strings, the version and type of SILVA database used (i.e. NR99 or full), and the version of QIIME 2 used?

morinajc · February 9, 2023, 7:44pm

This explanation is very helpful. I mistakenly thought the analysis had been performed in QIIME 2, but it was actually performed in mothur, which is why you are unable to find these names. Sorry about that! Thank you very much for taking the time to answer my question, though; I appreciate it.