Inconsistent taxonomic classification

EugeniaSH · May 14, 2025, 11:56pm

Hello
I ran QIIME 2 2024.10 with the Silva 138.2 classifier, and I noticed that the classifier is identifying Mycobacterium sequences with two different taxonomic hierarchies:

D_0__Bacteria;D_1__Actinobacteria;D_2__Actinobacteria;D_3__Actinomycetales;D_4__Actinomycetaceae;D_5__Actinomyces;D_6__Mycobacterium tuberculosis
and
D_0__Bacteria;D_1__Actinobacteria;D_2__Actinobacteria;D_3__Corynebacteriales;D_4__Mycobacteriaceae;D_5__Mycobacterium;D_6__unidentified

I do not know what to do
Eugenia

jwdebelius · May 15, 2025, 1:26am

Hi @EugeniaSH,

So, as has been noted in several places, the species-level taxonomic assignments in Silva are not reliable and generally shouldn't be used. I look at this and say D_6 is species, and Silva species are notoriously unreliable: they have been known to identify rice species in places where there is no rice.

My suggestions are:

1. Work off of the genus-level annotation, or the ASV-level sequences. Species is a model that biology doesn't actually care about, so work with what you've got.
2. If you have to know the specific Mycobacterium, use another technique to verify. qPCR or a targeted amplification might be a good approach.
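As a rough illustration of working at the genus level, here is a minimal Python sketch that drops the species rank from the two strings in the original post. The function name `truncate_to_genus` is made up for this example and is not part of QIIME 2; it assumes a semicolon-delimited lineage with seven ranks, where genus is the sixth. Within QIIME 2 itself, `qiime taxa collapse` with `--p-level 6` is the usual way to collapse a feature table to genus.

```python
# Hypothetical helper for illustration only; not a QIIME 2 function.
# Assumes a 7-rank, semicolon-delimited lineage where rank 6 is genus.
def truncate_to_genus(taxonomy: str, max_levels: int = 6) -> str:
    """Keep only the first `max_levels` ranks of a lineage string."""
    ranks = [rank.strip() for rank in taxonomy.split(";") if rank.strip()]
    return ";".join(ranks[:max_levels])


# The two lineages reported in the original post.
lineages = [
    "D_0__Bacteria;D_1__Actinobacteria;D_2__Actinobacteria;"
    "D_3__Actinomycetales;D_4__Actinomycetaceae;D_5__Actinomyces;"
    "D_6__Mycobacterium tuberculosis",
    "D_0__Bacteria;D_1__Actinobacteria;D_2__Actinobacteria;"
    "D_3__Corynebacteriales;D_4__Mycobacteriaceae;D_5__Mycobacterium;"
    "D_6__unidentified",
]

for lineage in lineages:
    print(truncate_to_genus(lineage))
```

Note that the first string ends at D_5__Actinomyces once the species label is dropped, so the genus-level view does not carry the Mycobacterium tuberculosis label at all.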

Best,
Justine

SoilRotifer · May 15, 2025, 12:55pm

Hi @EugeniaSH,

It appears you are using a very old version of SILVA db parsers. The D_x__ schema has not been used in several years. We've moved to a more accurate Greengenes-like taxonomy. I'd recommend looking through the RESCRIPt tutorial on how to make your own SILVA database.

Here is some history on the D_x__ schema.

-Mike

EugeniaSH · May 16, 2025, 2:27am

Thanks for the response
I am using Silva version 138.2, which I thought was the latest, and I thought that Greengenes was not as well curated.
I will try RESCRIPt.

Thanks, Eugenia

EugeniaSH · May 16, 2025, 2:27am

Thank you,
I do agree with you that species-level annotation should not be trusted.
Thanks

SoilRotifer · May 16, 2025, 1:38pm

I think there is a misunderstanding here. What I was trying to say is that the way SILVA taxonomy is parsed has changed. For example, the old way resulted in taxonomy strings that looked like this:

D_0__Bacteria;D_1__Actinobacteria;D_2__Actinobacteria;D_3__Actinomycetales;D_4__Actinomycetaceae;D_5__Actinomyces;D_6__Mycobacterium tuberculosis

Whereas using RESCRIPt will result in more informative taxonomy strings that look like this (i.e. similar to Greengenes):
d__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__Actinomycetaceae;g__Actinomyces;s__Mycobacterium tuberculosis
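
To make the difference between the two label schemes concrete, here is a small Python sketch that maps the depth-based D_x__ prefixes onto rank-based prefixes. The name `convert_prefixes` and the `RANK_PREFIXES` list are assumptions made for this example; this is not how RESCRIPt works internally, since RESCRIPt re-parses the SILVA taxonomy files rather than renaming prefixes. Relabeling an old string this way also does not correct a lineage that was wrong to begin with.

```python
import re

# Assumed mapping from D_x__ depth to a rank prefix (illustration only).
RANK_PREFIXES = ["d", "p", "c", "o", "f", "g", "s"]


def convert_prefixes(taxonomy: str) -> str:
    """Rewrite D_0__ ... D_6__ labels as d__, p__, c__, o__, f__, g__, s__."""
    converted = []
    for label in taxonomy.split(";"):
        label = label.strip()
        match = re.fullmatch(r"D_(\d+)__(.*)", label)
        # Leave anything that is not a recognised D_x__ label untouched.
        if match is None or int(match.group(1)) >= len(RANK_PREFIXES):
            converted.append(label)
            continue
        prefix = RANK_PREFIXES[int(match.group(1))]
        converted.append(f"{prefix}__{match.group(2)}")
    return ";".join(converted)


old_style = (
    "D_0__Bacteria;D_1__Actinobacteria;D_2__Actinobacteria;"
    "D_3__Actinomycetales;D_4__Actinomycetaceae;D_5__Actinomyces;"
    "D_6__Mycobacterium tuberculosis"
)
print(convert_prefixes(old_style))
```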

Note there is also Greengenes2, and other repositories that you can download and use via RESCRIPt, such as GTDB (can now download version 226.0), and RDP.

system · June 16, 2025, 7:38pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.