Inconsistent taxonomic classification

Hello,
I ran QIIME 2 2024.10 with the Silva 138.2 classifier, and I noticed that the classifier is assigning Mycobacterium sequences to two different taxonomic hierarchies:

  1. D_0__Bacteria;D_1__Actinobacteria;D_2__Actinobacteria;D_3__Actinomycetales;D_4__Actinomycetaceae;D_5__Actinomyces;D_6__Mycobacterium tuberculosis
    and
  2. D_0__Bacteria;D_1__Actinobacteria;D_2__Actinobacteria;D_3__Corynebacteriales;D_4__Mycobacteriaceae;D_5__Mycobacterium;D_6__unidentified

I do not know what to do.
Eugenia

Hi @EugeniaSH,

So, as has been noted in several places, the species-level taxonomic assignments in Silva are not reliable and generally shouldn't be used. I look at this and say: D_6 is species, and Silva species are notoriously unreliable. They have been known to identify rice species in places where there is no rice.
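
As a rough illustration of why the species rank looks suspicious here, the sketch below flags a taxonomy string whose species label starts with a different genus name than the genus rank. The function name, the assumption of a seven-rank string, and the list of placeholder labels are all hypothetical choices made for this example; this is not part of QIIME 2 or Silva.

```python
# Hypothetical placeholder labels that should not count as a conflict.
PLACEHOLDER_LABELS = {"unidentified", "uncultured", "unclassified"}


def species_conflicts_with_genus(taxonomy: str) -> bool:
    """Return True if the species label names a different genus than the genus rank.

    Assumes a semicolon-delimited string with seven ranks, where index 5 is
    genus and index 6 is species (as in the D_0__ ... D_6__ strings above).
    """
    # Drop rank prefixes such as "D_5__" or "g__" by splitting on the first "__".
    labels = [rank.split("__", 1)[-1].strip() for rank in taxonomy.split(";")]
    if len(labels) < 7:
        return False  # no species rank to compare
    genus, species = labels[5], labels[6]
    if not genus or not species:
        return False
    first_word = species.split()[0]
    if first_word.lower() in PLACEHOLDER_LABELS:
        return False  # e.g. "unidentified" carries no genus information
    return first_word != genus
```

Applied to the two strings in the question, this flags the first one (genus Actinomyces, species Mycobacterium tuberculosis) and not the second.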

My suggestions are

  1. Work off of the genus-level annotation, or the ASV-level sequences. Species is a model that biology doesn't actually care about, so work with what you've got.
  2. If you have to know the specific Mycobacterium, use another technique to verify it. qPCR or a targeted amplification might be a good approach.
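
For the first suggestion, a minimal sketch of trimming a taxonomy string to the genus rank might look like the following. The helper name and the default genus position (index 5 of a seven-rank string) are assumptions for illustration only. For a whole feature table, `qiime taxa collapse` with a genus-level `--p-level` is probably the more natural route.

```python
def truncate_to_genus(taxonomy: str, genus_index: int = 5) -> str:
    """Keep ranks up to and including the genus rank.

    Assumes a semicolon-delimited taxonomy string in which the genus sits at
    position `genus_index` (0-based), as in the strings quoted in this thread.
    Strings that are already shorter than that are returned unchanged.
    """
    ranks = [rank.strip() for rank in taxonomy.split(";")]
    return ";".join(ranks[: genus_index + 1])
```

Note that this only drops the species label. It does not resolve the disagreement at the genus rank between the two strings in the question.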

Best,
Justine


Hi @EugeniaSH,

It appears you are using a very old version of the SILVA database parsers. The D_x__ schema has not been used in several years. We've moved to a more accurate Greengenes-like taxonomy. I'd recommend looking through the RESCRIPt tutorial on how to make your own SILVA database.

Here is some history on the D_x__ schema.

-Mike


Thanks for the response.
I am using Silva version 138.2, which I thought was the latest, and I thought that Greengenes was not as well curated.
I will try RESCRIPt.

Thanks, Eugenia


Thank you,
I do agree with you that species-level annotation should not be trusted.
Thanks

I think there is a misunderstanding here. What I was trying to say is that the way SILVA taxonomy is parsed has changed. For example, the old way resulted in taxonomy strings that looked like this:

D_0__Bacteria;D_1__Actinobacteria;D_2__Actinobacteria;D_3__Actinomycetales;D_4__Actinomycetaceae;D_5__Actinomyces;D_6__Mycobacterium tuberculosis

Whereas using RESCRIPt will result in more informative taxonomy strings that look like this (i.e. similar to Greengenes):
d__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__Actinomycetaceae;g__Actinomyces;s__Mycobacterium tuberculosis
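
To make the difference between the two label schemes concrete, here is a small sketch that maps the old `D_x__` prefixes onto single-letter rank prefixes. It is only a relabeling of an existing string under the assumption that `D_0` through `D_6` correspond to domain through species; the function name and the mapping list are hypothetical, and this is not how RESCRIPt itself parses SILVA.

```python
import re

# Assumed mapping from D_0 .. D_6 to domain, phylum, class, order, family, genus, species.
RANK_PREFIXES = ["d", "p", "c", "o", "f", "g", "s"]


def convert_prefixes(taxonomy: str) -> str:
    """Rewrite 'D_0__Bacteria;D_1__...' style labels as 'd__Bacteria;p__...'."""
    converted = []
    for rank in taxonomy.split(";"):
        match = re.fullmatch(r"D_(\d+)__(.*)", rank.strip())
        if match is None:
            raise ValueError(f"Unexpected rank label: {rank!r}")
        index = int(match.group(1))
        if index >= len(RANK_PREFIXES):
            raise ValueError(f"No prefix defined for rank level {index}")
        converted.append(f"{RANK_PREFIXES[index]}__{match.group(2)}")
    return ";".join(converted)
```

Running this on the old-style string above yields the `d__`/`p__`/... string shown, but relabeling alone does not improve the underlying assignments. Rebuilding the reference with RESCRIPt is still the recommended path.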

Note that there is also Greengenes2, and other repositories that you can download and use via RESCRIPt, such as GTDB (version 226.0 is now available for download) and RDP.
