Hello
I ran QIIME 2 2024.10 using the SILVA 138.2 classifier, and I noticed that the classifier is assigning Mycobacterium sequences to two different taxonomic hierarchies:
D_0__Bacteria;D_1__Actinobacteria;D_2__Actinobacteria;D_3__Actinomycetales;D_4__Actinomycetaceae;D_5__Actinomyces;D_6__Mycobacterium tuberculosis
and
As has been noted in several places, species-level taxonomic assignments in SILVA are not reliable and generally shouldn't be used. So I look at this and say: D_6 is the species level, and SILVA species are notoriously unreliable. They have been known to identify rice species in places where rice species are not present.
My suggestions are:
Work off of the genus-level annotation or the ASV-level sequences. Species is a model that biology doesn't actually care about, so work with what you've got.
If you have to know the specific Mycobacterium, use another technique to verify. qPCR or a targeted amplification might be a good approach.
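As a rough illustration of the first suggestion, here is a minimal Python sketch that trims a taxonomy string to the genus level by dropping everything after the sixth rank. The function name `truncate_to_genus` and the `max_ranks` parameter are hypothetical, and the sketch assumes a semicolon-separated string with ranks in domain-to-species order, as in the example from the original post.

```python
def truncate_to_genus(taxonomy: str, max_ranks: int = 6) -> str:
    """Keep only the first `max_ranks` levels of a ';'-separated taxonomy string.

    Assumes ranks are ordered domain, phylum, class, order, family, genus,
    species, so the default of 6 drops the species-level label.
    """
    ranks = [rank.strip() for rank in taxonomy.split(";") if rank.strip()]
    return ";".join(ranks[:max_ranks])


# Taxonomy string taken from the original post.
example = (
    "D_0__Bacteria;D_1__Actinobacteria;D_2__Actinobacteria;"
    "D_3__Actinomycetales;D_4__Actinomycetaceae;D_5__Actinomyces;"
    "D_6__Mycobacterium tuberculosis"
)

genus_level = truncate_to_genus(example)
print(genus_level)
```

This only shortens the label. It does not check whether the remaining lineage is itself consistent, so a mismatch between the genus and the species name (as in the example) still needs to be looked at separately.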
It appears you are using a very old version of the SILVA database parsers. The D_x__ schema has not been used in several years. We've moved to a more accurate Greengenes-like taxonomy. I'd recommend looking through the RESCRIPt tutorial on how to make your own SILVA database.
Thanks for the response.
I am using SILVA version 138.2, which I thought was the latest, and I thought that Greengenes was not as well curated.
I will try RESCRIPt.
I think there is a misunderstanding here. What I was trying to say is that the way SILVA taxonomy is parsed has changed. For example, the old way resulted in taxonomy strings that looked like this:
Whereas using RESCRIPt will result in more informative taxonomy strings that look like this (i.e. similar to Greengenes): d__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__Actinomycetaceae;g__Actinomyces;s__Mycobacterium tuberculosis
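To make the difference between the two label schemas concrete, here is a small Python sketch that maps the numeric `D_x__` prefixes onto rank-letter prefixes (`d__`, `p__`, and so on). It is only an illustration of how the two formats correspond, not how RESCRIPt builds its taxonomy (RESCRIPt parses the SILVA source files), and the names `RANK_PREFIXES` and `relabel_ranks` are made up for this example.

```python
import re

# Rank letters in order: domain, phylum, class, order, family, genus, species.
RANK_PREFIXES = ["d", "p", "c", "o", "f", "g", "s"]


def relabel_ranks(old_taxonomy: str) -> str:
    """Rewrite 'D_0__X;D_1__Y;...' as 'd__X;p__Y;...'.

    Only the prefixes change; the names themselves are copied as-is.
    """
    labels = []
    for level in old_taxonomy.split(";"):
        match = re.fullmatch(r"D_(\d+)__(.*)", level.strip())
        if match is None:
            raise ValueError(f"Not a D_x__ label: {level!r}")
        index = int(match.group(1))
        if index >= len(RANK_PREFIXES):
            raise ValueError(f"No rank letter for level {index}")
        labels.append(f"{RANK_PREFIXES[index]}__{match.group(2)}")
    return ";".join(labels)


# Old-style string from the original post.
old_style = (
    "D_0__Bacteria;D_1__Actinobacteria;D_2__Actinobacteria;"
    "D_3__Actinomycetales;D_4__Actinomycetaceae;D_5__Actinomyces;"
    "D_6__Mycobacterium tuberculosis"
)

new_style = relabel_ranks(old_style)
print(new_style)
```

Relabeling the prefixes does not correct the lineage itself; rebuilding the reference database and classifier is still the recommended route.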
Note that there is also Greengenes2, as well as other repositories that you can download and use via RESCRIPt, such as GTDB (version 226.0 can now be downloaded) and RDP.