Strange taxonomy based on the SILVA database

Dear all,
I finished classifying my sequences against the SILVA 138 database, which I downloaded from https://data.qiime2.org/2020.11/common/silva-138-99-seqs.qza and https://data.qiime2.org/2020.11/common/silva-138-99-tax.qza. I used the following commands:

# Extract the reference reads spanned by the primers (515F/907R)
qiime feature-classifier extract-reads \
  --i-sequences Silva_138_SSURef_NR99_full_length_sequences_2020_11.qza \
  --p-f-primer GTGCCAGCMGCCGCGGTAA \
  --p-r-primer CCGTCAATTCCTTTGAGTTT \
  --o-reads silva_138_SSURef_NR99_515F_907R_sequences_2020_11.qza

# Train the classifier
qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads silva_138_SSURef_NR99_515F_907R_sequences_2020_11.qza \
  --i-reference-taxonomy Silva_138_SSURef_NR99_full_length_taxonomy_2020_11.qza \
  --o-classifier silva_138_SSURef_NR99_515F_907R_2020_11_classifier.qza

# Classify the representative sequences
qiime feature-classifier classify-sklearn \
  --i-classifier /share/disk0/user/maj/YuZhang/database/qiime2_2020_11/silva_138_SSURef_NR99_515F_907R_2020_11_classifier.qza \
  --i-reads rep-seq-dada2-4.qza \
  --o-classification taxonomy-4.qza

# Tabulate the taxonomy assignments
qiime metadata tabulate \
  --m-input-file taxonomy-4.qza \
  --o-visualization taxonomy-4.qzv

# Draw the taxonomy barplot
qiime taxa barplot \
  --i-table table-dada2-4.qza \
  --i-taxonomy taxonomy-4.qza \
  --m-metadata-file metadata.txt \
  --o-visualization taxa-bar-plots-4.qzv

Then I imported the data into phyloseq in R and drew a barplot. I found some strange genus names, such as "Burkholderia-Caballeronia-Paraburkholderia", "Allorhizobium-Neorhizobium-Pararhizobium-Rhizobium" and "Subgroup_7".

This is my first time using the SILVA database (I used Greengenes before, but it is outdated), so I want to know whether these strange annotations are normal. And if they are not normal, what should I do?

Hi @YuZhang ,
These taxonomy names come straight from SILVA; QIIME 2 is not modifying the names, so yes, we can call this "normal". For example, you can check the SILVA browser to find these taxonomic groups:

I recommend checking out the SILVA documentation and publications for more details on their curation and taxonomy — I suspect the unusual names you see are from sequences with uncertain placement.

Good luck!

Just to add to @Nicholas_Bokulich's excellent answer, you may also check this link from the SILVA FAQ. As Nicholas already mentioned, those names most probably derive from unplaced bacteria, uncultured organisms, clones and so on. You can check the associated sequences by BLASTing them against NCBI or RDP.

Hi @YuZhang
Yes, these are normal bacterial genera; it looks like an environmental sample.

Thanks, but I don't know why Burkholderia, Caballeronia and Paraburkholderia are classified in the same genus, "Burkholderia-Caballeronia-Paraburkholderia". Burkholderia, Caballeronia and Paraburkholderia look more like three separate genera. How can I explain this genus? It is the most abundant genus in my results.

I hope this article will help you
DOI 10.1099/ijsem.0.002202

What if you try SILVA version 132?

I will try it with Silva 132 version, thanks.

All the best.
When you assign taxonomy with 'classify-sklearn', try setting the confidence higher than the default (0.7),
e.g. --p-confidence 0.9

Thank you for your advice. I really overlooked it

Why? This will not address @YuZhang 's concern, it will most likely just lead to shallower classification. I would not advise increasing the confidence threshold so high for 16S reads.
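
To illustrate why a higher threshold tends to give shallower classifications, here is a minimal Python sketch. It is a simplified illustration of the idea, not the actual q2-feature-classifier code; the function name `truncate_by_confidence` and the per-rank confidence values are made up for the example.

```python
# Simplified illustration only: NOT the actual q2-feature-classifier logic.
# The per-rank confidences below are hypothetical, made-up numbers.

def truncate_by_confidence(lineage, confidences, threshold):
    """Keep ranks from the top down until one falls below the threshold."""
    kept = []
    for name, conf in zip(lineage, confidences):
        if conf < threshold:
            break  # everything deeper than this rank is dropped
        kept.append(name)
    return "; ".join(kept) if kept else "Unassigned"


lineage = [
    "d__Bacteria",
    "p__Proteobacteria",
    "c__Gammaproteobacteria",
    "o__Burkholderiales",
    "f__Burkholderiaceae",
    "g__Burkholderia-Caballeronia-Paraburkholderia",
]
confidences = [1.00, 0.99, 0.98, 0.95, 0.92, 0.81]

at_default = truncate_by_confidence(lineage, confidences, 0.7)
at_strict = truncate_by_confidence(lineage, confidences, 0.9)

print(at_default)  # genus-level label is retained
print(at_strict)   # classification stops at family
```

With these made-up numbers, raising the threshold from 0.7 to 0.9 only removes the genus label. The composite name itself is unchanged wherever it is still reported, so the threshold does not address the naming question.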

There are too many associated sequences, so I think it would be a little difficult to BLAST them all on NCBI. Is there an easier way? Or should I perhaps switch from SILVA to Greengenes?

Thanks. I checked the SILVA browser link that you posted, but it didn't solve my problem; these names also occur there.

So, what is uncertain placement?

Again, as @Nicholas_Bokulich already mentioned:

That is, the taxonomy Burkholderia-Caballeronia-Paraburkholderia is not due to the classifier but to the way the reference sequences are annotated within the SILVA database, as you've just proven to yourself through the SILVA website. Currently, QIIME 2 / RESCRIPt does not alter taxonomic annotations.

Uncertain placement, better known as incertae sedis, basically means that the curators of a taxonomy are uncertain as to which group an organism belongs. This can be tricky, as taxonomic groupings are continually being updated and revised, as @timanix referenced earlier in this thread. There are a variety of ways to represent uncertain taxonomic placement, and they can vary within and between reference databases.

-Cheers!
-Mike
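
One way to convince yourself that such labels come from the reference annotations rather than from the classifier is to look for them in the reference taxonomy table itself. Below is a minimal Python sketch under some assumptions: the taxonomy is available as a two-column TSV with "Feature ID" and "Taxon" headers, the feature IDs (`ref1` ... `ref4`) are made up, and treating a hyphen as a sign of a composite genus label is only a rough heuristic.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt of a reference taxonomy table (made-up feature IDs).
BCP = ("d__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; "
       "o__Burkholderiales; f__Burkholderiaceae; "
       "g__Burkholderia-Caballeronia-Paraburkholderia")
ANPR = ("d__Bacteria; p__Proteobacteria; c__Alphaproteobacteria; "
        "o__Rhizobiales; f__Rhizobiaceae; "
        "g__Allorhizobium-Neorhizobium-Pararhizobium-Rhizobium")
RAL = ("d__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; "
       "o__Burkholderiales; f__Burkholderiaceae; g__Ralstonia")

REFERENCE_TSV = "\n".join([
    "Feature ID\tTaxon",
    "ref1\t" + BCP,
    "ref2\t" + BCP,
    "ref3\t" + ANPR,
    "ref4\t" + RAL,
])


def genus_counts(tsv_text):
    """Count how often each genus-level label appears in the reference."""
    counts = Counter()
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        ranks = [r.strip() for r in row["Taxon"].split(";")]
        genus = next((r for r in ranks if r.startswith("g__")), None)
        if genus and len(genus) > 3:
            counts[genus[3:]] += 1  # strip the "g__" prefix
    return counts


def is_composite(label):
    """Rough heuristic (an assumption): hyphenated labels join several names."""
    return "-" in label


counts = genus_counts(REFERENCE_TSV)
composite = sorted(g for g in counts if is_composite(g))
for label in composite:
    print(label, counts[label])
```

If a label shows up in the reference table this way, the classifier is simply reporting the annotation it was trained on.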

Hello everybody.

I'd just add that I also found some strange things (or even true, inexplicable errors) using the latest version of the SILVA taxonomy. Perhaps we should report these directly to SILVA; my collaborator and I are trying to do so. I invite everybody to do the same, so that they can improve their taxonomy checks.
Better to use Silva 132 at the moment.
Bye

claudia

Thanks for the input @claudia.vannini

Just to clarify, was this with version 138 or 138.1? I believe they caught some errors in 138 and released 138.1 as a patch.

Out of curiosity, what are the errors you observed? Are these also uncertain taxonomic placements?

Hi, I don't know the exact SILVA version. I'll send the link to this discussion to my collaborator, so that he can reply. The error was a taxon name (a family) that really does not exist (it has never even been proposed by anyone).

You might be observing a propagated taxonomy? That is, any empty ranks are filled in with ranks from above. Reasons for this are described in the following threads:

-Cheers!
-Mike
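
As a rough illustration of what a propagated taxonomy looks like, here is a minimal Python sketch. It is a simplified stand-in for the idea, not RESCRIPt's actual code; the function name `propagate_ranks`, the rank prefixes, and the example lineage are assumptions chosen for illustration.

```python
# Simplified illustration of rank propagation; not RESCRIPt's implementation.
PREFIXES = ["d__", "p__", "c__", "o__", "f__", "g__", "s__"]


def propagate_ranks(lineage):
    """Fill empty ranks with the nearest named rank above them."""
    ranks = [r.strip() for r in lineage.split(";")]
    filled = []
    last_name = ""
    for prefix, rank in zip(PREFIXES, ranks):
        # Drop the rank prefix (e.g. "f__") to get the bare name, if any.
        name = rank[len(prefix):] if rank.startswith(prefix) else rank
        if name:
            last_name = name  # remember the deepest named rank so far
        filled.append(prefix + last_name)
    return "; ".join(filled)


# Hypothetical input lineage with empty family and genus ranks.
sparse = "d__Bacteria; p__Acidobacteriota; c__Holophagae; o__Subgroup_7; f__; g__"
propagated = propagate_ranks(sparse)
print(propagated)
```

In this example the order-level name is copied down into the empty family and genus slots, which is how a label such as "Subgroup_7" can end up displayed at genus level in a barplot.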

Hi, thanks. No, this is not the case. It's just apparently a completely invented family name.

Please let us know the name; now you've sparked my curiosity!

It might also be relevant that SILVA adopted the GTDB taxonomy as of release 138, so some unofficial taxonomic names might appear in that version (the adoption of the GTDB taxonomy is described in the SILVA FAQ that @timanix linked above).

You might also be interested in this paper, as a discussion of such unofficial names appearing in taxonomic databases: Microbial Taxonomy Run Amok

If you want a version of the database without these names, I recommend checking out SILVA 132, which was pre-adoption of GTDB taxonomy... RESCRIPt can be used to compile a QIIME 2-compatible database for that release as well.

Hi again. The apparently invented family name is "Fokiniaceae". As far as I saw, this name does not appear in the GTDB taxonomy. It's a bacterial group we know well, the family should be Midichloriaceae (which indeed does exist in the GTDB taxonomy). The genus "Fokinia" exists (within Midichloriaceae), but not the family Fokiniaceae. No idea of the explanation.
Yes, as I wrote, I think SILVA 132 could be a better choice for the moment.
