Then I imported the data into phyloseq in R and drew a barplot. I found some strange genus names, such as "Burkholderia-Caballeronia-Paraburkholderia", "Allorhizobium-Neorhizobium-Pararhizobium-Rhizobium", and "Subgroup_7".
This is my first time using the SILVA database (I used Greengenes before, but it is outdated), so I want to know: are these strange annotations normal? And if they are not, what should I do?
Hi @YuZhang ,
These taxonomy names come straight from SILVA; QIIME 2 is not modifying the names, so yes, we can call this "normal". For example, you can check the SILVA browser to find these taxonomic groups:
I recommend checking out the SILVA documentation and publications for more details on their curation and taxonomy — I suspect the unusual names you see are from sequences with uncertain placement.
Just to add to @Nicholas_Bokulich's excellent answer, you may also check this link from the SILVA FAQ. As Nicholas already mentioned, those names most probably derive from unplaced bacteria, uncultured organisms, clones, and so on. You can check the associated sequences by BLASTing them against NCBI or RDP.
Thanks, but I don't know why Burkholderia, Caballeronia, and Paraburkholderia are classified in the same genus, "Burkholderia-Caballeronia-Paraburkholderia". Burkholderia, Caballeronia, and Paraburkholderia look more like three genera. How can I explain this genus? It is the most abundant genus in my results.
Why? This will not address @YuZhang's concern; it will most likely just lead to shallower classification. I would not advise increasing the confidence threshold so high for 16S reads.
There are too many associated sequences, so I think it would be difficult to BLAST them all on NCBI. Is there an easier way? Or should I switch from SILVA to Greengenes?
That is, the taxonomy Burkholderia-Caballeronia-Paraburkholderia is not due to the classifier, but to the way the reference sequences are annotated within the SILVA database, as you've just proven to yourself through the SILVA website. Currently, QIIME 2 / RESCRIPt does not alter taxonomic annotations.
Uncertain placement, better known as incertae sedis, basically means that curators of taxonomy are uncertain as to which group an organism belongs. This can be tricky, as taxonomic groupings are continually being updated and revised, as @timanix referenced earlier in this thread. There are a variety of ways to represent uncertain taxonomic placement, and they can vary within and between reference databases.
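If it helps to see how many of your features carry such hyphen-joined labels, here is a minimal Python sketch that flags them in a taxonomy string. It assumes rank-prefixed labels such as `g__` separated by semicolons; the function name `composite_genus_parts` and the shortened lineages are made up for illustration, and the capitalised-alphabetic check is only a heuristic, not anything SILVA or QIIME 2 defines.

```python
def composite_genus_parts(taxonomy):
    """Return the component names of a hyphen-joined genus label, else []."""
    # Split the lineage into individual rank labels, dropping empty pieces.
    labels = [t.strip() for t in taxonomy.split(";") if t.strip()]
    # Assumption: the genus label is the one prefixed with "g__".
    genus = next((t for t in labels if t.startswith("g__")), None)
    if genus is None:
        return []
    parts = genus[len("g__"):].split("-")
    # Heuristic: only treat the label as composite when every part looks
    # like a genus name (alphabetic, capitalised). This skips labels that
    # merely contain a hyphen, such as clone or strain identifiers.
    if len(parts) > 1 and all(p.isalpha() and p[0].isupper() for p in parts):
        return parts
    return []


# Shortened, illustrative lineage; real lineages have more ranks.
example = "d__Bacteria; g__Burkholderia-Caballeronia-Paraburkholderia"
print(composite_genus_parts(example))
```

A flagged label is still a single SILVA genus-level group; the sketch only lists the names that were joined, it cannot tell you which of them a given read belongs to.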
I'll just add that I also found some strange things (or even true, inexplicable errors) using the latest version of the SILVA taxonomy. Perhaps we should report them directly to SILVA; my collaborator and I are trying to do so. I invite everybody to do the same, so that they can improve their taxonomy checks.
Better to use Silva 132 at the moment.
Bye
Hi, I don't know the exact SILVA version; I'll send the link to this discussion to my collaborator so that he can reply. The error was a taxon name (a family) that really does not exist (it has never even been proposed by anyone).
You might be observing a propagated taxonomy? That is, any empty ranks are filled in with ranks from above. Reasons for this are described in the following threads:
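To illustrate what "propagated" means here, a rough Python sketch of the idea follows: every empty rank inherits the name of the nearest filled rank above it. This is not RESCRIPt's actual implementation; the function name `propagate_ranks`, the seven-rank prefix list, and the example lineage (including "Phylum_X") are assumptions made for illustration.

```python
# Assumed rank prefixes, from domain down to species.
RANK_PREFIXES = ["d__", "p__", "c__", "o__", "f__", "g__", "s__"]


def propagate_ranks(taxonomy):
    """Fill empty or missing ranks with the nearest named rank above."""
    names = {}
    for label in taxonomy.split(";"):
        label = label.strip()
        # Keep only labels that start with a recognised rank prefix.
        if label[:3] in RANK_PREFIXES:
            names[label[:3]] = label[3:]
    filled = []
    last = ""  # most recent non-empty name seen while walking down the ranks
    for prefix in RANK_PREFIXES:
        name = names.get(prefix, "")
        if name:
            last = name
        filled.append(prefix + last)
    return "; ".join(filled)


# Made-up lineage that stops at class level.
print(propagate_ranks("d__Bacteria; p__Phylum_X; c__Subgroup_7"))
# d__Bacteria; p__Phylum_X; c__Subgroup_7; o__Subgroup_7; f__Subgroup_7; g__Subgroup_7; s__Subgroup_7
```

Under this scheme a higher-rank name can show up in a genus-level barplot even though no genus of that name exists.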
Please let us know the name; now you've sparked my curiosity.
It might also be relevant that SILVA adopted the GTDB taxonomy as of release 138, so some unofficial taxonomic names might appear in that version (the adoption of GTDB taxonomy is described in the SILVA FAQ that @timanix linked above).
You might also be interested in this paper, as a discussion of such unofficial names appearing in taxonomic databases: Microbial Taxonomy Run Amok
If you want a version of the database without these names, I recommend checking out SILVA 132, which was pre-adoption of GTDB taxonomy... RESCRIPt can be used to compile a QIIME 2-compatible database for that release as well.
Hi again. The apparently invented family name is "Fokiniaceae". As far as I saw, this name does not appear in the GTDB taxonomy. It's a bacterial group we know well, the family should be Midichloriaceae (which indeed does exist in the GTDB taxonomy). The genus "Fokinia" exists (within Midichloriaceae), but not the family Fokiniaceae. No idea of the explanation.
Yes, as I wrote, I think SILVA 132 could be a better choice for the moment.