SILVA 138 classifier is not classifying Salmonella at genus level

Hi @ashutosh,

This is indeed strange. There are quite a few Salmonella spp. within the SILVA 138 reference database. However, there was a recent correction and update to this database, SILVA 138.1, which may fix a few issues. You can use the approach outlined here to make your own SILVA 138.1 classifier: click the drop-down menu "The gritty details" under the section "Getting SILVA data: Hard Mode" to see how.

Or simply run rescript get-silva-data with default settings. That said, given what I discuss below, I think the underlying issue is with the genus-rank annotations.

I observed that there are some taxonomic inconsistencies at the genus rank for Salmonella. For example:

Accession Taxonomy
AB680788.1.1466 d__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Enterobacterales; f__Enterobacteriaceae; g__Salmonella; s__Salmonella_enterica
AB680791.1.1466 d__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Enterobacterales; f__Enterobacteriaceae; g__Salmonella; s__Salmonella_enterica
AB855732.1.1520 d__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Enterobacterales; f__Enterobacteriaceae; g__Salmonella; s__Salmonella_enterica
... ...
ABAK02000001.2157381.2158912 d__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Enterobacterales; f__Enterobacteriaceae; g__Escherichia-Shigella; s__Salmonella_enterica
CZLR01000032.33487.34870 d__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Enterobacterales; f__Enterobacteriaceae; g__Escherichia-Shigella; s__Salmonella_enterica
KX082838.1.1358 d__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Enterobacterales; f__Enterobacteriaceae; g__Escherichia-Shigella; s__Salmonella_enterica
KM244788.1.1511 d__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Enterobacterales; f__Enterobacteriaceae; g__Escherichia-Shigella; s__uncultured_Salmonella
... ...
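
To see how widespread this kind of conflict is, one option is to scan the exported taxonomy for species labels that appear under more than one genus. The following is only a minimal sketch: the function names `parse_ranks` and `find_genus_conflicts` are hypothetical, and it assumes each record is an (accession, taxonomy string) pair in the `d__...; p__...` format shown above.

```python
from collections import defaultdict

def parse_ranks(taxonomy):
    """Split 'd__X; p__Y; ...' into a {rank_prefix: label} dict."""
    ranks = {}
    for field in taxonomy.split(";"):
        field = field.strip()
        if "__" in field:
            prefix, label = field.split("__", 1)
            ranks[prefix] = label
    return ranks

def find_genus_conflicts(records):
    """Return {species: sorted genus labels} for species seen under >1 genus."""
    genera_by_species = defaultdict(set)
    for _accession, taxonomy in records:
        ranks = parse_ranks(taxonomy)
        species, genus = ranks.get("s"), ranks.get("g")
        if species and genus:
            genera_by_species[species].add(genus)
    return {s: sorted(g) for s, g in genera_by_species.items() if len(g) > 1}

# A few of the records listed above, used as illustrative input.
prefix = ("d__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; "
          "o__Enterobacterales; f__Enterobacteriaceae; ")
records = [
    ("AB680788.1.1466", prefix + "g__Salmonella; s__Salmonella_enterica"),
    ("KX082838.1.1358", prefix + "g__Escherichia-Shigella; s__Salmonella_enterica"),
    ("KM244788.1.1511", prefix + "g__Escherichia-Shigella; s__uncultured_Salmonella"),
]
conflicts = find_genus_conflicts(records)
for species, genera in conflicts.items():
    print(species, genera)
```

With a real taxonomy file, the records would instead come from reading the tab-separated export line by line.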

As you can see, the s__Salmonella_enterica taxa carry different genus labels, e.g. g__Escherichia-Shigella and g__Salmonella, plus some others (not shown). I think the conflicting genus labels force the classifier to report only the lowest common ancestor (LCA), f__Enterobacteriaceae. That is, the problem is the inconsistent annotation of the ranks (again, see the description under "caveat emptor!" I mentioned above). You may have to manually edit these labels so that they are all consistent, then re-import the modified taxonomy file and retrain the classifier.
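
If you would rather script the label edits than do them by hand, something along these lines might work. This is a rough sketch, not a RESCRIPt feature: `harmonize_genus` is a hypothetical helper, and it assumes you are willing to trust the species label and overwrite the genus whenever the species starts with `Salmonella_`. Labels such as `s__uncultured_Salmonella` do not match that prefix and are left untouched, so you would need to decide separately how to handle them.

```python
def harmonize_genus(taxonomy, species_prefix="Salmonella_", genus="Salmonella"):
    """Set the g__ field to `genus` when the s__ label starts with `species_prefix`.

    Assumption: the species label is more reliable than the genus label.
    Taxonomy strings that do not match are returned unchanged.
    """
    fields = [f.strip() for f in taxonomy.split(";")]
    species = next((f[3:] for f in fields if f.startswith("s__")), "")
    if not species.startswith(species_prefix):
        return taxonomy
    fixed = ["g__" + genus if f.startswith("g__") else f for f in fields]
    return "; ".join(fixed)

# Illustrative input taken from the records listed earlier in this post.
prefix = ("d__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; "
          "o__Enterobacterales; f__Enterobacteriaceae; ")
conflicting = prefix + "g__Escherichia-Shigella; s__Salmonella_enterica"
uncultured = prefix + "g__Escherichia-Shigella; s__uncultured_Salmonella"

print(harmonize_genus(conflicting))
print(harmonize_genus(uncultured))
```

You would apply this to the taxonomy column of the exported file, write the result back out as a tab-separated file, and then re-import it and retrain the classifier as described above.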

Note that, other than providing options to 1) use the organism name as the species label and 2) propagate taxonomy to fill in empty ranks (available in the latest version of RESCRIPt on GitHub), we do not curate the SILVA taxonomy.

Hopefully, we'll be able to add tools to aid in curating these issues in the future. Please keep us posted.
