Hi @ashutosh,
This is indeed strange. There are quite a few Salmonella sp. within the SILVA 138 reference database. However, there was a recent correction and update to this database, SILVA 138.1, which may fix a few issues. You can use the approach outlined here to make your own SILVA 138.1 classifier. You can see how to do this if you click the drop-down menu "The gritty details" under the section "Getting SILVA data: Hard Mode".
Or simply run `rescript get-silva-data` with default settings. That said, given what I discuss below, I think the real issue is with the genus-rank annotations.
I observed that there are some taxonomic inconsistencies at the genus rank for Salmonella. For example:
Accession | Taxonomy |
---|---|
AB680788.1.1466 | d__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Enterobacterales; f__Enterobacteriaceae; g__Salmonella; s__Salmonella_enterica |
AB680791.1.1466 | d__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Enterobacterales; f__Enterobacteriaceae; g__Salmonella; s__Salmonella_enterica |
AB855732.1.1520 | d__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Enterobacterales; f__Enterobacteriaceae; g__Salmonella; s__Salmonella_enterica |
... | ... |
ABAK02000001.2157381.2158912 | d__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Enterobacterales; f__Enterobacteriaceae; g__Escherichia-Shigella; s__Salmonella_enterica |
CZLR01000032.33487.34870 | d__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Enterobacterales; f__Enterobacteriaceae; g__Escherichia-Shigella; s__Salmonella_enterica |
KX082838.1.1358 | d__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Enterobacterales; f__Enterobacteriaceae; g__Escherichia-Shigella; s__Salmonella_enterica |
KM244788.1.1511 | d__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Enterobacterales; f__Enterobacteriaceae; g__Escherichia-Shigella; s__uncultured_Salmonella |
... | ... |
As you can see, the `s__Salmonella_enterica` taxa carry different genus labels, e.g. `g__Escherichia-Shigella` and `g__Salmonella`, plus some others (not shown). I think the conflicting genus labels force the classifier to report only the lowest common ancestor (LCA), `f__Enterobacteriaceae`. That is, the problem is the inconsistent annotation of the ranks (again, see the description under "caveat emptor!" I mentioned above). You may have to manually edit these labels so that they are all consistent, then re-import the modified taxonomy file and retrain the classifier.
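If it helps, here is a minimal Python sketch of how you might find and fix these conflicts. It only illustrates my hypothesis above and is not how the classifier works internally. The helper names (`conflicting_genera`, `lca`, `harmonize`) are made up for this example, and the records are a few rows copied from the table above. For real data you would load the accession and taxonomy columns from your exported taxonomy file instead.

```python
from collections import defaultdict


def parse_taxonomy(tax_string):
    """Split 'd__X; p__Y; ...' into a list of rank labels."""
    return [label.strip() for label in tax_string.split(";") if label.strip()]


def conflicting_genera(records):
    """Return species labels that appear under more than one genus label."""
    genera = defaultdict(set)
    for _accession, tax in records:
        ranks = parse_taxonomy(tax)
        if len(ranks) == 7:  # only consider full domain-to-species strings
            genera[ranks[6]].add(ranks[5])
    return {sp: sorted(g) for sp, g in genera.items() if len(g) > 1}


def lca(tax_strings):
    """Return the rank labels shared by all taxonomy strings, from the top down."""
    shared = []
    for labels in zip(*(parse_taxonomy(t) for t in tax_strings)):
        if len(set(labels)) != 1:
            break
        shared.append(labels[0])
    return shared


def harmonize(records, genus_for_species):
    """Overwrite the genus label for every species listed in genus_for_species."""
    fixed = []
    for accession, tax in records:
        ranks = parse_taxonomy(tax)
        if len(ranks) == 7 and ranks[6] in genus_for_species:
            ranks[5] = genus_for_species[ranks[6]]
        fixed.append((accession, "; ".join(ranks)))
    return fixed


# A few rows from the table above (shared higher ranks factored out).
PREFIX = ("d__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; "
          "o__Enterobacterales; f__Enterobacteriaceae; ")
records = [
    ("AB680788.1.1466", PREFIX + "g__Salmonella; s__Salmonella_enterica"),
    ("KX082838.1.1358", PREFIX + "g__Escherichia-Shigella; s__Salmonella_enterica"),
    ("KM244788.1.1511", PREFIX + "g__Escherichia-Shigella; s__uncultured_Salmonella"),
]

conflicts = conflicting_genera(records)
print(conflicts)

# The two s__Salmonella_enterica entries only agree down to family.
print(lca([tax for _, tax in records[:2]])[-1])

# The mapping is a manual curation decision, supplied by you.
fixed = harmonize(records, {"s__Salmonella_enterica": "g__Salmonella"})
for accession, tax in fixed:
    print(accession, tax, sep="\t")
```

The genus assigned in `harmonize` is deliberately passed in by hand rather than guessed (for example by majority vote), since which label is "correct" is a curation call. Labels such as `s__uncultured_Salmonella` are left untouched unless you add them to the mapping.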
Note that, other than providing options to 1) use the organism name as the species label and 2) propagate taxonomy to fill in empty ranks (available in the latest version of RESCRIPt on GitHub), we do not curate the SILVA taxonomy.
Hopefully, we'll be able to add tools to aid in curating these issues in the future. Please keep us posted.