I’m processing 18S data using a primer that targets marine eukaryotes. After initial processing, I noticed that all my fish, with a handful of exceptions, were being assigned simply to “Teleostei”, and I saw similar problems with a few other groups. Initially I thought this was simply due to rank propagation, but a deeper dive into the reference sequences showed that many of them (e.g., Chinook salmon, Oncorhynchus tshawytscha) have well-supported taxonomy down to the genus level and clear sequence divergence from other members of the family and class. Digging further, it appears that SILVA condenses many of these to the order or family level and propagates that label through to genus. The more detailed taxonomy is available in the “ncbi” or “embl” files that can be retrieved from SILVA. Has anyone had luck using SILVA_138.2_SSURef_NR99_tax_silva.fasta.gz and tax_slv_ssu_138.2.tre.gz with taxmap_ncbi_ssu_ref_nr99_138.2.txt.gz, following the RESCRIPt tutorial’s “Hard Mode”? We would like to get to at least genus level whenever possible with these samples.
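One way to see what the NCBI taxmap would give you, outside of “Hard Mode”, is to rewrite it into a plain Feature ID and lineage table. This is a minimal sketch, not a RESCRIPt feature: it assumes the taxmap is tab-separated with one header row and the columns primaryAccession, start, stop, path, organism_name, taxid (worth checking with zcat taxmap_ncbi_ssu_ref_nr99_138.2.txt.gz | head -n 3 first), and that the SILVA FASTA IDs take the form ACCESSION.START.STOP. The function name taxmap_to_tsv is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of RESCRIPt): turn a SILVA taxmap table into a
# headerless "FeatureID<TAB>lineage" table.
# ASSUMPTIONS: tab-separated input with one header row and the columns
#   primaryAccession, start, stop, path, organism_name, taxid
# and SILVA FASTA IDs of the form ACCESSION.START.STOP.
taxmap_to_tsv() {
  awk -F '\t' 'BEGIN { OFS = "\t" }
    NR == 1 { next }                 # skip the header row
    {
      id = $1 "." $2 "." $3          # rebuild the ACCESSION.START.STOP ID
      lineage = $4
      sub(/;$/, "", lineage)         # drop a trailing ";" from the path, if any
      print id, lineage
    }'
}
```

Usage, if the layout above holds: zcat taxmap_ncbi_ssu_ref_nr99_138.2.txt.gz | taxmap_to_tsv > ncbi-taxonomy.tsv. It is worth confirming that the generated IDs actually match the FASTA headers before going further. A table like this could in principle be imported with qiime tools import --type 'FeatureData[Taxonomy]' --input-format HeaderlessTSVTaxonomyFormat, but NCBI lineages include unranked clades and vary in depth, so they are not aligned to fixed ranks the way the parsed SILVA taxonomy is.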
Meredith
Hi @MVEverett ,
I have not tried this, nor am I working with 18S data. Just want to comment briefly in case it provides some leads.
First, I had a quick look at taxmap_ncbi_ssu_ref_nr99_138. Although it has Oncorhynchus mykiss and Oncorhynchus kisutch, I am not seeing any Oncorhynchus tshawytscha. So swapping the taxmaps as you propose might technically work, but you would still be missing what I assume is a key taxonomic lineage for you, though maybe that's okay if you just want to get to genus level. (I have not tried swapping these.)
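To check a particular species or genus yourself, a small sketch like this could count matching rows in a taxmap. It assumes, as a guess about the file layout, that organism_name is the fifth tab-separated column after a single header row; the function name count_species is hypothetical. Matching is case-insensitive and anchored at the start of organism_name, so passing just a genus name counts every species in that genus.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count taxmap rows whose organism_name (ASSUMED to be
# column 5, tab-separated, after one header row) starts with the given name.
count_species() {
  local species="$1"
  awk -F '\t' -v sp="$species" '
    # skip the header row; compare case-insensitively from the first character
    NR > 1 && index(tolower($5), tolower(sp)) == 1 { n++ }
    END { print n + 0 }              # print 0 rather than an empty line
  '
}
```

For example: zcat taxmap_ncbi_ssu_ref_nr99_138.2.txt.gz | count_species "Oncorhynchus tshawytscha". A result of 0 means no row's organism name starts with that string.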
You might also check out the get-ncbi-data action to create a custom NCBI 18S classifier instead. RESCRIPt also has actions for some 18S-specific databases, such as EUKARYOME and PR2, which could be worth a shot if you want to compare a few different options.
Good luck!
Hi @MVEverett, I just wanted to point out that you can pull all of the taxonomy from SILVA without using "hard mode", as mentioned here. See qiime rescript get-silva-data --help.
Note: even when using the default ranks, --p-rank-propagation uses all of the available ranks before subsetting to the ranks you want.
The default ranks are mainly set for Bacteria and Archaea; for many eukaryotic clades, other ranks are more appropriate.
Just modify the --p-ranks flag:
qiime rescript get-silva-data \
--p-ranks domain superkingdom kingdom subkingdom superphylum phylum subphylum infraphylum superclass class subclass infraclass superorder order suborder superfamily family subfamily genus \
--p-rank-propagation \
--p-version 138.2 \
--p-target SSURef_NR99 \
--output-dir silva-138.2-all-ranks \
--verbose
Note: the --p-ranks flag is also available in qiime rescript get-ncbi-data and get-pr2-data.
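To illustrate why deeper ranks can end up repeating a higher-rank name, here is a generic sketch of rank propagation. It is not RESCRIPt's implementation, and RESCRIPt's exact output may differ; it assumes lineages written as prefix__name fields separated by "; ", and the function name propagate_ranks is hypothetical. Any empty rank inherits the name of the nearest non-empty rank above it.

```shell
#!/usr/bin/env bash
# Generic illustration of rank propagation (NOT RESCRIPt's code).
# ASSUMED input: one lineage per line, "prefix__name" fields joined by "; ".
propagate_ranks() {
  awk -F '; ' 'BEGIN { OFS = "; " }
  {
    last = ""                          # nearest non-empty name seen so far
    for (i = 1; i <= NF; i++) {
      split($i, parts, "__")           # parts[1] = rank prefix, parts[2] = name
      if (parts[2] == "") {
        $i = parts[1] "__" last        # empty rank: reuse the higher rank name
      } else {
        last = parts[2]
      }
    }
    print
  }'
}
```

For example, echo 'd__Eukaryota; p__Chordata; c__Actinopteri; o__Salmoniformes; f__; g__' | propagate_ranks fills both f__ and g__ with Salmoniformes. This is how a name that SILVA only assigns at order or family level can show up at genus once propagation is turned on.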
-Cheers!
Thanks for the quick reply. O. tshawytscha is in the full SILVA database and was likely collapsed in the NR99 version because its sequence is identical to O. kisutch's (this is one reason we only want to get to genus: 18S doesn't have enough species-level variation to distinguish many species, but you can often get to genus).
I’ve used get-ncbi-data in RESCRIPt previously for some other markers and it works great, but given the volume of 18S data in NCBI, I had been hoping to use the SILVA-curated NR99 set to avoid going through that curation process myself. EUKARYOME looks like a potentially good middle ground, so thank you for that tip! I’ll play with them and see how things work.
Thanks for the quick response. I am familiar with “get-silva-data” and had previously tried this approach, but unfortunately it doesn’t work for my situation. The reason I’m trying “Hard Mode” is that “get-silva-data” pulls the default SILVA taxonomy, and SILVA has collapsed many eukaryotes to just their order or family, even though the deeper taxonomic levels are available from the original sequences. As far as I can tell, that command has no option to pull the NCBI or EMBL taxonomy, even though SILVA makes both available.
Ahh, I understand.
Let us know how things work out.
Hey @MVEverett, I somehow accidentally deleted your other recent post. Sorry! But I get what you are saying about the differences in the taxonomic schemes used by NCBI and SILVA.