A few months ago I used RESCRIPt to download the Silva 138.1 SSU Ref database.
qiime rescript get-silva-data \
  --p-version '138.1' \
  --p-target 'SSURef' \
  --p-include-species-labels \
  --o-silva-sequences silva-138-1-ssu-seqs.qza \
  --o-silva-taxonomy silva-138-1-ssu-tax.qza
It seemed to work well. However, a colleague who is interested in copepods was using the database I downloaded via RESCRIPt and was dismayed by how few copepod sequences it contained. This was curious (and a bit concerning), because the Silva Browser online shows many copepods. After she brought this to my attention, I downloaded the Silva 138.1 SSU Ref taxonomy file directly from the Silva archive for a little comparison.
First, I exported the RESCRIPt-pulled version of Silva.
qiime tools export --input-path silva-138-1-ssu-tax.qza --export-path export
Then I downloaded the equivalent file from Silva (https://www.arb-silva.de/fileadmin/silva_databases/release_138.1/Exports/taxonomy/taxmap_slv_ssu_ref_138.1.txt.gz).
And now for some comparisons (note: taxonomy.tsv is the QIIME 2 / RESCRIPt version of the Silva 138.1 taxonomy):
grep -c ^ taxonomy.tsv
zgrep -c ^ taxmap_slv_ssu_ref_138.1.txt.gz
So, each version of Silva 138.1 has the same number of sequences.
grep -c -e Copepoda taxonomy.tsv
zgrep -c -e Copepoda taxmap_slv_ssu_ref_138.1.txt.gz
So, despite the two files having the same total number of sequences, a whole host of copepod sequences are either missing from the RESCRIPt version, or their taxonomies aren't parsed correctly and so they falsely appear to be missing.
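One way to tell those two cases apart might be to look up each Silva copepod record by ID in the RESCRIPt file. The sketch below is only that, and it rests on assumptions that are not confirmed above: that the taxmap columns are accession, start, stop, path, organism name and taxid, and that the Feature ID in taxonomy.tsv is built as accession.start.stop. The function name `missing_copepods` is made up for illustration.

```shell
# Hypothetical helper: print records that mention Copepoda in the Silva taxmap
# but whose taxonomy string in the RESCRIPt export does not.
# ASSUMPTION: taxmap columns are accession, start, stop, path, organism_name, taxid.
# ASSUMPTION: taxonomy.tsv columns are Feature ID (accession.start.stop) and Taxon.
missing_copepods() {
    taxmap_gz="$1"     # e.g. taxmap_slv_ssu_ref_138.1.txt.gz
    rescript_tsv="$2"  # e.g. taxonomy.tsv from the qiime tools export step
    gzip -dc "$taxmap_gz" |
    awk -F '\t' '
        # First input (stdin, the taxmap): remember the path of every line
        # that mentions Copepoda, keyed by accession.start.stop.
        NR == FNR { if ($0 ~ /Copepoda/) silva[$1 "." $2 "." $3] = $4; next }
        # Second input (RESCRIPt taxonomy): the ID is present,
        # but the taxonomy string no longer contains Copepoda.
        ($1 in silva) && $2 !~ /Copepoda/ { print $1 "\t" silva[$1] "\t" $2 }
    ' - "$rescript_tsv"
}
```

It would be called as `missing_copepods taxmap_slv_ssu_ref_138.1.txt.gz taxonomy.tsv`. If the column assumptions hold, each printed line is a record that is present in the RESCRIPt file but lost its Copepoda label during parsing; if nothing is printed while the Copepoda counts still differ, the records themselves would seem to be absent.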
Has anyone else experienced this and come up with a solution?
Thanks a million!!