SILVA 138 Classifiers

SoilRotifer · January 9, 2020, 6:24pm

This pipeline has been vastly improved via the new RESCRIPt plugin, which you can check out via the link below. We hope the process of constructing your own reference sequence database (e.g. SILVA) will be far less onerous.

Click here to see the original documentation

I just wanted to let everyone know that I've cobbled together a simple pipeline for constructing classifiers based on the SILVA 138 release. I've been working on this as time permits, so I apologize in advance for the shortcuts and clunkiness of my approach, but I figured this would be something useful for the community, at least in the short term.

Anyway, the files will be temporarily available here, until I can find a longer-term hosting solution:

Do not be surprised if they suddenly disappear. If they do, I hope the pipeline I've linked above will be sufficient.

The classifiers, and the reference sequences and taxonomy files used to build them, are available too. Note: I've made classifiers with and without the species labels. Dropping the species labels not only reduces the size of the classifiers, but also allows for faster classification, as there is less rank information. This may be ideal for those who typically do not trust species-level taxonomy. Either way, use what works best for you.
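As a rough illustration of what "without the species labels" means for a single taxonomy string, here is a minimal Python sketch. It is not the code used in the pipeline; the function name `strip_species` is hypothetical, and it assumes the rank-prefixed format (`d__`, `p__`, ..., `s__`) seen in the lineages quoted later in this thread.

```python
def strip_species(lineage: str) -> str:
    """Drop any rank labelled with the 's__' prefix from a lineage string."""
    # Split on ';' and ignore empty fields or stray whitespace.
    ranks = [r.strip() for r in lineage.split(";") if r.strip()]
    kept = [r for r in ranks if not r.startswith("s__")]
    return ";".join(kept)


example = (
    "d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;"
    "o__uncultured;f__uncultured;g__uncultured;s__uncultured_actinobacterium"
)
print(strip_species(example))
```

The result ends at the genus rank, which is the lowest rank kept in the "no species" versions of the files.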

Please let me know if these are useful. Otherwise happy :qiime2:-ing my friends!

-Mike

mpodar · January 10, 2020, 7:49pm

Awesome, thanks Mike.

A bit of a philosophical/operational question. Given all the changes in taxonomy, with groups changing place in classification between phyla, classes, orders etc, it is becoming impossible to compare taxonomic analyses performed with different versions of Silva/classifier versions in QIIME2. Do you see a potential solution in the future, selecting what taxonomy flavor/vintage to use at the classification step without selecting different classifier files and re-running all analyses?

Cheers,
Mircea

SoilRotifer · January 10, 2020, 8:28pm

Hi @mpodar,

You've discovered one of the things that keeps me up at night! I would like to figure out a way to provide taxonomies from multiple sources (e.g. GTDB, SILVA, etc.) and be able to present those side-by-side, like a taxonomy-assignment ensemble approach, similar to what is available through the online version of SILVA. I know there are people linking DOIs to taxonomy, so that if your data is assigned to some record / lineage, and that record / lineage has its taxonomy updated, then you just pull that updated information via the DOI.

I do not necessarily think you'd have to rerun all of your analyses, unless you are collapsing your OTUs/ASVs by taxonomy. The patterns in your ASVs should be the same, unless the data has been parsed based on taxonomy.
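To make that point concrete, here is a small Python sketch with made-up numbers. The ASV IDs, counts, and phylum labels are all hypothetical. The per-ASV counts are identical under both taxonomy versions; only the table collapsed by taxonomy changes.

```python
from collections import defaultdict

# Hypothetical ASV counts for one sample (made-up numbers for illustration).
asv_counts = {"ASV1": 10, "ASV2": 5, "ASV3": 7}

# Two hypothetical taxonomy versions that place ASV2 in different phyla.
taxonomy_old = {"ASV1": "p__A", "ASV2": "p__A", "ASV3": "p__B"}
taxonomy_new = {"ASV1": "p__A", "ASV2": "p__B", "ASV3": "p__B"}


def collapse(counts, taxonomy):
    """Sum the counts of all ASVs that share the same taxonomy label."""
    collapsed = defaultdict(int)
    for asv, n in counts.items():
        collapsed[taxonomy[asv]] += n
    return dict(collapsed)


print(collapse(asv_counts, taxonomy_old))  # {'p__A': 15, 'p__B': 7}
print(collapse(asv_counts, taxonomy_new))  # {'p__A': 10, 'p__B': 12}
```

Any analysis run directly on `asv_counts` is unaffected by the taxonomy change, while anything downstream of `collapse` would need to be redone.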

In a nutshell, I do not have a good answer to your inquiry. But this is something I have been thinking about quite often these days. Perhaps someone much smarter than I will have better insight into this.

-Best wishes!
-Mike

Nicholas_Bokulich · January 30, 2020, 12:24pm

3 posts were split to a new topic: Invalid value for "--i-classifier"

Francisco · January 30, 2020, 6:47pm

Hi!

For the full-length (full gene) SILVA files, what is the difference between the one labeled SSU and the one without SSU?

SoilRotifer · January 30, 2020, 6:52pm

Hi @Francisco,

Nothing. I was just not consistent in my file naming.

M_R · January 31, 2020, 11:02am

Hi @SoilRotifer,

Thanks for your work. Could you tell me which of the Silva138 files you used to create these classifiers? Is it the smaller (264 MB) Ref NR 99 or the bigger (2GB) Ref file?

Thanks in advance!

SoilRotifer · January 31, 2020, 2:06pm

Hi @M_R,

The NR99, as described in the pipeline link of the original post.

-Mike

T.J.Sanko · February 13, 2020, 5:29pm

Hi @SoilRotifer,
Thank you for the awesome work. It helps a lot.
Just one note: there's a typo (unless it's on purpose) in the script name 'coVNert_rna_to_dna.py'.
I assume it should be 'coNVert_rna_to_dna.py'.

SoilRotifer · February 13, 2020, 6:13pm

Hahaha! Thank you @T.J.Sanko for that nice, yet embarrassing, catch!

I'm glad this is useful! I will fix it posthaste!

-Mike

potatoo · February 19, 2020, 9:29am

Thank you for the great work. I encountered a problem when I ran the pipeline in qiime2-2019.10.

parse_silva_taxonomy.py: /ur/bin/env: bad interpreter: No such file or directory

I assume "#!/ur/bin/env" should be: "#!/usr/bin/env".
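A quick way to check for this kind of problem is to read the shebang line and see whether the interpreter path exists. This is only a sketch: the helper names are hypothetical, and the trailing `python` in the example line is an assumption, since the error message shows only the interpreter path.

```python
import os


def shebang_interpreter(first_line: str):
    """Return the interpreter path from a shebang line, or None if there is no shebang."""
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].split()
    return parts[0] if parts else None


def interpreter_exists(first_line: str) -> bool:
    """Check whether the interpreter named in the shebang exists on this machine."""
    path = shebang_interpreter(first_line)
    return path is not None and os.path.exists(path)


# The trailing "python" is assumed; only "/ur/bin/env" appears in the error above.
print(shebang_interpreter("#!/ur/bin/env python"))
print(interpreter_exists("#!/ur/bin/env python"))
```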

potatoo

SoilRotifer · February 19, 2020, 1:48pm

Fixed! Thank you @potatoo!

SoilRotifer · February 28, 2020, 2:43pm

I added a V3V4 (341F-805R) set here.

EDIT (Feb 2, 2023): Reminder, you can now do this yourself with RESCRIPt. See the link at the top of this thread.

didietkeren · March 4, 2020, 5:35am

Thank you very much for the updates, Mike Robeson.
I've tried using these classifiers for my microbiome data, but some of the species-level taxonomy is confusing, for example:

d__Bacteria;p__Gemmatimonadota;c__Gemmatimonadetes;o__Gemmatimonadales;f__Gemmatimonadaceae;g__uncultured;s__uncultured_actinobacterium

d__Bacteria;p__Myxococcota;c__Myxococcia;o__Myxococcales;f__Anaeromyxobacteraceae;g__Anaeromyxobacter;s__uncultured_proteobacterium

d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__uncultured;f__uncultured;g__uncultured;s__uncultured_actinobacterium

and so on, where the species-level labels (uncultured_actinobacterium, uncultured_proteobacterium) do not belong to the same phylum as the rest of the lineage. Does this come from the primary database source, or from somewhere else?

Thanks.

didietkeren · March 4, 2020, 11:49pm

And there's a problem when I use it with the new qiime2-2020.2:

"The scikit-learn version (0.21.2) used to generate this artifact does not match the current version of scikit-learn installed (0.22.1). Please retrain your classifier for your current deployment to prevent data-corruption errors."

SoilRotifer · March 5, 2020, 2:36pm

Hi @didietkeren, if you read my pipeline notes as well as this post, you'll see why I do not trust species-level taxonomy. As far as I know, SILVA does not curate the taxonomy beyond the genus level. As a result, there will be potential conflicts between the species labels and the upper-level taxonomy. This has been noted by other research groups too (see here), which is why I made two versions of the reference database, with and without the species label.
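For anyone who wants to flag such conflicts in their own results, here is a rough Python sketch. It is a heuristic only, not part of the pipeline. The keyword-to-phylum map `SPECIES_HINTS` is a hypothetical illustration covering just the two labels mentioned above, and the function names are made up.

```python
# Hypothetical keyword-to-phylum map, for illustration only; extend as needed.
SPECIES_HINTS = {
    "actinobacterium": "Actinobacteriota",
    "proteobacterium": "Proteobacteria",
}


def parse_lineage(lineage):
    """Split 'd__X;p__Y;...' into a {rank_prefix: name} dictionary."""
    ranks = {}
    for field in lineage.split(";"):
        field = field.strip()
        if "__" in field:
            prefix, name = field.split("__", 1)
            ranks[prefix] = name
    return ranks


def species_conflicts_with_phylum(lineage):
    """Return True when the species label hints at a phylum other than the assigned one."""
    ranks = parse_lineage(lineage)
    species = ranks.get("s", "").lower()
    phylum = ranks.get("p", "")
    for keyword, expected in SPECIES_HINTS.items():
        if keyword in species and phylum != expected:
            return True
    return False


lineage = (
    "d__Bacteria;p__Myxococcota;c__Myxococcia;o__Myxococcales;"
    "f__Anaeromyxobacteraceae;g__Anaeromyxobacter;s__uncultured_proteobacterium"
)
print(species_conflicts_with_phylum(lineage))  # True
```

A check like this only catches labels that happen to be in the map, so it is a screening aid at best. Using the files without species labels avoids the issue entirely.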

SoilRotifer · March 5, 2020, 2:38pm

I've not had time to re-train the classifiers for the latest version of QIIME 2 (2020.2). However, you can follow the procedure outlined here to train the classifiers yourself, simply by making use of the sequence and taxonomy qza files I've made available.
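The warning quoted above comes down to comparing two scikit-learn version strings. Here is a minimal sketch of that comparison. It assumes that any difference between the two versions means the classifier should be retrained, as the message suggests. The function names are hypothetical and this is not QIIME 2's own check.

```python
def parse_version(version: str) -> tuple:
    """Turn '0.21.2' into (0, 21, 2) so that versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def needs_retraining(trained_with: str, installed: str) -> bool:
    """Assume any difference in scikit-learn version calls for retraining."""
    return parse_version(trained_with) != parse_version(installed)


# Versions taken from the warning message above.
print(needs_retraining("0.21.2", "0.22.1"))  # True
```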

Stefano_S · March 6, 2020, 2:05pm

Thank you SoilRotifer for your precious work!
I'm a novice here, so please accept my apologies if this is a silly question.
I'm trying to train a SILVA 138 classifier with 520-926r primers.
The problem I have is at step 6 of your pipeline, which gives me an error:

" filter_fasta_by_seq_id.py: error: unrecognized arguments: -f SILVA_align_seqs.fasta "

Should I change -f to -i for the input file? I saw that -f is not an argument defined in your Python script.

Thanks!
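For context on the error above: Python's `argparse` reports "unrecognized arguments" for any flag the script does not define. The sketch below reproduces that behavior with a hypothetical parser. The real options of `filter_fasta_by_seq_id.py` are not shown in this thread, so the `-i`/`--input` option here is an assumption based on the question.

```python
import argparse

# Hypothetical parser: the real script's options are not shown in this thread.
parser = argparse.ArgumentParser(prog="filter_fasta_by_seq_id.py")
parser.add_argument("-i", "--input", help="input FASTA file (assumed option name)")

# parse_known_args collects undefined flags instead of exiting with an error,
# which makes it easy to see what parse_args would have rejected.
args, unknown = parser.parse_known_args(["-f", "SILVA_align_seqs.fasta"])
print(unknown)  # ['-f', 'SILVA_align_seqs.fasta']

# With a flag the parser does define, the value is picked up as expected.
args = parser.parse_args(["-i", "SILVA_align_seqs.fasta"])
print(args.input)  # SILVA_align_seqs.fasta
```

Running the script with `-h` lists the flags it actually defines, which is the safest way to confirm the right option name.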