SILVA 138 Classifiers

I just wanted to let everyone know that I’ve cobbled together a simple pipeline for constructing classifiers based on the SILVA 138 release. I’ve been working on this as time permits, so I apologize in advance for the short-cuts :scissors: and clunkiness :hammer: of my approach, but I figured this would be something useful for the community. At least in the short-term :timer_clock: .

Anyway, the files will be temporarily available here, until I can find a longer-term hosting solution:

Do not be surprised if they suddenly disappear :wilted_flower: . If they do, I hope the pipeline I’ve linked above will be sufficient.

The classifiers, and the reference sequences and taxonomy files used to build them, are available too. Note: I’ve made classifiers with and without the species labels. Dropping the species labels not only reduces the size of the classifiers, but also allows for faster classification, as there is less rank information. This may be ideal for those who typically do not trust species-level taxonomy. Either way, use what works best for you.

Please let me know if these are useful. Otherwise happy :qiime2:-ing my friends!

-Mike


Awesome, thanks Mike.

A bit of a philosophical/operational question. Given all the changes in taxonomy, with groups changing place in classification between phyla, classes, orders etc, it is becoming impossible to compare taxonomic analyses performed with different versions of Silva/classifier versions in QIIME2. Do you see a potential solution in the future, selecting what taxonomy flavor/vintage to use at the classification step without selecting different classifier files and re-running all analyses?

Cheers,
Mircea


Hi @mpodar,

You’ve discovered one of the things that keeps me up at night! :scream: I would like to figure out a way to provide taxonomies from multiple sources (e.g. GTDB, SILVA, etc.) and be able to present those side-by-side, like a taxonomy-assignment ensemble approach, similar to what is available through the online version of SILVA. I know there are people linking DOIs to taxonomy, so that if your data is assigned to some record / lineage, and that record / lineage has its taxonomy updated, then you just pull that updated information via the DOI.

I do not necessarily think you’d have to rerun all of your analyses, unless you are collapsing your OTUs/ASVs by taxonomy. The patterns in your ASVs should be the same, unless the data has been parsed based on taxonomy.

In a nutshell, I do not have a good answer to your inquiry. But this is something I have been thinking about quite often these days. Perhaps someone much smarter than I will have better insight into this. :slight_smile:

-Best wishes!
-Mike


3 posts were split to a new topic: Invalid value for “--i-classifier”

Hi!

For the full-length SILVA files, what is the difference between the one with SSU in its name and the one without SSU?

Hi @Francisco,

Nothing. I was just not consistent in my file naming. :man_facepalming:

Hi @SoilRotifer,

Thanks for your work. Could you tell me which of the SILVA 138 files you used to create these classifiers? Is it the smaller (264 MB) Ref NR 99 file or the bigger (2 GB) Ref file?

Thanks in advance!

Hi @M_R,

The NR99 file, as described in the pipeline linked in the original post. :slight_smile:

-Mike


Hi @SoilRotifer,
Thank you for your awesome work. It helps a lot.
Just one note: there’s a typo (unless it is on purpose) in the ‘coVNert_rna_to_dna.py’
script name. I assume it should be ‘coNVert_rna_to_dna.py’.


Hahaha! Thank you @T.J.Sanko for that nice, yet embarrassing, catch! :man_facepalming:

I’m glad this is useful! I will fix post haste! :slight_smile:

-Mike


Thank you for your great work. I encountered a problem when I ran the pipeline in qiime2-2019.10:

parse_silva_taxonomy.py: /ur/bin/env: bad interpreter: No such file or directory

I assume “#!/ur/bin/env” should be: “#!/usr/bin/env”.

potatoo


Fixed! Thank you @potatoo!


I added a V3V4 (341F-805R) set here.


Thank you very much for the updates, Mike Robeson.
I’ve tried using these classifiers for my microbiome data, but there are some confusing taxonomic assignments at the species level, such as:

d__Bacteria;p__Gemmatimonadota;c__Gemmatimonadetes;o__Gemmatimonadales;f__Gemmatimonadaceae;g__uncultured;s__uncultured_actinobacterium

d__Bacteria;p__Myxococcota;c__Myxococcia;o__Myxococcales;f__Anaeromyxobacteraceae;g__Anaeromyxobacter;s__uncultured_proteobacterium

d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__uncultured;f__uncultured;g__uncultured;s__uncultured_actinobacterium

and so on, where the species-level labels (uncultured_actinobacterium, uncultured_proteobacterium) do not belong to the same phylum as the rest of the lineage. Does this come from the primary database source, or from somewhere else?

Thanks.

And there’s a problem when I use it with the new qiime2-2020.2:

“The scikit-learn version (0.21.2) used to generate this artifact does not match the current version of scikit-learn installed (0.22.1). Please retrain your classifier for your current deployment to prevent data-corruption errors.”

Hi @didietkeren, if you read my pipeline notes as well as this post, you’ll see why I do not trust species-level taxonomy. As far as I know, SILVA does not curate the taxonomy beyond the genus level. :microbe: As a result, there can be conflicts between the species labels and the upper-level taxonomy. This has been noted by other research groups too, see here. That is why I made two versions of the reference database, with and without the species label. :construction_worker_man:
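To illustrate what "without the species label" means for one of the lineages quoted above, here is a minimal Python sketch. It is not taken from the pipeline; the function name `drop_species` is hypothetical, and it only assumes that ranks are separated by `;` and that the species rank carries the `s__` prefix, as in the lineages shown in this thread.

```python
def drop_species(lineage: str) -> str:
    """Return the lineage with any species (s__) rank removed.

    Assumes ';'-separated ranks with SILVA-style prefixes (d__, p__, ..., s__).
    """
    ranks = [rank.strip() for rank in lineage.split(";") if rank.strip()]
    return ";".join(rank for rank in ranks if not rank.startswith("s__"))


# One of the conflicting lineages reported above.
lineage = (
    "d__Bacteria;p__Gemmatimonadota;c__Gemmatimonadetes;"
    "o__Gemmatimonadales;f__Gemmatimonadaceae;g__uncultured;"
    "s__uncultured_actinobacterium"
)

trimmed = drop_species(lineage)
print(trimmed)
```

With the species rank gone, the lineage ends at the genus level, so the conflicting "actinobacterium" label never appears in the assignment.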


I’ve not had time to re-train the classifiers for the latest version of QIIME 2 (2020.2). However, you can follow the procedure outlined here to train the classifiers yourself, simply by making use of the sequence and taxonomy qza files I’ve made available. :clamp:
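If you want to check ahead of time whether a pre-trained classifier matches your environment, a rough Python sketch follows. It is only an illustration: the helper names are hypothetical, the warning text is the one quoted earlier in this thread, and the check simply compares version strings rather than doing anything QIIME 2 itself does.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Warning text as quoted earlier in this thread.
WARNING = (
    "The scikit-learn version (0.21.2) used to generate this artifact does "
    "not match the current version of scikit-learn installed (0.22.1)."
)


def parse_versions(message: str) -> tuple:
    """Extract the (trained, installed) version strings from the warning."""
    found = re.findall(r"\((\d+(?:\.\d+)*)\)", message)
    if len(found) != 2:
        raise ValueError("expected exactly two version numbers in the message")
    return found[0], found[1]


def installed_sklearn_version():
    """Return the installed scikit-learn version, or None if it is absent."""
    try:
        return version("scikit-learn")
    except PackageNotFoundError:
        return None


trained, reported = parse_versions(WARNING)
print("classifier trained with:", trained)
print("version reported in the warning:", reported)
print("version in this environment:", installed_sklearn_version())
print("retraining needed for the reported setup:", trained != reported)
```

If the two versions differ, retraining from the sequence and taxonomy qza files is the safe route.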


Thank you SoilRotifer for your valuable work!
I’m a novice here, so if this is a silly question, please accept my apologies.
I’m trying to train a SILVA 138 classifier with 520-926r primers.
The problem I have is at point 6 of your pipeline, which gives me an error:

" filter_fasta_by_seq_id.py: error: unrecognized arguments: -f SILVA_align_seqs.fasta "

Should I change -f to -i for the input file? I saw that -f is not an argument defined in your Python script.


Thanks!
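For anyone hitting the same message: "unrecognized arguments" is what Python's `argparse` reports when a flag is passed that the script never defined. The sketch below reproduces the behaviour with a hypothetical stand-in parser; the `-i/--input` option is an assumption based on the question above, not a copy of the real `filter_fasta_by_seq_id.py`, so check the script's own `--help` output for its actual flags.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the script's parser: only -i/--input exists.
    parser = argparse.ArgumentParser(prog="filter_fasta_by_seq_id.py")
    parser.add_argument("-i", "--input", help="input FASTA file (assumed flag)")
    return parser


parser = build_parser()

# A flag the parser defines is accepted and stored.
args = parser.parse_args(["-i", "SILVA_align_seqs.fasta"])
print(args.input)

# A flag the parser does not define ends up among the leftovers.
# parse_args() would exit with "unrecognized arguments: ..." at this point;
# parse_known_args() returns the leftovers instead, so they can be inspected.
_, unknown = parser.parse_known_args(["-f", "SILVA_align_seqs.fasta"])
print(unknown)
```

So when the documented command and the script disagree on a flag name, the flag defined in the script is the one that works.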