SILVA 138 Classifiers

kbitenieks · May 18, 2020, 2:13pm

Hi Mike!
Thank you for sharing your pipeline and these classifiers! I also took the full-length sequences with species labels (ver0.02) and trained a V4-V5 (515f and 926r) classifier from them.

#Extract reference reads:
qiime feature-classifier extract-reads \
  --i-sequences SILVA-138-SSURef-Full-Seqs.qza \
  --p-f-primer GTGYCAGCMGCCGCGGTAA \
  --p-r-primer CCGYCAATTYMTTTRAGTTT \
  --p-trunc-len 0 \
  --p-min-length 100 \
  --p-max-length 450 \
  --o-reads ref-seqs-v138-V4V5.qza

#Train the classifier:
qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads ref-seqs.qza \
  --i-reference-taxonomy Silva-v138-full-length-seq-taxonomy.qza \
  --o-classifier SILVA-138-SSURef-V4V5-classifier.qza

A closer look at the taxa barplots (taxa-bar-plots_16S-V4V5-35.qzv, 2.7 MB) shows that my samples contain many oddly named phyla (10bav-F6, LCP-89, AncK6, WPS-2, DTB120, FCPU426, MBNT15, NKB15, Sva0485, WS1, CK-2C2-2, RCP2-54, TA06, WS2, GN01, PAUC34f, NB1-j, SAR324_clade(Marine_group_B), WS4 and others).

Have I done something wrong, or does it just need some additional filtering? I will filter out chloroplast and mitochondria sequences.
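A minimal sketch of that chloroplast and mitochondria filtering step, assuming a feature table and a taxonomy artifact already exist. The file names `table.qza`, `taxonomy.qza` and `table-no-mito-chloro.qza` are hypothetical placeholders, not files from this thread. The guard only runs the command when QIIME 2 and both inputs are present:

```shell
# Hypothetical artifact names; substitute your own feature table and taxonomy.
TABLE=table.qza
TAXONOMY=taxonomy.qza
FILTERED=table-no-mito-chloro.qza

# Run the filter only when QIIME 2 and both input artifacts are available.
if command -v qiime >/dev/null 2>&1 && [ -f "$TABLE" ] && [ -f "$TAXONOMY" ]; then
  # Drop every feature whose taxonomy label contains either search term.
  qiime taxa filter-table \
    --i-table "$TABLE" \
    --i-taxonomy "$TAXONOMY" \
    --p-exclude mitochondria,chloroplast \
    --o-filtered-table "$FILTERED"
else
  echo "qiime or input artifacts not found; nothing filtered" >&2
fi
```

The candidate phyla listed above would be left untouched by this filter, since only labels matching the two excluded terms are removed.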

Thanks!
Kriss

SoilRotifer · May 18, 2020, 2:55pm

You're welcome @kbitenieks!

The phyla names you see are normal. You can investigate this on the SILVA Taxonomy browser. If you click on Bacteria you'll see all sorts of odd names. Many of these groups (at the phylum level and lower) are still being defined and are considered to have candidate (i.e. Candidatus) status.

The field of bacterial taxonomy is undergoing many changes as genomic data are increasingly used to aid taxonomic identification. Many of these taxa have no cultured type specimen and are defined only by genome sequence (or other) data. This has caused quite a bit of debate in the field of bacterial taxonomy. Anyway, this has resulted in many Candidate Phyla and other proposed groupings.

-Best wishes!
-Mike

farhad1990 · July 28, 2020, 2:23pm

Hi Mike,

Thanks for uploading this classifier here. I am trying to make my own classifier, but it seems very memory-intensive and is not possible at the moment. In the meantime I tried to use your pre-trained classifier for the 341F-805R primer set, but I get an error which I couldn't solve based on the previous discussions in the forum. I would appreciate any help.
Here is the error:
qiime feature-classifier classify-sklearn --i-classifier classifier-consensus.qza --i-reads repseqs.qza --o-classification taxonomy
Plugin error from feature-classifier:

The scikit-learn version (0.21.2) used to generate this artifact does not match the current version of scikit-learn installed (0.23.1). Please retrain your classifier for your current deployment to prevent data-corruption errors.

Debug info has been saved to /tmp/qiime2-q2cli-err-w3yust9p.log

I am using qiime2-2020.6, and it seems that the version of the plugin I am using is newer than the one used for training this classifier?

Bests,
Farhad

SoilRotifer · July 28, 2020, 2:57pm

Hi @farhad1990,

Sadly, if you would like to run the classifier in 2020.6, you'll have to re-train it for that version of QIIME 2, as the scikit-learn version bundled with QIIME 2 can change between releases. The best option in your case might be to make use of qiime2-2020.2 for the classification step.
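To see the mismatch before running the classification, a rough check along these lines may help. It is only a sketch under two assumptions: that a `.qza` file is a zip archive (so `unzip` can read it), and that the classifier artifact records its scikit-learn version in a `data/sklearn_version.json` file inside that archive. The helper function names are made up for illustration, and `classifier-consensus.qza` is the file name from the command quoted above:

```shell
# Classifier artifact name taken from the failing command above.
CLASSIFIER=classifier-consensus.qza

# Print the version record stored in the artifact, if any.
# Assumption: the record lives at <uuid>/data/sklearn_version.json.
artifact_sklearn_version() {
  unzip -p "$1" '*/data/sklearn_version.json' 2>/dev/null
}

# Print the scikit-learn version of the currently active environment.
installed_sklearn_version() {
  python -c 'import sklearn; print(sklearn.__version__)' 2>/dev/null
}

# Report both versions only when the artifact is actually present.
if [ -f "$CLASSIFIER" ]; then
  echo "artifact:  $(artifact_sklearn_version "$CLASSIFIER")"
  echo "installed: $(installed_sklearn_version)"
fi
```

If the two versions differ, the options are the ones described above: retrain in the current environment, or run the classification step in the QIIME 2 release the classifier was trained with.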

See this post for more details:

Finally, check out the initial post of this thread; it has been updated to redirect you to a new plugin. This should make it much easier for you to make a SILVA classifier.

-Best wishes
-Mike

16sIceland · July 28, 2020, 7:31pm

When training the classifier, should one use the Moving Pictures data, or their own data? And if their own, should it be a subset of their own data or the entire data set?

Thanks!

SoilRotifer · July 28, 2020, 8:46pm

Hi @16sIceland,

You train the classifiers with standard reference data, typically from a curated database, e.g. SILVA, or of your own making. For example, you can use the input files here to construct your own classifier. For more details on how to do this, take a look at the RESCRIPt tutorial linked at the top of this thread.

Finally, you can simply make use of the pre-made classifiers.

-Mike

farhad1990 · July 29, 2020, 11:37am

Thanks Mike. It is working now.