Silva 132 classifiers

gregcaporaso · April 7, 2018, 3:40pm

@William is in the process of preparing the Silva 132 QIIME-compatible release (thanks @William!), and we have prepared an initial version of the classifiers that can be used for testing purposes. It would be very helpful to get feedback from users of these classifiers to let us know if you're experiencing any issues or if they're working well for you (we're interested in feedback either way). You can reply to this post with your feedback.

IMPORTANT: The Silva 132 QIIME-compatible release has not gone live yet, so you should use these classifiers only for testing or preliminary analyses. You'll want to update your results to use classifiers trained on the official release when they are ready. We hope to have those classifiers available within the next couple of weeks.

UPDATE: The Silva 132 QIIME-compatible release is now live. There were no changes to the 99% OTUs since these classifiers were trained, so the classifiers posted here are based on the versions that are live on the Silva website. As the updated Silva files were just released, you should still be on the lookout for any issues (unexpected taxa, etc). As always, if something doesn't seem right, look into it further (for example, by comparing classification results with BLAST results and/or classification against another reference database such as Greengenes) before assuming it's right.

Silva 132 99% OTUs (full-length, seven-level taxonomy)
Silva 132 99% OTUs (515-806 region, seven-level taxonomy)

We'll be updating this post to include some additional classifiers over the next week, so check back if you're interested.

mpodar · April 9, 2018, 6:59pm

Thanks Greg. Were these classifiers trained with scikit-learn 0.19.1? If so, should we run this first, as per the warning on the Data resources page of the QIIME 2 2018.2.0 documentation?

conda install --override-channels -c defaults scikit-learn=0.19.1

thanks,
Mircea Podar

ebolyen · April 9, 2018, 7:04pm

Hi @mpodar!

Looking at the provenance, yes, they were trained with scikit-learn 0.19.1. However, assuming you installed QIIME 2 normally, that should be the version you have, so you shouldn't need to run that command. You can check by running conda list in your environment.

If for some reason you don't already have the right version, then your command should do the trick!
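
For anyone who would rather check from Python than read through the conda list output, here is a minimal sketch. The expected version (0.19.1) is the one mentioned in this thread; the function names are made up for illustration, and the code only reports what is installed in the environment you run it from.

```python
# Version reported in the classifiers' provenance, per this thread.
EXPECTED_SKLEARN = "0.19.1"


def installed_sklearn_version():
    """Return the installed scikit-learn version string, or None if it is not installed."""
    try:
        import sklearn
    except ImportError:
        return None
    return sklearn.__version__


def matches_expected(installed, expected=EXPECTED_SKLEARN):
    """True only when a version is installed and equals the expected one exactly."""
    return installed is not None and installed == expected


if __name__ == "__main__":
    found = installed_sklearn_version()
    if matches_expected(found):
        print("scikit-learn", found, "matches the version used for training")
    else:
        print("scikit-learn is", found, "but the classifiers expect", EXPECTED_SKLEARN)
```

Run it inside the activated QIIME 2 environment, since a different environment may have a different scikit-learn.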

aphanotus · April 16, 2018, 6:44pm

Thank you for providing these Silva classifiers! I am hoping to train a custom classifier for the region of 16S I'm using. However, I've been having trouble with the taxonomy strings. The resulting file, e.g. taxonomy.qza, only lists the species name rather than the full taxonomy string with all levels. Can you provide the HeaderlessTSVTaxonomyFormat file used to generate this classifier? Or can it be exported from the classifier qza?
Thanks!
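
One way to spot this kind of problem before importing is to count how many levels each taxonomy string has. The sketch below assumes a headerless, tab-separated file with the feature ID in the first column and a semicolon-separated taxonomy string in the second; the function name and the file name in the usage note are hypothetical.

```python
from collections import Counter


def level_counts(tsv_path, separator=";"):
    """Count how many taxonomy strings have each number of levels."""
    counts = Counter()
    with open(tsv_path, encoding="utf-8") as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            # Ignore empty pieces, e.g. from a trailing separator.
            levels = [part for part in fields[1].split(separator) if part.strip()]
            counts[len(levels)] += 1
    return counts
```

Calling `level_counts("my_taxonomy.tsv")` on a seven-level file should put every row under the key 7. A large count under 1 would suggest that only the species name made it into the file.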

gregcaporaso · April 17, 2018, 3:09pm

Hi @aphanotus,
Here are the artifacts that I used to train the classifiers:

FeatureData[Taxonomy]
FeatureData[Sequence] (full-length)
FeatureData[Sequence] (515-806 region)

You can get the raw data by exporting from these if needed, but you should just be able to use these artifacts directly for training.
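
If you are curious what "exporting" amounts to: a .qza artifact is a zip archive, and the payload sits under a data/ directory inside a top-level folder named after the artifact's UUID. QIIME 2's own export tooling is the supported route. The sketch below is just an illustration that uses only Python's standard library, with hypothetical function names.

```python
import zipfile
from pathlib import Path, PurePosixPath


def data_members(qza_path):
    """Return archive member names located under <uuid>/data/ in the artifact."""
    with zipfile.ZipFile(qza_path) as archive:
        members = []
        for name in archive.namelist():
            parts = PurePosixPath(name).parts
            # Expect <uuid>/data/<file...>; skip directory entries.
            if len(parts) > 2 and parts[1] == "data" and not name.endswith("/"):
                members.append(name)
        return members


def extract_data(qza_path, out_dir):
    """Copy the files under <uuid>/data/ into out_dir and return the written paths."""
    out_dir = Path(out_dir)
    written = []
    with zipfile.ZipFile(qza_path) as archive:
        for name in data_members(qza_path):
            relative = PurePosixPath(name).parts[2:]  # drop <uuid>/data/
            target = out_dir.joinpath(*relative)
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(archive.read(name))
            written.append(target)
    return written
```

For example, `extract_data("taxonomy.qza", "exported")` would write the taxonomy file from a FeatureData[Taxonomy] artifact into an exported/ directory (both names are placeholders).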

starlit.isle · August 16, 2018, 7:45pm

Thank you for posting these trained classifiers! Just to be clear - do these include only the SSU/16S/18S SILVA sequences, or are the classifiers trained on the LSU sequences as well?

Nicholas_Bokulich · August 20, 2018, 3:08pm

I believe these are only SSU (both 16S + 18S), no LSU.

fconstancias · January 31, 2019, 1:18pm

Thanks a lot for developing QIIME 2.

I have a question regarding the taxonomy used to train the QIIME 2-compatible Silva 132 classifiers.

Did you use consensus_taxonomy or majority_taxonomy_7?

As explained by @William in Silva_132_notes.txt, whether to use the majority or the consensus taxonomy may depend on the targeted ecosystem.

Thanks a ton.

William · January 31, 2019, 1:56pm

Hello Florentin,

It looks like the 7_level_taxonomy.txt was used, based upon the provenance data in the taxonomy artifact posted above.

You could generate one using the consensus or majority taxonomy. I usually go with the base 7-level or majority taxonomy for broad reports, and then BLAST any particular OTUs/SVs of interest against NCBI to confirm the identity and that the match is unique. The consensus version is very strict.

I hope this helps,
Tony
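
For readers who want to check provenance themselves, the artifact viewer at view.qiime2.org is the usual route. As a rough alternative, the sketch below searches the provenance records stored inside a .qza for a keyword such as a file name. It assumes the provenance is kept in action.yaml files under a provenance/ directory in the archive and treats them as plain text rather than parsing the YAML; what you find will depend on how the artifact was created. The function name is hypothetical.

```python
import zipfile


def provenance_matches(qza_path, keyword):
    """Return (member, line) pairs from provenance action.yaml files that mention keyword."""
    hits = []
    with zipfile.ZipFile(qza_path) as archive:
        for name in archive.namelist():
            # Only look at provenance records, not at the artifact's data.
            if "/provenance/" not in name or not name.endswith("action.yaml"):
                continue
            text = archive.read(name).decode("utf-8", errors="replace")
            for line in text.splitlines():
                if keyword in line:
                    hits.append((name, line.strip()))
    return hits
```

For example, `provenance_matches("taxonomy.qza", "7_level_taxonomy")` would list any provenance lines that mention that file name (the artifact path is a placeholder).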

fconstancias · January 31, 2019, 2:17pm

Thanks Tony. I am training classifiers using both consensus and majority right at the moment.

M_R · February 1, 2019, 10:43am

@gregcaporaso Were these classifiers generated using the non-redundant (Ref NR 99) Silva 132 reference database? I would guess so, but could you please confirm? Thanks a lot!

gregcaporaso · February 1, 2019, 6:47pm

Hi @M_R, Our Silva 132 classifiers are trained on the rep_set/rep_set_all/99/silva132_99.fna file contained in the Silva_132_release.zip file, available here.

As a reminder for anyone looking for the QIIME 2 Silva classifiers: the latest versions are always available on the QIIME 2 data resources page. There are new versions posted there as of yesterday which are trained with QIIME 2 2019.1.

Nicholas_Bokulich · January 10, 2020, 3:26pm

A post was split to a new topic: how to compare taxonomy assignments from two different SILVA classifiers

thermokarst · February 18, 2021, 1:36am

A post was split to a new topic: Exporting data question