Silva 132 classifiers


(Greg Caporaso) #1

@William is in the process of preparing the Silva 132 QIIME-compatible release (thanks @William!), and we have prepared an initial version of the classifiers that can be used for testing purposes. It would be very helpful to get feedback from users of these classifiers to let us know if you’re experiencing any issues or if they’re working well for you (we’re interested in feedback either way). You can reply to this post with your feedback.

IMPORTANT: The Silva 132 QIIME-compatible release has not gone live yet, so you should use these classifiers only for testing or preliminary analyses. You’ll want to update your results to use classifiers trained on the official release when they are ready. We hope to have those classifiers available within the next couple of weeks.

UPDATE: The Silva 132 QIIME-compatible release is now live. There were no changes to the 99% OTUs since these classifiers were trained, so the classifiers posted here are based on the versions that are live on the Silva website. As the updated Silva files were just released, you should still be on the lookout for any issues (unexpected taxa, etc). As always, if something doesn’t seem right, look into it further (for example, by comparing classification results with BLAST results and/or classification against another reference database such as Greengenes) before assuming it’s right.

Silva 132 99% OTUs (full-length, seven-level taxonomy)
Silva 132 99% OTUs (515-806 region, seven-level taxonomy)

We’ll be updating this post to include some additional classifiers over the next week, so check back if you’re interested.


#2

Thanks Greg. Were these classifiers trained with scikit-learn 0.19.1? If so, should I run the following first, as per the warning at https://docs.qiime2.org/2018.2/data-resources/ ?

conda install --override-channels -c defaults scikit-learn=0.19.1

thanks,
Mircea Podar


(Evan Bolyen) #3

Hi @mpodar!

Looking at the provenance, yes they were trained with scikit-learn 0.19.1. However, assuming you installed QIIME 2 normally, that should be the version you have, so you shouldn’t need to run that command. You can check by running conda list in your environment :)

If for some reason, you don’t have the right version already, then your command should do the trick!
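
For anyone who would rather check this from Python than scan the conda list output, here is a minimal sketch. It uses only the standard library (importlib.metadata, Python 3.8+). The helper names are made up for this example, and the required version is simply the 0.19.1 mentioned above, so adjust it for whichever classifier you downloaded.

```python
from importlib import metadata

# Version the classifiers in this thread were trained with (from the post above).
REQUIRED_SKLEARN = "0.19.1"


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_required(installed, required=REQUIRED_SKLEARN):
    """True only when a version is installed and equals the required one exactly."""
    return installed is not None and installed == required


found = installed_version("scikit-learn")
print("scikit-learn installed:", found, "| matches required:", matches_required(found))
```

Run it inside the activated QIIME 2 environment; if it reports a mismatch, the conda install command from post #2 is the fix.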


(Dave Angelini) #4

Thank you for providing these Silva classifiers! I am hoping to train a custom classifier for the region of 16S I’m using. However, I’ve been having trouble with the taxonomy strings. The resulting file, e.g. taxonomy.qza, only lists the species name rather than the full taxonomy string with all levels. Can you provide the HeaderlessTSVTaxonomyFormat file used to generate this classifier? Or can it be exported from the classifier qza?
Thanks!


(Greg Caporaso) #6

Hi @aphanotus,
Here are the artifacts that I used to train the classifiers:

You can get the raw data by exporting from these if needed, but you should just be able to use these artifacts directly for training.
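
If you do export the taxonomy, a quick sanity check like the one below can confirm that every row carries the full seven-level string rather than just a species name. This is only a sketch: it assumes a headerless, tab-separated file with the feature ID in the first column and a semicolon-delimited taxonomy string in the second. The function name, the example IDs, and the `D_0__`-style level prefixes are illustrative assumptions, not taken from the artifacts themselves.

```python
import io

EXPECTED_LEVELS = 7  # seven-level taxonomy, as in the classifiers listed above


def find_short_taxonomies(handle, expected_levels=EXPECTED_LEVELS):
    """Return (feature_id, n_levels) for rows with fewer levels than expected."""
    problems = []
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        feature_id, taxonomy = fields[0], fields[1]
        # Count non-empty, semicolon-separated levels.
        levels = [lvl.strip() for lvl in taxonomy.split(";") if lvl.strip()]
        if len(levels) < expected_levels:
            problems.append((feature_id, len(levels)))
    return problems


# Hypothetical two-row file: one complete string, one species-only string.
example = io.StringIO(
    "seq1\tD_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;"
    "D_3__Enterobacteriales;D_4__Enterobacteriaceae;"
    "D_5__Escherichia-Shigella;D_6__Escherichia coli\n"
    "seq2\tEscherichia coli\n"
)
print(find_short_taxonomies(example))
```

To run it on a real file, pass an open file handle instead of the in-memory example.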


(Stephanie) #8

Thank you for posting these trained classifiers! Just to be clear - do these include only the SSU/16S/18S SILVA sequences, or are the classifiers trained on the LSU sequences as well?


(Nicholas Bokulich) #9

I believe these are only SSU (both 16S + 18S), no LSU.


(Florentin) #10

Thanks a lot for developing QIIME 2.

I have a question regarding the taxonomy used to train the QIIME 2-compatible Silva 132 classifiers.

Did you use consensus_taxonomy or majority_taxonomy_7?

As explained by @William in Silva_132_notes.txt, whether majority or consensus is the better choice may depend on the targeted ecosystem.

Thanks a ton.


(William Walters) #11

Hello Florentin,

It looks like the 7_level_taxonomy.txt was used, based upon the provenance data in the taxonomy artifact posted above.

You could generate one using the consensus or majority taxonomy. I usually go with the base 7-level or majority taxonomy for broad reports, and then BLAST any particular OTUs/SVs of interest against NCBI to confirm the identity and that the match is unique. The consensus version is very strict.

I hope this helps,
Tony
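
To illustrate why a consensus taxonomy is stricter than a majority one, here is a rough sketch of the general idea. It is not the script used to build the Silva release: the function name, the `threshold` parameter, the 0.9 majority cut-off, and the "Ambiguous_taxa" placeholder are all assumptions for this example, and the release notes remain the reference for the actual rules.

```python
from collections import Counter


def collapse_taxonomy(member_taxonomies, threshold=1.0, ambiguous="Ambiguous_taxa"):
    """Collapse the lineages of one cluster's members into a single lineage.

    At each level, the most common lineage (from the root down to that level)
    is kept if at least `threshold` of the members share it; otherwise that
    level and all deeper levels are marked ambiguous.
    Assumes a non-empty list of lineages with equal depth.
    """
    total = len(member_taxonomies)
    n_levels = len(member_taxonomies[0])
    result = []
    for level in range(n_levels):
        prefixes = Counter(tuple(t[: level + 1]) for t in member_taxonomies)
        lineage, count = prefixes.most_common(1)[0]
        if count / total >= threshold:
            result.append(lineage[-1])
        else:
            result.extend([ambiguous] * (n_levels - level))
            break
    return result


# Hypothetical cluster: nine members agree, one differs from genus down.
members = [["Enterobacteriaceae", "Escherichia", "Escherichia coli"]] * 9
members = members + [["Enterobacteriaceae", "Salmonella", "Salmonella enterica"]]

print("consensus:", collapse_taxonomy(members, threshold=1.0))
print("majority: ", collapse_taxonomy(members, threshold=0.9))
```

With full agreement required, a single dissenting sequence makes the genus and species ambiguous, while the looser threshold keeps the assignment shared by most members.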


(Florentin) #12

Thanks Tony. I am training classifiers using both consensus and majority right at the moment.


(M R ) #13

@gregcaporaso Were these classifiers generated using the non-redundant (Ref NR 99) Silva 132 reference database? I would guess so… But could you please confirm? Thanks a lot!


(Greg Caporaso) #15

Hi @M_R, our Silva 132 classifiers are trained on the rep_set/rep_set_all/99/silva132_99.fna file contained in the Silva_132_release.zip file, available here.

As a reminder for anyone looking for the QIIME 2 Silva classifiers: the latest versions are always available on the QIIME 2 data resources page. There are new versions posted there as of yesterday which are trained with QIIME 2 2019.1.

