Training new feature classifier

Bing · August 19, 2017, 5:41pm

We are training feature classifiers for the 16S V3-V4 region. I noticed that the Greengenes database (85_otus.fasta) listed in the tutorial is from 2013, right? Is there a more up-to-date database available for training the feature classifier? Is 85_otu_taxonomy.txt useful for any database, or only for the tutorial example? Looking forward to your suggestions!

jairideout · August 21, 2017, 8:36pm

Hi @Bing!

That's right! The 85% OTUs file used in the feature-classifier tutorial is from the Greengenes 13_8 release (August 2013), which is the latest available Greengenes release. The pre-trained classifiers on the data resources page are also trained with Greengenes 13_8.

@wasade do you have any updates on when a new version of Greengenes will be released?

You might try training with the SILVA database or using one of the pre-trained SILVA classifiers we distribute. SILVA has pretty regular updates. Here's the link to SILVA databases that can be imported into QIIME 2 and trained, and here's the link to pre-trained classifiers available with QIIME 2.

I'm not sure what you mean by "useful for any database"; the file is part of the Greengenes 13_8 reference database. We're using the 85% OTUs file in the feature-classifier tutorial so that the example commands run quickly. The tutorial notes this:

Two elements are required for training the classifier: the reference sequences and the corresponding taxonomic classifications. To reduce computation time for this tutorial we will use the relatively small Greengenes 13_8 85% OTU data set.
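Because the classifier needs both pieces, every reference sequence ID should have a matching taxonomy entry. The snippet below is a minimal plain-Python sketch of that pairing check, not part of QIIME 2. It assumes a FASTA file of reference sequences and a headerless, tab-separated taxonomy file (ID, then lineage string), as in Greengenes 13_8. The function names and the toy IDs are hypothetical.

```python
import io


def read_fasta(handle):
    """Yield (id, sequence) pairs from a FASTA file handle (hypothetical helper)."""
    seq_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            # Keep only the first whitespace-delimited token as the ID.
            seq_id, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def read_taxonomy(handle):
    """Map feature ID -> lineage string from a headerless, tab-separated file."""
    taxonomy = {}
    for line in handle:
        line = line.rstrip("\r\n")
        if not line:
            continue
        feature_id, lineage = line.split("\t", 1)
        taxonomy[feature_id] = lineage
    return taxonomy


def pair_reference(fasta_handle, taxonomy_handle):
    """Return (pairs, missing): labeled sequences, and sequence IDs with no taxonomy."""
    taxonomy = read_taxonomy(taxonomy_handle)
    pairs, missing = [], []
    for seq_id, seq in read_fasta(fasta_handle):
        if seq_id in taxonomy:
            pairs.append((seq_id, seq, taxonomy[seq_id]))
        else:
            missing.append(seq_id)
    return pairs, missing


# Toy in-memory inputs (made-up IDs) standing in for the real reference files.
toy_fasta = io.StringIO(">1111\nACGT\nACGT\n>2222\nGGCC\n")
toy_taxonomy = io.StringIO("1111\tk__Bacteria; p__Firmicutes\n")
pairs, missing = pair_reference(toy_fasta, toy_taxonomy)
```

With the real files you would open 85_otus.fasta and 85_otu_taxonomy.txt instead of the toy strings; any IDs in `missing` would lack a label for training.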

For your real-world analyses, you probably want to train your classifier using the 99% OTUs in order to have a larger reference database. Those files are available in the Greengenes 13_8 database, and we provide pre-trained classifiers for the Greengenes 99% OTUs. You can obtain these data from the data resources page.

Note: the Moving Pictures tutorial uses the pre-trained Greengenes 13_8 99% OTUs classifier.

Hope this helps!

wasade · August 22, 2017, 5:43pm

I don't, sorry. It's a work in progress and doesn't have a specific ETA.

Bing · August 23, 2017, 2:39am

Thanks for answering my questions. Where can we find the second piece, the taxonomy file such as "85_otu_taxonomy.txt", used in the training example? We may need "99_otu_taxonomy.txt" instead; how can we get that .txt file?

Thanks,
Bing

jairideout · August 23, 2017, 5:34pm

Those taxonomy files (including 85_otu_taxonomy.txt, 99_otu_taxonomy.txt, etc.) are included in the Greengenes reference database that's linked on the data resources page.
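Each line of those taxonomy files pairs an OTU ID with a semicolon-separated lineage using Greengenes rank prefixes (k__, p__, c__, o__, f__, g__, s__), where a rank with no assignment is left empty (for example `g__`). As a rough illustration only, this hypothetical plain-Python helper splits such a lineage into named ranks and reports the deepest rank that has a name; it is not a QIIME 2 function.

```python
# Greengenes rank prefixes mapped to rank names (assumed 7-level lineage).
RANK_NAMES = {
    "k": "kingdom", "p": "phylum", "c": "class", "o": "order",
    "f": "family", "g": "genus", "s": "species",
}


def parse_lineage(lineage):
    """Split 'k__Bacteria; p__Firmicutes; ...' into {rank: name or None}."""
    ranks = {}
    for field in lineage.split(";"):
        field = field.strip()
        if not field:
            continue
        prefix, _, name = field.partition("__")
        # Unknown prefixes are kept under their raw prefix.
        ranks[RANK_NAMES.get(prefix, prefix)] = name or None
    return ranks


def deepest_assigned(lineage):
    """Return (rank, name) for the most specific rank with a name, or None."""
    assigned = [(rank, name) for rank, name in parse_lineage(lineage).items() if name]
    return assigned[-1] if assigned else None
```

For a lineage such as `k__Bacteria; p__Firmicutes; c__Bacilli; o__; f__; g__; s__`, the deepest assigned rank is class (Bacilli).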

system · September 23, 2017, 11:34pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.