Training new feature classifier

We are training feature classifiers for the 16S V3-V4 region. I noticed that the Greengenes database (85_OTU.fasta) listed in the tutorial is from 2013, is that right? Is there a more recent database available for training the feature-classifier? Is 85_otu_taxonomy.txt usable with any database, or only with the tutorial example? Looking forward to your suggestions!

Hi @Bing!

That's right! The 85% OTUs file used in the feature-classifier tutorial is from the Greengenes 13_8 release (August 2013), which is the latest available Greengenes release. The pre-trained classifiers on the data resources page are also trained with Greengenes 13_8.

@wasade do you have any updates on when a new version of Greengenes will be released?

You might try training with the SILVA database or using one of the pre-trained SILVA classifiers we distribute. SILVA has pretty regular updates. Here's the link to SILVA databases that can be imported into QIIME 2 and trained, and here's the link to pre-trained classifiers available with QIIME 2.

I'm not sure what you mean by "useful for any database"; the file is part of the Greengenes 13_8 reference database. We're using the 85% OTUs file in the feature-classifier tutorial so that the example commands run quickly. The tutorial notes this:

Two elements are required for training the classifier: the reference sequences and the corresponding taxonomic classifications. To reduce computation time for this tutorial we will use the relatively small Greengenes 13_8 85% OTU data set.
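To make the "two elements" point concrete, here is a minimal Python sketch (not part of QIIME 2 or the tutorial) that checks whether a reference FASTA file and a taxonomy file describe the same set of sequence IDs. It assumes the taxonomy file is headerless and tab-separated with the ID in the first column, as in the Greengenes taxonomy files; the function names are made up for illustration.

```python
def fasta_ids(path):
    """Return the set of sequence IDs (first token of each '>' header line)."""
    ids = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    ids.add(fields[0])
    return ids


def taxonomy_ids(path):
    """Return the set of IDs from a headerless, tab-separated taxonomy file.

    Assumption: one record per line, ID in the first column.
    """
    ids = set()
    with open(path) as handle:
        for line in handle:
            first = line.rstrip("\n").split("\t")[0].strip()
            if first:
                ids.add(first)
    return ids


def check_reference_pair(fasta_path, taxonomy_path):
    """Report IDs that appear in one file but not in the other."""
    seqs = fasta_ids(fasta_path)
    taxa = taxonomy_ids(taxonomy_path)
    return {
        "missing_taxonomy": sorted(seqs - taxa),  # sequences without a taxonomy line
        "missing_sequence": sorted(taxa - seqs),  # taxonomy lines without a sequence
    }
```

If both lists come back empty, the sequence file and taxonomy file belong together. A taxonomy file from one database or clustering level paired with sequences from another will generally show up as mismatched IDs.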

For your real-world analyses, you probably want to train your classifier using the 99% OTUs in order to have a larger reference database. Those files are available in the Greengenes 13_8 database, and we provide pre-trained classifiers for the Greengenes 99% OTUs. You can obtain these data from the data resources page.

Note: the Moving Pictures tutorial uses the pre-trained Greengenes 13_8 99% OTUs classifier.

Hope this helps!

I don't, sorry. It's a work in progress and we do not have a specific ETA.

Thanks for answering my questions. Where can we find the second piece, the taxonomy file such as “85_otu_taxonomy.txt” used in the training example? If we need to use “99_otu_taxonomy.txt” instead, how can we get that .txt file?

Thanks,
Bing

Those taxonomy files (including 85_otu_taxonomy.txt, 99_otu_taxonomy.txt, etc.) are included in the Greengenes reference database that’s linked on the data resources page.
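Once the Greengenes archive has been downloaded and extracted, a small Python helper like the following can locate the taxonomy files. This is only a sketch: the function name is made up, and the directory you pass in is whatever folder you extracted the archive to. The only thing it relies on is the `*_otu_taxonomy.txt` naming pattern mentioned above.

```python
from pathlib import Path


def find_taxonomy_files(root):
    """Map each taxonomy file name (e.g. '99_otu_taxonomy.txt') to its path.

    Searches `root` and all of its subdirectories.
    """
    found = {}
    for path in sorted(Path(root).rglob("*_otu_taxonomy.txt")):
        found[path.name] = path
    return found
```

For example, `find_taxonomy_files("path/to/extracted/greengenes")` (a placeholder path) returns a dictionary from which you can pick the entry for `99_otu_taxonomy.txt`.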
