Taxonomic Classification 16S Databases

Hi All,

I am excited to start using the new QIIME. My question involves the use of a 16S rRNA database for sequence identification. In the previous version of QIIME, the default 16S reference database (Greengenes) stopped updating around 2010 or so. I started skimming through the tutorials, and it seems that referencing against an extant 16S database has been replaced with a blastn search against NCBI nt. Could you explain in more detail how the sequences are taxonomically identified?

Hi @Cody_Glickman, Thanks for your interest in QIIME 2!

In QIIME 2, the first plugin that we have for taxonomic assignment of sequences is the scikit-learn-based q2-feature-classifier. This can be used to do Naive Bayes classification of sequences, but can also use other classification algorithms such as random forest or support vector classifiers. The developer of that plugin, Ben Kaehler, is currently working on a benchmark of these methods against commonly used classifier approaches. We also expect that there will be other plugins for this step in the future.
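To give a rough idea of what Naive Bayes classification of sequences involves, here is a minimal sketch in plain Python. It is not the q2-feature-classifier implementation (which builds on scikit-learn). The function names, the k-mer size of 4, the add-one smoothing, and the toy sequences and taxon labels are all assumptions made purely for illustration.

```python
import math
from collections import Counter, defaultdict


def kmers(seq, k=4):
    """Return the overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def fit_naive_bayes(reference, k=4):
    """Train a multinomial Naive Bayes model from (sequence, taxon) pairs."""
    kmer_counts = defaultdict(Counter)  # taxon -> k-mer counts
    taxon_counts = Counter()            # taxon -> number of reference sequences
    vocabulary = set()                  # every k-mer seen during training
    for seq, taxon in reference:
        taxon_counts[taxon] += 1
        for kmer in kmers(seq, k):
            kmer_counts[taxon][kmer] += 1
            vocabulary.add(kmer)
    return {"k": k, "kmer_counts": kmer_counts,
            "taxon_counts": taxon_counts, "vocabulary": vocabulary}


def classify(model, seq):
    """Return the taxon with the highest log posterior for a query sequence."""
    total_refs = sum(model["taxon_counts"].values())
    vocab_size = len(model["vocabulary"])
    best_taxon, best_score = None, -math.inf
    for taxon, n_refs in model["taxon_counts"].items():
        counts = model["kmer_counts"][taxon]
        total_kmers = sum(counts.values())
        score = math.log(n_refs / total_refs)  # log prior for this taxon
        for kmer in kmers(seq, model["k"]):
            # Add-one smoothing so an unseen k-mer does not zero out the score
            score += math.log((counts[kmer] + 1) / (total_kmers + vocab_size))
        if score > best_score:
            best_taxon, best_score = taxon, score
    return best_taxon


# Toy reference data, invented for illustration only
reference = [
    ("ACGTACGTACGTACGT", "Taxon_A"),
    ("ACGTACGAACGTACGT", "Taxon_A"),
    ("TTGGCCAATTGGCCAA", "Taxon_B"),
    ("TTGGCCTATTGGCCAA", "Taxon_B"),
]
model = fit_naive_bayes(reference)
print(classify(model, "ACGTACGTACGA"))  # Taxon_A
print(classify(model, "GGCCAATTGGCC"))  # Taxon_B
```

The sketch separates training (fit_naive_bayes) from classification (classify), so a model trained once on a reference set can be reused for any number of query sequences.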

q2-feature-classifier's classify method takes a classifier as input. This is a model that has been trained for taxonomic classification, and it can be generated with one of q2-feature-classifier's fit-classifier* methods. In the tutorials that are currently online, such as the Moving Pictures tutorial, we’re using a Greengenes 13_8-based classifier that has been trained on the 515F/806R region of the 16S rRNA gene. We’re in the process of developing Silva-based classifiers as well, and in the near future we’ll have a resources page where users can download these different classifiers. We’re also planning to write a tutorial (I just created an issue for this where you can keep track of progress) illustrating how to use the fit-classifier* methods to train your own classifiers on any gene sequences.
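One way to picture what "trained on the 515F/806R region" means is that reference sequences are cut down to the stretch between the two primer sites before a classifier is fit. The Python sketch below shows that idea only; it is not how QIIME 2 does it. The function names are hypothetical, and the short primers and toy sequence are made up for the example (they are not the real 515F/806R primers).

```python
import re

# IUPAC nucleotide codes mapped to regex character classes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table that also handles degenerate bases
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Return the reverse complement of a (possibly degenerate) sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Turn a degenerate primer into a regular expression."""
    return "".join(IUPAC[base] for base in primer)


def extract_region(seq, fwd_primer, rev_primer):
    """Return the sequence between the primer sites, or None if either is missing."""
    pattern = re.compile(
        primer_regex(fwd_primer)
        + "(.*?)"  # shortest stretch between the two primer sites
        + primer_regex(reverse_complement(rev_primer))
    )
    match = pattern.search(seq)
    return match.group(1) if match else None


# Hypothetical short primers and a toy reference sequence
fwd, rev = "GTGYCAG", "GGACTAC"
toy_reference = "AAAA" + "GTGCCAG" + "TTTACGTTT" + "GTAGTCC" + "CCCC"
print(extract_region(toy_reference, fwd, rev))  # TTTACGTTT
```

Training on the trimmed region rather than on full-length sequences means the classifier only sees the part of the gene that the amplicon reads actually cover.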

"a blastn search against NCBI nt"

I think what you’re referring to here is the tabulated sequences visualization. This is intended to be a reference for looking up a feature id to find its associated representative sequence, and the BLAST links are provided for convenience to quickly get more information about a feature/OTU of interest.

Apologies for the somewhat sparse documentation at the moment. We’re working on pulling these pieces together while we’re in our alpha release stage.