Exporting Average Nucleotide Identity Used for Classification

StanMan2000 · May 2, 2019, 6:07pm

We used the GreenGenes classifier to get a taxonomic breakdown for some samples, but we would also like to see an ANI/identities breakdown to get a better idea of the criteria it is using.

Is there any way to export/print this information from Qiime2? Also, is there a way to export the cutoff parameters used for a classifier (e.g., the ANI cutoff for genus vs. species)?

Thanks!

colinbrislawn · May 2, 2019, 6:41pm

Hello @StanMan2000! Welcome to Qiime 2!

What plugin command did you use for classification? The scikit-learn command doesn't use ANI... but vsearch sort of does. Once we know the specific command, we can explore how it works and what cutoffs it uses.

Thanks,
Colin

StanMan2000 · May 2, 2019, 7:38pm

Hey, thanks for the quick reply!

I used the classify-sklearn command, so it looks like there's no way of getting the ANI. I would love more background about how it works though! I was trying to find a paper or discussion about it earlier but didn't have any luck.

colinbrislawn · May 2, 2019, 10:50pm

So, the official reference is the Scikit-learn paper with a whopping 16,000 citations.

A better ref is probably this:

And just to really answer your question, here is the relevant description:

The plugin provides a default method which is to extract k-mer counts from reference sequences and train the scikit-learn multinomial naive Bayes classifier, and it is this method that we test extensively here. Specifically, the pipeline consists of a sklearn.feature_extraction.text.HashingVectorizer feature extraction step followed by a sklearn.naive_bayes.MultinomialNB classification step. The use of a hashing feature extractor allows the use of significantly longer k-mers than the 8-mers that are used by RDP Classifier, and we tested up to 32-mers.

So it's k-mer counts fed into a naive Bayes classifier. This is a lot like the RDP Wang classifier, cited 8k times: "Naïve Bayesian Classifier for Rapid Assignment of rRNA Sequences into the New Bacterial Taxonomy".
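To make that concrete, here is a minimal sketch of the same two-step idea written directly against scikit-learn. It is not the plugin's actual code: the toy sequences, the labels, and the parameter values (7-mers, `n_features`, `alpha`) are assumptions chosen only for illustration. What it shows is that the classifier's output is a class probability, not a percent identity, which is why there is no ANI to export.

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy reference sequences and taxonomy labels (made up for illustration).
ref_seqs = [
    "ACGTACGTACGTACGTACGT",
    "ACGTACGTACGTACGTACGA",
    "TTGGCCAATTGGCCAATTGG",
    "TTGGCCAATTGGCCAATTGC",
]
ref_taxa = ["g__A", "g__A", "g__B", "g__B"]

# Step 1: hash character 7-mers into a fixed-size count vector.
# alternate_sign=False keeps counts non-negative, which MultinomialNB requires.
# The parameter values here are assumptions, not the plugin's settings.
vectorizer = HashingVectorizer(
    analyzer="char",
    ngram_range=(7, 7),
    n_features=8192,
    alternate_sign=False,
    norm=None,
)

# Step 2: multinomial naive Bayes on the k-mer counts.
clf = MultinomialNB(alpha=0.01)
clf.fit(vectorizer.transform(ref_seqs), ref_taxa)

# Classify a query: the result is a posterior probability per class.
query = "ACGTACGTACGTACGTACGG"
probs = clf.predict_proba(vectorizer.transform([query]))[0]
best = clf.classes_[probs.argmax()]
confidence = probs.max()
print(best, round(confidence, 3))
```

The `confidence` value is the closest thing this kind of classifier has to a cutoff: a label is accepted or rejected based on that probability rather than on sequence identity.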

If you want an alignment-based method, check out vsearch!

Colin

Nicholas_Bokulich · May 2, 2019, 11:14pm

By which @colinbrislawn means the classify-consensus-vsearch (and also classify-consensus-blast) methods in q2-feature-classifier. These perform vsearch or blast-based searching against a reference database, then least common ancestor taxonomic classification. Both have percent identity parameters, which allow you to specify how similar your query must be to the reference (e.g., 97% similarity) to be accepted as a hit. So this does not answer your question squarely (you cannot report the actual ANI), but you can set a threshold for how close your hits need to be.
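As a rough illustration of "identity threshold, then least common ancestor", here is a small Python sketch. It is a simplification, not the q2-feature-classifier implementation: the function name `lca_taxonomy`, the `perc_identity` argument, the hit format, and the example taxonomies are all hypothetical, and a strict LCA is used where the real methods may apply a more permissive consensus rule.

```python
def lca_taxonomy(hits, perc_identity=0.97):
    """Return the deepest taxonomy shared by all hits at or above the threshold.

    hits: list of (identity, taxonomy) tuples, where identity is a fraction
    in [0, 1] and taxonomy is a semicolon-delimited string (hypothetical format).
    """
    # Keep only hits that meet the identity threshold.
    accepted = [
        [rank.strip() for rank in taxonomy.split(";")]
        for identity, taxonomy in hits
        if identity >= perc_identity
    ]
    if not accepted:
        return "Unassigned"

    # Walk down the ranks and stop at the first level where the hits disagree.
    consensus = []
    for ranks in zip(*accepted):
        if len(set(ranks)) != 1:
            break
        consensus.append(ranks[0])
    return ";".join(consensus) if consensus else "Unassigned"


# Made-up search results for a single query sequence.
hits = [
    (0.99, "k__Bacteria;p__Firmicutes;g__Lactobacillus"),
    (0.98, "k__Bacteria;p__Firmicutes;g__Streptococcus"),
    (0.90, "k__Bacteria;p__Proteobacteria;g__Escherichia"),
]

print(lca_taxonomy(hits, perc_identity=0.97))
print(lca_taxonomy(hits, perc_identity=0.85))
print(lca_taxonomy(hits, perc_identity=0.995))
```

Lowering the threshold lets more distant hits vote, which tends to push the assignment up to a shallower rank; raising it can leave a query unassigned.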
