We used the GreenGenes classifier to get a taxonomic breakdown for some samples, but we would also like to see an ANI/identities breakdown to get a better idea of the criteria it is using.
Is there any way to export/print this information from Qiime2? Also, is there a way to export the cutoff parameters used for a classifier (e.g., the ANI cutoff for genus vs. species)?
Hello @StanMan2000! Welcome to Qiime 2!
What plugin command did you use for classification? The scikit-learn command doesn't use ANI... but vsearch sort of does. Once we know the specific command, we can explore how it works and what cutoffs it uses.
Hey, thanks for the quick reply!
I used the classify-sklearn command, so it looks like there’s no way of getting the ANI. I would love more background about how it works though! I was trying to find a paper or discussion about it earlier but didn’t have any luck.
So, the official reference is the Scikit-learn paper with a whopping 16,000 citations.
A better ref is probably this:
And just to really answer your question, here is the relevant passage:
The plugin provides a default method which is to extract k-mer counts from reference sequences and train the scikit-learn multinomial naive Bayes classifier, and it is this method that we test extensively here. Specifically, the pipeline consists of a sklearn.feature_extraction.text.HashingVectorizer feature extraction step followed by a sklearn.naive_bayes.MultinomialNB classification step. The use of a hashing feature extractor allows the use of significantly longer k-mers than the 8-mers that are used by RDP Classifier, and we tested up to 32-mers.
So it's k-mer counts fed into a naive Bayes classifier. This is a lot like the RDP (Wang) classifier, cited about 8k times: "Naïve Bayesian Classifier for Rapid Assignment of rRNA Sequences into the New Bacterial Taxonomy".
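To make "k-mer counts fed into a naive Bayes classifier" a bit more concrete, here is a rough standard-library sketch of the idea. It is not the plugin's code: the real pipeline uses scikit-learn's HashingVectorizer and MultinomialNB, whereas this toy counts exact k-mers in a dictionary and implements the multinomial model by hand. The function names, the tiny reference set, `k=4`, and `alpha=1.0` are all made up for illustration.

```python
import math
from collections import Counter, defaultdict


def kmer_counts(seq, k):
    """Count the overlapping k-mers in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def train(references, k=4, alpha=1.0):
    """Pool k-mer counts per taxon from (sequence, taxon) pairs."""
    counts = defaultdict(Counter)
    n_seqs = Counter()
    for seq, taxon in references:
        counts[taxon].update(kmer_counts(seq, k))
        n_seqs[taxon] += 1
    vocab = set().union(*counts.values())
    return {"k": k, "alpha": alpha, "counts": counts,
            "n_seqs": n_seqs, "vocab": vocab}


def classify(model, query):
    """Return (best taxon, posterior probability) for a query sequence."""
    alpha = model["alpha"]
    vocab_size = len(model["vocab"])
    total_seqs = sum(model["n_seqs"].values())
    query_counts = kmer_counts(query, model["k"])

    log_scores = {}
    for taxon, counts in model["counts"].items():
        total = sum(counts.values())
        # Log prior: fraction of reference sequences with this taxon.
        score = math.log(model["n_seqs"][taxon] / total_seqs)
        for kmer, n in query_counts.items():
            if kmer not in model["vocab"]:
                continue  # k-mers never seen in training carry no signal
            # Smoothed multinomial likelihood of this k-mer under the taxon.
            score += n * math.log((counts[kmer] + alpha)
                                  / (total + alpha * vocab_size))
        log_scores[taxon] = score

    # Turn log scores into posteriors (subtract the max for stability).
    top = max(log_scores.values())
    weights = {t: math.exp(s - top) for t, s in log_scores.items()}
    norm = sum(weights.values())
    best = max(weights, key=weights.get)
    return best, weights[best] / norm


# Toy reference set: invented sequences and labels, not Greengenes data.
references = [
    ("ACGTACGTACGTACGT", "g__A"),
    ("ACGTACGAACGTACGT", "g__A"),
    ("TTGGCCAATTGGCCAA", "g__B"),
    ("TTGGCCAATTGGCAAA", "g__B"),
]
model = train(references)
print(classify(model, "ACGTACGTACGA"))
```

Note that nothing in here is a percent identity: the classifier only scores how probable the query's k-mers are under each taxon, which is why there is no ANI to export.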
If you want an alignment-based method, check out vsearch!
By which @colinbrislawn means the classify-consensus-vsearch (and also classify-consensus-blast) methods in q2-feature-classifier. These perform vsearch or blast-based searching against a reference database, then least common ancestor taxonomic classification. Both have percent identity parameters, which allow you to specify how similar your query must be to the reference (e.g., 97% similarity) to be accepted as a hit. So it does not answer your question squarely (you cannot report the actual ANI), but you can set a threshold for how close your hits need to be.
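As a rough illustration of "identity threshold, then least common ancestor", here is a small standard-library sketch. It is only a toy under stated assumptions: identity is computed position by position on equal-length, already-aligned strings rather than by a real vsearch or blast alignment, the LCA is a strict shared prefix of the taxonomy ranks, and `min_identity`, the function names, and the example sequences and taxonomy strings are all hypothetical.

```python
def percent_identity(a, b):
    """Percent of matching positions in two equal-length, pre-aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def lowest_common_ancestor(taxonomies):
    """Keep the leading ranks that every taxonomy string agrees on."""
    split = [[rank.strip() for rank in t.split(";")] for t in taxonomies]
    shared = []
    for ranks in zip(*split):
        if len(set(ranks)) != 1:
            break  # first rank where the hits disagree
        shared.append(ranks[0])
    return ";".join(shared)


def classify_consensus(query, references, min_identity=97.0):
    """Filter references by identity, then take the LCA of the surviving hits."""
    hits = [taxonomy for seq, taxonomy in references
            if percent_identity(query, seq) >= min_identity]
    if not hits:
        return "Unassigned"  # placeholder label for "no hit passed the threshold"
    return lowest_common_ancestor(hits)


# Made-up 40 bp sequences and taxonomy strings, for illustration only.
query = "ACGT" * 10
references = [
    ("ACGT" * 10, "k__Bacteria;p__Firmicutes;g__Lactobacillus;s__one"),
    ("ACGT" * 9 + "ACGA", "k__Bacteria;p__Firmicutes;g__Lactobacillus;s__two"),
    ("TTTT" * 10, "k__Bacteria;p__Proteobacteria;g__Escherichia;s__three"),
]
print(classify_consensus(query, references, min_identity=97.0))
print(classify_consensus(query, references, min_identity=99.0))
```

With the looser threshold two species-level hits survive, so the assignment backs off to the genus they share; tightening the threshold leaves a single hit and keeps the species. That is the sense in which the identity cutoff, rather than a per-rank ANI rule, controls how deep the classification goes.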