I have just started using QIIME 2. Recently, I used `qiime feature-classifier classify-sklearn` to classify public data from a study, and the taxonomic annotations I obtained were inconsistent with the study's results:
no Lactococcus and more Streptococcus (the study reported that Lactococcus was highly abundant in most samples, while Streptococcus accounted for only 7.51% ± 11.61%).
The processing tool used in that study was QIIME 1. One possible reason is that classify-sklearn misclassified Lactococcus as Streptococcus, so I tried adjusting the confidence level of classify-sklearn and also used `qiime feature-classifier classify-consensus-vsearch`, but neither worked well.
Therefore, I would like to understand the working principle of `qiime feature-classifier classify-sklearn`, or its specific data-processing steps (particularly those related to the confidence threshold). Can someone help me?
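For reference, the commands I ran were along these lines (file names are placeholders, and the exact flags and required outputs vary slightly across QIIME 2 releases):

```shell
# sklearn-based naive Bayes classification; --p-confidence is the
# minimum bootstrap confidence required to keep an assignment at
# each taxonomic rank (the default is 0.7).
qiime feature-classifier classify-sklearn \
  --i-classifier classifier.qza \
  --i-reads rep-seqs.qza \
  --p-confidence 0.7 \
  --o-classification taxonomy-sklearn.qza

# Alignment-based alternative: consensus taxonomy from vsearch hits.
qiime feature-classifier classify-consensus-vsearch \
  --i-query rep-seqs.qza \
  --i-reference-reads ref-seqs.qza \
  --i-reference-taxonomy ref-taxonomy.qza \
  --o-classification taxonomy-vsearch.qza
```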
Or maybe it's a good thing: QIIME 2 should offer better taxonomic resolution than QIIME 1, so this may be a useful finding / correction to the original work.
For 16S rRNA gene sequences, naive Bayes bespoke classifiers with k-mer lengths between 12 and 32 and confidence = 0.5 yield maximal recall scores, but RDP (confidence = 0.5) and naive Bayes (uniform class weights, confidence = 0.5, k-mer length = 11, 12, or 18) also perform well.
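If you want to test whether the confidence threshold is driving the Lactococcus/Streptococcus difference, one simple sketch (artifact names are placeholders) is to re-classify the same representative sequences at a couple of thresholds and compare the resulting taxonomies:

```shell
# Re-classify the same reads at two confidence thresholds so the
# resulting taxonomy.qza artifacts can be compared side by side.
for conf in 0.5 0.7; do
  qiime feature-classifier classify-sklearn \
    --i-classifier classifier.qza \
    --i-reads rep-seqs.qza \
    --p-confidence ${conf} \
    --o-classification taxonomy-conf-${conf}.qza
done
```

A lower confidence keeps more species/genus-level assignments (higher recall, more misclassification risk); a higher confidence truncates uncertain assignments to a shallower rank.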
After reading the `_skl.py` training script posted on GitHub, I didn't find the code that sets the k-mer length, and I wonder if you can suggest a solution.
Your reply has been a great help in learning QIIME 2. I wish you a good day.
Absolutely. This is set during classifier training, with the `--p-feat-ext--ngram-range` option. The same k-mer length(s) will then be used during classification (i.e., the same pre-processing is applied to the query sequences).
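As a sketch (reference file names are placeholders), training with 12-mers would look something like this; the ngram range is passed as a string `'[min,max]'`, so `'[12,12]'` uses 12-mers only, while the default is `'[7,7]'`:

```shell
# Train a naive Bayes classifier on 12-mers instead of the default 7-mers.
# The k-mer (ngram) range is baked into the resulting classifier artifact,
# so classify-sklearn will automatically apply it to query sequences.
qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads ref-seqs.qza \
  --i-reference-taxonomy ref-taxonomy.qza \
  --p-feat-ext--ngram-range '[12,12]' \
  --o-classifier classifier-12mer.qza
```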