How does qiime feature-classifier classify-sklearn work?

I am new to QIIME 2. Recently, when I used ‘qiime feature-classifier classify-sklearn’ to classify public data from a study, the taxonomic annotations I got were inconsistent with the results reported in that study:

no Lactococcus and more Streptococcus (the study reported that Lactococcus had high abundance in most samples, while Streptococcus was only 7.51% ± 11.61%).

The processing tool used in that study was QIIME 1. One possible reason is that classify-sklearn misclassified Lactococcus as Streptococcus, so I tried adjusting the confidence parameter of classify-sklearn and also tried ‘qiime feature-classifier classify-consensus-vsearch’, but neither helped.

Therefore, I would like to understand the working principle of ‘qiime feature-classifier classify-sklearn’, or its specific data-processing steps (in particular, how the confidence threshold is applied). Can someone help me?

Thank you.


Hello Iris,

Welcome to the forums! :qiime2:

Great question!

Short answer: it's a Naive Bayes k-mer classifier, essentially like the RDP (Wang) classifier and the Mothur classifier.

Long answer: Feature-classifier explained in detail - #3 by Nicholas_Bokulich
Paper on classifiers in Qiime2 :scroll: Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2’s q2-feature-classifier plugin | Microbiome | Full Text
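To make the "RDP-like" idea concrete, here is a toy, self-contained sketch of a k-mer Naive Bayes classifier with Wang-style bootstrap confidence. This illustrates the general approach described above; it is **not** QIIME 2's actual code (as far as I understand, classify-sklearn is built on scikit-learn and derives confidence from predicted class probabilities per taxonomic rank rather than an explicit bootstrap), and all names, k sizes, and reference data here are made up for illustration.

```python
import math
import random
from collections import Counter

def kmers(seq, k=8):
    """Split a sequence into overlapping k-mers."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def train(reference, k=8):
    """Count k-mer frequencies per taxon (a toy multinomial Naive Bayes fit).
    `reference` maps taxon label -> list of reference sequences."""
    model = {}
    for taxon, seqs in reference.items():
        counts = Counter()
        for s in seqs:
            counts.update(kmers(s, k))
        model[taxon] = counts
    return model

def log_posterior(model, query_kmers, alpha=1.0):
    """Laplace-smoothed multinomial log-likelihood of the query under each taxon."""
    vocab = {km for counts in model.values() for km in counts}
    scores = {}
    for taxon, counts in model.items():
        total = sum(counts.values()) + alpha * len(vocab)
        scores[taxon] = sum(
            math.log((counts.get(km, 0) + alpha) / total) for km in query_kmers
        )
    return scores

def classify(model, seq, k=8, confidence=0.7, n_boot=100, seed=0):
    """Assign the best-scoring taxon, then bootstrap subsets of the query's
    k-mers (as in Wang et al.) to estimate how stable that assignment is.
    Assignments below the confidence threshold are reported as Unassigned."""
    rng = random.Random(seed)
    qk = kmers(seq, k)
    scores = log_posterior(model, qk)
    best = max(scores, key=scores.get)
    subset = max(1, len(qk) // 8)  # Wang et al. resample 1/8 of the k-mers
    hits = 0
    for _ in range(n_boot):
        sample = [rng.choice(qk) for _ in range(subset)]
        boot_scores = log_posterior(model, sample)
        if max(boot_scores, key=boot_scores.get) == best:
            hits += 1
    conf = hits / n_boot
    return (best, conf) if conf >= confidence else ("Unassigned", conf)
```

The role of the confidence threshold is the same in spirit either way: raising it makes the classifier more conservative, so ambiguous queries get truncated to a higher rank or reported as Unassigned instead of being forced into a genus.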


:scream_cat:

Or maybe it's a good thing: QIIME 2 should offer better taxonomic resolution than QIIME 1, so maybe this is a useful finding / correction to the original work :man_shrugging:

Colin


Hi @colinbrislawn,

Thanks for your clear reply; I have learned a lot from it.

The article you linked to mentioned:

For 16S rRNA gene sequences, naive Bayes bespoke classifiers with k-mer lengths between 12 and 32 and confidence = 0.5 yield maximal recall scores, but RDP (confidence = 0.5) and naive Bayes (uniform class weights, confidence = 0.5, k-mer length = 11, 12, or 18) also perform well.

After reading the script _skl.py posted on GitHub that trains the classifier, I couldn't find the code that sets the k-mer length. I wonder if you could point me to it?

Your reply has been a great help in learning QIIME 2. I wish you a good day. :sun_with_face:

Iris


Thanks Iris! :smiley_cat:

I think we need to call the experts! :phone:
@Nicholas_Bokulich, can we set our own k-mer lengths?

Colin

Absolutely. This is set during classifier training, with the --p-feat-ext--ngram-range option. The same k-mer length(s) will then be used during classification (i.e., the same pre-processing is applied to the query sequences).
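For example, a training command might look like the sketch below. The artifact filenames are placeholders, and `[12,12]` means the minimum and maximum n-gram lengths are both 12, i.e., a single k-mer size of 12 (one of the lengths highlighted in the paper quoted above):

```shell
# Train a Naive Bayes classifier with a custom k-mer (n-gram) length.
# ref-seqs.qza / ref-taxonomy.qza are placeholder reference artifacts.
qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads ref-seqs.qza \
  --i-reference-taxonomy ref-taxonomy.qza \
  --p-feat-ext--ngram-range '[12,12]' \
  --o-classifier classifier-k12.qza
```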


This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.