classify-sklearn vs classify-consensus-vsearch - classification depth

Hi all,

I ran both classify-sklearn and classify-consensus-vsearch using the PR2 database for 18S V9 samples. The result is a much lower classification depth for the classifier-based method. Both methods assigned a similar number of ASVs to at least the domain level, but assignments drop off quickly below that. For example, 80% of ASVs are assigned at the order level with vsearch, but only 50% with the classifier.

I understand that classify-sklearn should be more accurate, which could account for fewer ASVs being assigned at lower taxonomic levels, but is a difference this large common? I'd appreciate feedback from those who have looked at more datasets than I have.

Thanks!

Edit: Updating this because I did some more reading and discovered that the default min-consensus for classify-consensus-vsearch (0.51) is lower than the default confidence (0.7) for classify-sklearn. Dropping the sklearn confidence to 0.6 slightly increased my classification depth. Is modifying this parameter recommended?
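
For reference, the rerun was something along these lines (artifact names are just placeholders for my files):

    # classify-sklearn rerun with a lower confidence threshold
    qiime feature-classifier classify-sklearn \
      --i-classifier pr2-v9-classifier.qza \
      --i-reads rep-seqs.qza \
      --p-confidence 0.6 \
      --o-classification taxonomy-sklearn-0.6.qza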


Hi @areaume,
Great questions! The short answer is that the default settings for classify-sklearn were optimized for 16S and ITS, which were the most commonly used targets circa 2018 when we released that plugin, so the parameter settings may need adjustment for other markers.

For other marker genes (yes, even 18S), it would be wise to adjust the parameters a bit to see how this impacts performance. Ideally, this would be tested with a mock community or other ground truth to verify accuracy. Without that, you don't know whether deeper actually means better, or whether vsearch is actually misclassifying in this case (theoretically, it's possible).
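
If you do get a mock community or other expected taxonomy together, RESCRIPt's evaluate-classifications action can compare observed assignments against the expected ones. A rough sketch (artifact names are placeholders; see qiime rescript evaluate-classifications --help for the full interface):

    # compare observed classifications against a known/expected taxonomy
    qiime rescript evaluate-classifications \
      --i-expected-taxonomies expected-taxonomy.qza \
      --i-observed-taxonomies observed-taxonomy.qza \
      --o-evaluation classification-eval.qzv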

So at the very least, I would test a few classifications with different confidence and other parameter settings. You can use RESCRIPt to evaluate the classifications... even without a ground truth, you can use the evaluate-taxonomy action to see how classification depth compares (again, with the caveat that deeper is not necessarily better!).
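
For example, something like this would summarize and compare classification depth across the different runs (file names are placeholders):

    # compare classification depth across multiple taxonomy artifacts
    qiime rescript evaluate-taxonomy \
      --i-taxonomies taxonomy-sklearn.qza taxonomy-vsearch.qza \
      --p-labels sklearn vsearch \
      --o-taxonomy-stats taxonomy-comparison.qzv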

Note that these parameters (and the classifiers themselves) operate in very different ways, so there is not a 1:1 comparison between them. Just FYI! If you are interested in the theory, you can read the q2-feature-classifier paper, where these methods are explained more fully.

I hope that helps! And if you do test these out, please share your results, I would love to see how the performance compares!


Thanks @Nicholas_Bokulich,

I appreciate you taking the time to explain. My 16S classification results turned out great, so that makes sense! We have some ideas for mock community testing in the works, but for now I'll work on testing some of the classifier parameters. I'll follow up if I find anything interesting!

Best,
Ashley
