Understanding naive Bayes + sklearn classifier, top-hit identity distribution

I am less familiar with k-mer naive Bayes classifiers and Bayesian weighted priors than with top-hit LCA classifiers, so I'll stick to what I know.

If you have not found it already, check out RESCRIPt and its detailed tutorial. Instead of adjusting Bayesian priors like q2-clawback does, RESCRIPt helps you better target and curate the region and taxonomy labels in your database. It also includes benchmarks of how well this works:

This method provides information about the number, entropy, and other characteristics of taxonomic labels at each rank in an input taxonomy. It is much less time-consuming than the classification methods above, and provides valuable information about the amount of “information” in a taxonomy artifact.
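
To get a feel for what "number and entropy of labels at each rank" means, here is a rough sketch in plain Python. This is not RESCRIPt's code or its exact calculation; `rank_stats` and the toy taxonomy are made up for illustration, and I am assuming semicolon-delimited lineage strings.

```python
import math
from collections import Counter

def rank_stats(taxonomy, sep=";"):
    """Return {rank_depth: (n_unique_labels, shannon_entropy_in_bits)}.

    `taxonomy` maps a sequence ID to a delimited lineage string.
    Hypothetical helper for illustration, not part of any QIIME 2 plugin.
    """
    per_rank = {}
    for lineage in taxonomy.values():
        labels = [part.strip() for part in lineage.split(sep)]
        for depth in range(1, len(labels) + 1):
            # Count the full path down to this rank, so the same name
            # under two different parents is treated as two labels.
            path = sep.join(labels[:depth])
            per_rank.setdefault(depth, Counter())[path] += 1

    stats = {}
    for depth, counts in sorted(per_rank.items()):
        total = sum(counts.values())
        # Shannon entropy: sum over labels of p * log2(1 / p).
        entropy = sum((c / total) * math.log2(total / c) for c in counts.values())
        stats[depth] = (len(counts), entropy)
    return stats

# Toy taxonomy (invented data): four sequences, three ranks.
toy = {
    "seq1": "k__Bacteria; p__Firmicutes; c__Bacilli",
    "seq2": "k__Bacteria; p__Firmicutes; c__Clostridia",
    "seq3": "k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria",
    "seq4": "k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria",
}

stats = rank_stats(toy)
for depth, (n_labels, entropy) in stats.items():
    print(f"rank {depth}: {n_labels} unique labels, entropy = {entropy:.2f} bits")
```

A rank where every sequence carries the same label has zero entropy, so it cannot help a classifier tell sequences apart. More unique, evenly spread labels at a rank means more "information" there.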

With a better database, all methods will work better. I would start there!
