Interpreting confidence scores for taxonomic assignment (classify-sklearn)

Nicholas_Bokulich · June 26, 2020, 5:45pm

See this post for some more details: Confidence values - taxonomic assignment - #2 by Nicholas_Bokulich

There have been no changes since then. We were testing out different ways to calculate confidence but wound up sticking with the original.

We wound up adjusting the default, based on benchmarking results, but not how confidence is calculated.

No, this is totally unrelated to % identity. Confidence values here are the raw probability estimates output by the naive Bayes classifier, i.e., the predicted probability that the assigned taxon is correct rather than some other taxon. Naive Bayes classifiers are good at classifying but poor at estimating probabilities, so the “confidence” scores should not be taken too seriously. Treat them as a rough estimate of how confident the classifier feels about its own prediction!
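To make that concrete, here is a toy sketch of where such a number comes from. This is not the classify-sklearn code. It is a hand-rolled multinomial naive Bayes over k-mers, and the reference sequences, taxon names, function names, and the 0.7 cutoff are all made up for illustration. The "confidence" is simply the largest posterior probability among the candidate taxa, and nothing in the calculation involves % identity.

```python
import math
from collections import Counter

def kmers(seq, k=3):
    """Split a sequence into overlapping k-mers."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def train(labelled_seqs, k=3):
    """Count k-mers per taxon (toy multinomial naive Bayes model)."""
    counts, vocab = {}, set()
    for taxon, seq in labelled_seqs:
        counts.setdefault(taxon, Counter()).update(kmers(seq, k))
        vocab.update(kmers(seq, k))
    return counts, vocab

def posterior(seq, counts, vocab, k=3, alpha=1.0):
    """Return {taxon: probability}, assuming uniform priors and add-alpha smoothing."""
    log_scores = {}
    for taxon, c in counts.items():
        total = sum(c.values()) + alpha * len(vocab)
        # Naive independence assumption: per-k-mer log-likelihoods are just summed.
        log_scores[taxon] = sum(
            math.log((c[km] + alpha) / total)
            for km in kmers(seq, k) if km in vocab
        )
    # Normalise with the log-sum-exp trick so the probabilities sum to 1.
    top = max(log_scores.values())
    weights = {t: math.exp(s - top) for t, s in log_scores.items()}
    z = sum(weights.values())
    return {t: w / z for t, w in weights.items()}

def classify(seq, counts, vocab, threshold=0.7):
    """Pick the most probable taxon; report it only if confidence >= threshold."""
    probs = posterior(seq, counts, vocab)
    taxon, confidence = max(probs.items(), key=lambda kv: kv[1])
    return (taxon if confidence >= threshold else "Unassigned"), confidence

# Hypothetical two-taxon reference set, purely for illustration.
reference = [
    ("TaxonA", "ACGTACGTACGTACGT"),
    ("TaxonB", "TTGGCCAATTGGCCAA"),
]
counts, vocab = train(reference)
label, conf = classify("ACGTACGTAC", counts, vocab)
print(label, conf)
```

With only eight k-mers in the query, the confidence for TaxonA already comes out above 0.99. That is the independence assumption at work: overlapping k-mers are treated as separate pieces of evidence and multiplied together, which pushes the probabilities toward 0 or 1. So the ranking of taxa is usually sensible, but the number itself is overconfident.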