How to interpret taxonomy assignment confidence scores?

jwdebelius · March 10, 2017, 6:11pm

I have a related question.

There are confidence scores associated with the Taxonomy Artifact (e.g. -1). Do these address the quality of the classification? What is the appropriate scale for these?

jairideout · March 10, 2017, 6:17pm

@BenKaehler, could you answer this question?

BenKaehler · March 10, 2017, 8:45pm

Thanks @jwdebelius. The confidence scores are only meaningful if you set the --p-confidence parameter when you call classify. Otherwise, the score is set to -1 to indicate that it is not meaningful.
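
As a minimal sketch of handling that -1 sentinel, assume the taxonomy has been exported to a tab-separated table with `Feature ID`, `Taxon`, and `Confidence` columns. Those column names and the helper `read_confidences` are assumptions for illustration, not something confirmed in this thread. The idea is simply to map -1 to a missing value so it is never averaged or plotted as if it were a real score:

```python
import csv
import io
from typing import Dict, Optional


def read_confidences(tsv_text: str) -> Dict[str, Optional[float]]:
    """Map each feature ID to its confidence, or None when the score is -1.

    Assumes a tab-separated table with 'Feature ID' and 'Confidence'
    columns (hypothetical layout, adjust to match your export).
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    confidences: Dict[str, Optional[float]] = {}
    for row in reader:
        value = float(row["Confidence"])
        # -1 is the "not meaningful" sentinel, so treat it as missing.
        confidences[row["Feature ID"]] = None if value == -1 else value
    return confidences


example = (
    "Feature ID\tTaxon\tConfidence\n"
    "feat1\tk__Bacteria; p__Firmicutes\t0.93\n"
    "feat2\tk__Bacteria\t-1.0\n"
)
scores = read_confidences(example)
```

Here `scores["feat2"]` ends up as `None`, which makes it harder to mistake the sentinel for a genuine score.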

If you do set --p-confidence to a value between zero and one (inclusive), the classifier will truncate the assignment to a shallower taxonomic level until the confidence is at least the value that you requested, then report the confidence of that truncated assignment. Our notion of "confidence" is still under development and may change in future versions. It should probably not be equated with the statistical notion of "confidence"; it is called "confidence" for historical reasons.
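
To make the truncation idea concrete, here is a rough sketch in Python. It is not the classifier's actual code: it assumes a confidence value is already available for each taxonomic level, and the function name `truncate_assignment`, the example values, and the "Unassigned" fallback are all hypothetical. It drops the deepest levels until the remaining assignment meets the requested threshold, then reports that level's confidence:

```python
from typing import List, Optional, Tuple


def truncate_assignment(
    levels: List[Tuple[str, float]], threshold: float
) -> Tuple[str, Optional[float]]:
    """Truncate a lineage until its confidence is at least `threshold`.

    `levels` lists (label, confidence) pairs from the shallowest level to
    the deepest. This is an illustrative assumption about the input, not
    how the real classifier stores or computes its scores.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1 inclusive")

    depth = len(levels)
    # Drop the deepest level while it falls short of the threshold.
    while depth > 0 and levels[depth - 1][1] < threshold:
        depth -= 1

    if depth == 0:
        # No level was confident enough (hypothetical fallback).
        return "Unassigned", None

    lineage = "; ".join(label for label, _ in levels[:depth])
    return lineage, levels[depth - 1][1]


example_levels = [
    ("k__Bacteria", 0.99),
    ("p__Firmicutes", 0.95),
    ("c__Bacilli", 0.80),
    ("o__Lactobacillales", 0.60),
]
lineage, confidence = truncate_assignment(example_levels, 0.7)
```

With a threshold of 0.7, the order-level label is dropped and the class-level assignment is reported together with its own confidence, which is why a reported score can be higher than the value you requested.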