Hoping for some clarification on what “baseline accuracy” indicates in the output from the classify-samples program (specifically the .qzv file that is created when passing the --o-accuracy-results argument). I’m trying to follow the q2-sample-classifier preprint:
- the term overall accuracy reflects the “percentage of test samples that were accurately classified”
Just to confirm, this is the fraction of the samples being classified that end up in the right (expected) group, correct?
What I’m confused about is what the baseline accuracy represents. The preprint defines it as: “classification accuracy if all samples were classified to the most abundant class”
The paper indicates that some datasets (like HMP and EMP) have more classes; am I correct in thinking that “class” in this context means the group a sample is associated with (i.e., a body site)? If that’s the case, it would be great to have a simple example of how this baseline accuracy would be calculated.
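To make the question concrete, here is my current guess at the calculation, written as a small Python sketch. It only follows the two quoted definitions from the preprint and is not taken from the plugin's code. The body-site labels, the predictions, and the function names are all made up for illustration, and I am assuming the baseline is computed over the same test samples as the overall accuracy.

```python
from collections import Counter


def baseline_accuracy(true_labels):
    """Accuracy obtained by assigning every sample to the most abundant class."""
    counts = Counter(true_labels)
    # most_common(1) returns [(label, count)] for the largest class
    _, largest_class_size = counts.most_common(1)[0]
    return largest_class_size / len(true_labels)


def overall_accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted class matches the true class."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Hypothetical test set: 10 samples from 3 body sites (assumed data)
test_labels = ["gut"] * 5 + ["skin"] * 3 + ["oral"] * 2
# Hypothetical classifier output: one skin sample is mislabeled as gut
predicted = ["gut"] * 5 + ["skin"] * 2 + ["gut"] + ["oral"] * 2

baseline = baseline_accuracy(test_labels)           # 5 of 10 samples are "gut"
overall = overall_accuracy(test_labels, predicted)  # 9 of 10 predictions match

print(f"baseline accuracy: {baseline:.2f}")
print(f"overall accuracy:  {overall:.2f}")
```

If this reading is right, always guessing "gut" would already be correct for half of these samples, so the baseline is 0.5, and a classifier is only informative to the extent that its overall accuracy exceeds that value. Please correct me if the plugin computes it differently.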
Thank you!