I am running q2-sample-classifier to train a random forest classifier with RFE. My input file is actually a metabolomics abundance table, so the features are metabolites. When I look at the accuracy results, the overall accuracy is 0.97 with 532 selected features. However, when I look at the RFE scores, the accuracy for 532 features is 0.98. I was wondering why these two accuracies differ for the same number of features? Any insight would be greatly appreciated! Thank you very much!
I think this is because the model summary, which contains the RFE scores, does not consider the hold-out (test) data, only the model fit to the training data. There wouldn't be much point in using the hold-out/test data to guide feature selection, since you would just end up overfitting to your test data.
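To illustrate the idea, here is a minimal, self-contained Python sketch. It is not q2-sample-classifier's RFE code: a hypothetical single-metabolite threshold classifier stands in for feature selection, and the data are made up. The threshold is chosen purely by its score on the training samples, and only afterwards is it applied once to the held-out samples. Because the two scores are computed on different samples, they need not match even though they describe the same fitted model.

```python
# Toy stand-in for "select using training data, then score on held-out data".
# Hypothetical data: one metabolite abundance per sample, binary class label.
TRAIN = [(1.0, 0), (2.0, 0), (3.0, 0), (4.0, 1),
         (5.0, 0), (6.0, 1), (7.0, 1), (8.0, 1)]
TEST = [(1.5, 0), (2.0, 0), (3.0, 1), (4.0, 0),
        (5.0, 1), (6.0, 1), (7.0, 1), (9.0, 1)]


def accuracy(threshold, samples):
    """Fraction of samples where 'abundance > threshold' matches the label."""
    correct = sum((x > threshold) == bool(y) for x, y in samples)
    return correct / len(samples)


def candidate_thresholds(samples):
    """Midpoints between consecutive sorted values, plus one below the minimum."""
    xs = sorted(x for x, _ in samples)
    return [xs[0] - 0.5] + [(a + b) / 2 for a, b in zip(xs, xs[1:])]


def select_threshold(samples):
    """Pick the threshold with the best accuracy on `samples` (first one wins ties)."""
    best, best_acc = None, -1.0
    for t in candidate_thresholds(samples):
        acc = accuracy(t, samples)
        if acc > best_acc:
            best, best_acc = t, acc
    return best, best_acc


# Selection only ever sees TRAIN; TEST is used once, afterwards.
threshold, train_acc = select_threshold(TRAIN)
test_acc = accuracy(threshold, TEST)
print(f"threshold={threshold}, training score={train_acc}, hold-out accuracy={test_acc}")
```

Running it prints `threshold=3.5, training score=0.875, hold-out accuracy=0.75`. The gap is much larger than in your run only because the toy data are tiny; the point is that the score used to guide selection comes from the training data, while the reported accuracy comes from samples the selection never saw.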
Given that your accuracy on the hold-out/test data is 0.97 (vs. 0.98 on the training data itself), it seems like classification went extremely well.