RFE scores and accuracy do not match

I am running the q2-sample-classifier for training a random forest classifier with RFE. My input file is actually a metabolomics abundance table and therefore the features are metabolites. When looking at the accuracy results, the overall accuracy seems to be 0.97 with 532 selected features. However, when I look at the RFE scores, the accuracy for the 532 features is 0.98. I was wondering why these two accuracies differ for the same number of features? Any insight would be greatly appreciated! Thank you very much!

importance.tsv (20.0 KB)
predictive_accuracy.tsv (582 Bytes)
rfe_scores.tsv (544 Bytes)

Hey @meghna_swayambhu,

(@Nicholas_Bokulich, please correct me if I'm wrong)

I think this is because the model summary, which contains the RFE scores, does not consider the hold-out data, only the model fit to the training data. There wouldn't be much point in using the hold-out/test data to guide feature selection, since you'd just overfit to your test data.

Given that your accuracy on the hold-out/test data is 0.97 (vs. 0.98 for the training data itself), it seems like classification went extremely well.

Hello,
Ah yes, you are absolutely right! The RFE scores are reported on the training data and the accuracy is reported on the test data. Thank you very much.

Meghna
