Hi!
I’m trying to gain a better understanding of how the classify_samples command from the sample_classifier plugin works. I understand that under the hood, this plugin primarily wraps the functionality of sklearn, and I’m trying to figure out how equivalent code would look in sklearn (perhaps there’s source code I haven’t found yet).
Let’s say I pick the default RandomForest. I assume that training the model is straightforward: rf = RandomForestClassifier(), followed by rf.fit(X_train, y_train), and that testing then amounts to calling rf.predict(X_test) on the held-out samples.
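To make that assumption concrete, here is a minimal sketch of what I have in mind. The synthetic data from make_classification (standing in for a feature table and metadata column), the 80/20 split, and n_estimators=100 are placeholders of mine, not values I have confirmed the plugin uses.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a feature table (X) and a metadata column (y).
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Hold out part of the samples for testing (split size is my assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Train on the training samples only.
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)

# Predict the held-out samples and score the predictions.
y_pred = rf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.3f}")
```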
What I’m unsure about is how the --p-cv parameter is used. Is it applied repeatedly, i.e., in a for-loop that fits and evaluates the model once per fold? Additionally, how are the reported Model Accuracy and AUC values calculated? Are they the mean or median across the k cross-validation folds, or are they computed some other way?
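For reference, this is the for-loop interpretation I am picturing, written as a rough sketch. It is only my guess at one possible reading of --p-cv, not something I have confirmed the plugin does. The synthetic data, the variable cv, StratifiedKFold, and the per-fold averaging are all my own assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold

# Synthetic binary-classification data standing in for real samples.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

cv = 5  # hypothetical stand-in for the value passed to --p-cv
skf = StratifiedKFold(n_splits=cv, shuffle=True, random_state=0)

fold_accuracy, fold_auc = [], []
for train_idx, test_idx in skf.split(X, y):
    # Fit a fresh model on the training folds.
    rf = RandomForestClassifier(n_estimators=100, random_state=0)
    rf.fit(X[train_idx], y[train_idx])

    # Score the left-out fold: hard labels for accuracy,
    # positive-class probabilities for AUC.
    y_pred = rf.predict(X[test_idx])
    y_prob = rf.predict_proba(X[test_idx])[:, 1]
    fold_accuracy.append(accuracy_score(y[test_idx], y_pred))
    fold_auc.append(roc_auc_score(y[test_idx], y_prob))

# Summarize across folds (mean vs. median is exactly what I am unsure about).
mean_accuracy = float(np.mean(fold_accuracy))
median_accuracy = float(np.median(fold_accuracy))
mean_auc = float(np.mean(fold_auc))
print(f"Accuracy: mean={mean_accuracy:.3f}, median={median_accuracy:.3f}")
print(f"AUC: mean={mean_auc:.3f}")
```

If the plugin instead reports accuracy and AUC from a single held-out test set and uses the folds for something else, the loop above would not be the equivalent code, which is why I am asking.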
Thanks!