Hi!
I’m trying to gain a better understanding of how the classify_samples command from the sample_classifier plugin works. I understand that under the hood, this plugin primarily wraps the functionality of sklearn, and I’m trying to figure out how equivalent code would look in sklearn (perhaps there’s source code I haven’t found yet).
Let’s say I pick the default RandomForest. I assume that training and testing the model is straightforward using rf = RandomForestClassifier() followed by rf.fit(X_train, y_train).
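To make that concrete, here is a minimal sketch of the plain sklearn workflow I have in mind. The synthetic data, the 80/20 split, and the hyperparameters are all my own placeholders, not values I know the plugin to use.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a feature table (samples x features) and class labels.
X, y = make_classification(n_samples=100, n_features=20, random_state=0)

# Hold out 20% of the samples for testing (assumed fraction), stratified by class.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Fit the classifier on the training samples only.
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)

# Accuracy on the held-out samples.
test_accuracy = rf.score(X_test, y_test)
print(f"held-out accuracy: {test_accuracy:.3f}")
```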
What I’m unsure about is how the --p-cv parameter is used. Is it applied in a repetitive manner, i.e., in a for-loop running the specified number of times? Additionally, how are the Model Accuracy and AUC values calculated? Are they derived as the mean/median values of the k-fold cross-validation?
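For reference, this is roughly the for-loop interpretation I am guessing at, written in plain sklearn. It is only my assumption about what could happen, not something I have confirmed in the plugin; n_folds is a hypothetical stand-in for the value passed to --p-cv, and the data is synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold

# Synthetic binary-classification data standing in for a real feature table.
X, y = make_classification(n_samples=100, n_features=20, random_state=0)

n_folds = 5  # hypothetical stand-in for the --p-cv value
cv = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=0)

fold_accuracy, fold_auc = [], []
for train_idx, test_idx in cv.split(X, y):
    # Train a fresh model on this fold's training samples.
    rf = RandomForestClassifier(n_estimators=100, random_state=0)
    rf.fit(X[train_idx], y[train_idx])

    # Score on the samples left out of this fold.
    predictions = rf.predict(X[test_idx])
    fold_accuracy.append(accuracy_score(y[test_idx], predictions))

    # AUC needs the predicted probability of the positive class (column 1).
    positive_proba = rf.predict_proba(X[test_idx])[:, 1]
    fold_auc.append(roc_auc_score(y[test_idx], positive_proba))

# Two candidate ways of summarising the per-fold scores.
print(f"accuracy: mean={np.mean(fold_accuracy):.3f}, median={np.median(fold_accuracy):.3f}")
print(f"AUC:      mean={np.mean(fold_auc):.3f}, median={np.median(fold_auc):.3f}")
```

If the plugin does something different (for example, using cross-validation only for tuning and reporting accuracy on a separate held-out test set), that is exactly the part I would like to understand.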
Thanks!