Stratification and sample-classifier Issues

Hello. I am attempting to run machine learning on my dataset, and I received an error about stratification stating that stratification can be disabled by setting stratify=FALSE. I do not quite understand where this setting would go in the command. I researched a bit and found a flag (--p-no-stratify), but it is not listed as one of the available arguments for sample-classifier. How can I fix this? I have gone through several of the tutorials and have not found a fix. Any and all help would be fantastic. Thanks!

Welcome to the forum, @anf0012!

Could you please post the full command and the full error message? We will need both to troubleshoot.

Yes, of course! Sorry I didn't include that before.

qiime sample-classifier classify-samples \
  --i-table gfic-nochlomito-filtered-table.qza \
  --m-metadata-file metadata_new-all-samples.txt \
  --m-metadata-column ml_listeria_species \
  --p-random-state 666 \
  --p-n-jobs 1 \
  --output-dir listeria_species_results/

Hi @anf0012,
To troubleshoot, we would need the full error message too, but I think I know both the issue and the message. The first part of the message is the key here:

You have chosen to predict a metadata column that contains
one or more values that match only one sample. For proper
stratification of data into training and test sets, each
class (value) must contain at least two samples. This is a
requirement for classification problems

As noted in that message, this is a requirement of classification problems. You should carefully review your metadata to make sure you have reasonable classes. If you have singletons (values that match only one sample), you cannot classify them with a train/test split or a cross-validation scheme, and even with doubletons you will most likely end up with highly imbalanced classes, leading to very poor performance, overfitting, etc.
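One way to act on that advice is a quick pre-flight check that lists the classes which would trigger this error before running classify-samples. The sketch below is a hypothetical helper (`find_singleton_classes` is not part of QIIME 2). It assumes a tab-separated metadata file whose first line is the header and whose directive or comment lines (such as `#q2:types`) start with `#`; empty values are treated as missing and ignored.

```shell
# Hypothetical helper, not part of QIIME 2: print each value of a metadata
# column that occurs in fewer than two samples, with its sample count.
# Assumes: tab-separated file, header on line 1, directive lines start with '#'.
find_singleton_classes() {
  local metadata="$1" column="$2"
  awk -F '\t' -v col="$column" '
    NR == 1 {
      # Locate the requested column in the header row.
      for (i = 1; i <= NF; i++) if ($i == col) idx = i
      if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
      next
    }
    /^#/ { next }                 # skip directive and comment lines
    $idx != "" { count[$idx]++ }  # count samples per value, ignoring blanks
    END {
      # Values with fewer than two samples cannot be stratified.
      for (v in count) if (count[v] < 2) print v "\t" count[v]
    }
  ' "$metadata"
}
```

For example, `find_singleton_classes metadata_new-all-samples.txt ml_listeria_species` would print each offending value alongside its sample count (in no particular order). Samples carrying those values could then be removed with `qiime feature-table filter-samples` (for instance via its `--p-where` or `--p-exclude-ids` options) before rerunning classify-samples, although dropping or merging rare classes is a modeling decision worth making deliberately.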

Will do. Thank you so much for your help. I appreciate it!
