QIIME 2 v2018.8 appears to force stratification of the data even when the --p-no-stratify flag is used.
Here’s the traceback:
Traceback (most recent call last):
  File "/opt/conda/envs/qiime2-2018.8/lib/python3.5/site-packages/q2cli/commands.py", line 274, in __call__
    results = action(**arguments)
  File "", line 2, in regress_samples
  File "/opt/conda/envs/qiime2-2018.8/lib/python3.5/site-packages/qiime2/sdk/action.py", line 231, in bound_callable
    output_types, provenance)
  File "/opt/conda/envs/qiime2-2018.8/lib/python3.5/site-packages/qiime2/sdk/action.py", line 455, in callable_executor
    outputs = self._callable(scope.ctx, **view_args)
  File "/opt/conda/envs/qiime2-2018.8/lib/python3.5/site-packages/q2_sample_classifier/classify.py", line 140, in regress_samples
    stratify, missing_samples=missing_samples)
  File "", line 2, in split_table
  File "/opt/conda/envs/qiime2-2018.8/lib/python3.5/site-packages/qiime2/sdk/action.py", line 231, in bound_callable
    output_types, provenance)
  File "/opt/conda/envs/qiime2-2018.8/lib/python3.5/site-packages/qiime2/sdk/action.py", line 362, in callable_executor
    output_views = self._callable(**view_args)
  File "/opt/conda/envs/qiime2-2018.8/lib/python3.5/site-packages/q2_sample_classifier/classify.py", line 228, in split_table
    stratify=True, missing_samples=missing_samples)
  File "/opt/conda/envs/qiime2-2018.8/lib/python3.5/site-packages/q2_sample_classifier/utilities.py", line 390, in _prepare_training_data
    features, targets, column, test_size, strata, random_state)
  File "/opt/conda/envs/qiime2-2018.8/lib/python3.5/site-packages/q2_sample_classifier/utilities.py", line 169, in _split_training_data
    _stratification_error()
  File "/opt/conda/envs/qiime2-2018.8/lib/python3.5/site-packages/q2_sample_classifier/utilities.py", line 189, in _stratification_error
    'You have chosen to predict a metadata column that contains '
ValueError: You have chosen to predict a metadata column that contains one or more values that match only one sample. For proper stratification of data into training and test sets, each class (value) must contain at least two samples. This is a requirement for classification problems, but stratification can be disabled for regression by setting stratify=False. Alternatively, remove all samples that bear a unique class label for your chosen metadata column. Note that disabling stratification can negatively impact predictive accuracy for small data sets.
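The error message suggests removing samples whose value in the chosen metadata column is unique. Below is a minimal sketch of how such samples could be identified, assuming the column has been read into a plain dict mapping sample ID to value. The function name `find_singleton_samples` and the example data are hypothetical and not part of QIIME 2.

```python
from collections import Counter


def find_singleton_samples(column_values):
    """Return sample IDs whose value occurs only once in the column.

    column_values: dict mapping sample ID -> metadata value
    (a hypothetical input format, not a QIIME 2 data structure).
    """
    counts = Counter(column_values.values())
    return sorted(sid for sid, value in column_values.items()
                  if counts[value] == 1)


# Made-up example data, for illustration only.
example_column = {"s1": 10.0, "s2": 10.0, "s3": 12.5, "s4": 7.0, "s5": 7.0}
singletons = find_singleton_samples(example_column)
print(singletons)
```

For a continuous regression target most values may well be unique, so this is unlikely to be a practical substitute for being able to disable stratification.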
##########################
I tried the same command under v2018.6 and it works fine, so the problem seems to be specific to v2018.8.
So far I have tried this with two separate datasets from different projects and got exactly the same result: under v2018.8 the error occurs and stratification is applied even with the flag, while under v2018.6 the command works.
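The traceback looks consistent with this: the `regress_samples` frame passes a `stratify` variable on to `split_table`, but the `split_table` frame (classify.py, line 228) shows a literal `stratify=True` being passed further down. Here is a hypothetical sketch of that kind of pattern. The function names and bodies are stand-ins for illustration, not the actual q2-sample-classifier code, and I have not confirmed this is the cause.

```python
def prepare_training_data(stratify):
    # Stand-in for the inner helper: just reports the value it received.
    return "stratified" if stratify else "unstratified"


def split_table_hardcoded(stratify=True):
    # The caller's choice is ignored because a literal True is passed on.
    return prepare_training_data(stratify=True)


def split_table_forwarding(stratify=True):
    # The caller's choice is forwarded unchanged.
    return prepare_training_data(stratify=stratify)


print(split_table_hardcoded(stratify=False))
print(split_table_forwarding(stratify=False))
```

In the first variant the caller asks for no stratification but the helper still receives `True`, which would match the behaviour I am seeing.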