Classifier based on GTDB having issues with scikit-learn versions

I used code similar to the code posted here to run get-gtdb-data and then train a feature classifier.

I am getting an error that I don't understand. I ran this code before updating macOS to Sonoma 14.4.1 last week and it worked fine, but now fitting the classifier fails. It seems to have something to do with differing versions of scikit-learn, but it's not clear why the output of get-gtdb-data would clash with fit-classifier-naive-bayes.

If I use the classifier I generated without issue last week from GTDB (same code), I cannot run the next step, classify-sklearn (code at the bottom). For that step, the input files were generated a long time ago, so there might be differences in scikit-learn versions, though, again, this code worked fine last week with two datasets that were generated around that same time.

Any ideas?

I am running the QIIME 2 amplicon distribution, version 2024.2, in a conda environment.

qiime rescript get-gtdb-data \
--p-version '214.1' \
--o-gtdb-taxonomy gtdb-214_1-taxonomy.qza \
--o-gtdb-sequences gtdb-214_1-seqs.qza

qiime feature-classifier fit-classifier-naive-bayes \
--i-reference-reads gtdb-214_1-seqs.qza \
--i-reference-taxonomy gtdb-214_1-taxonomy.qza \
--o-classifier gtdb-214_1_classifier.qza \
--verbose

/opt/anaconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/q2_feature_classifier/classifier.py:104: UserWarning: The TaxonomicClassifier artifact that results from this method was trained using scikit-learn version 1.3.2. It cannot be used with other versions of scikit-learn. (While the classifier may complete successfully, the results will be unreliable.)
warnings.warn(warning, UserWarning)
Traceback (most recent call last):
File "/opt/anaconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/q2cli/commands.py", line 520, in __call__
results = self._execute_action(
File "/opt/anaconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/q2cli/commands.py", line 581, in _execute_action
results = action(**arguments)
File "", line 2, in fit_classifier_naive_bayes
File "/opt/anaconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/qiime2/sdk/action.py", line 342, in bound_callable
outputs = self.callable_executor(
File "/opt/anaconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/qiime2/sdk/action.py", line 566, in callable_executor
output_views = self._callable(**view_args)
File "/opt/anaconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/q2_feature_classifier/classifier.py", line 339, in generic_fitter
pipeline = fit_pipeline(reference_reads, reference_taxonomy,
File "/opt/anaconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/q2_feature_classifier/_skl.py", line 76, in fit_pipeline
pipeline.fit(X, y)
File "/opt/anaconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/sklearn/base.py", line 1152, in wrapper
return fit_method(estimator, *args, **kwargs)
File "/opt/anaconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/sklearn/pipeline.py", line 423, in fit
Xt = self._fit(X, y, **fit_params_steps)
File "/opt/anaconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/sklearn/pipeline.py", line 377, in _fit
X, fitted_transformer = fit_transform_one_cached(
File "/opt/anaconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/joblib/memory.py", line 353, in __call__
return self.func(*args, **kwargs)
File "/opt/anaconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/sklearn/pipeline.py", line 957, in _fit_transform_one
res = transformer.fit_transform(X, y, **fit_params)
File "/opt/anaconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/sklearn/feature_extraction/text.py", line 907, in fit_transform
return self.fit(X, y).transform(X)
File "/opt/anaconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/sklearn/base.py", line 1145, in wrapper
estimator._validate_params()
File "/opt/anaconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/sklearn/base.py", line 638, in _validate_params
validate_parameter_constraints(
File "/opt/anaconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/sklearn/utils/_param_validation.py", line 96, in validate_parameter_constraints
raise InvalidParameterError(
sklearn.utils._param_validation.InvalidParameterError: The 'ngram_range' parameter of HashingVectorizer must be an instance of 'tuple'. Got [7, 7] instead.

Plugin error from feature-classifier:

The 'ngram_range' parameter of HashingVectorizer must be an instance of 'tuple'. Got [7, 7] instead.

See above for debug info

If I use the classifier I made before, it doesn't work now either.

qiime feature-classifier classify-sklearn \
--i-classifier gtdb-220_classifier.qza \
--i-reads DRIFT2-K01-t225-rep-seqs-0523.qza \
--o-classification DRIFT2-K01-t225-gtdb-220_taxonomy.qza \
--verbose

Traceback (most recent call last):
File "/opt/anaconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/q2cli/commands.py", line 520, in __call__
results = self._execute_action(
File "/opt/anaconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/q2cli/commands.py", line 581, in _execute_action
results = action(**arguments)
File "", line 2, in classify_sklearn
File "/opt/anaconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/qiime2/sdk/action.py", line 339, in bound_callable
self.signature.transform_and_add_callable_args_to_prov(
File "/opt/anaconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/qiime2/core/type/signature.py", line 395, in transform_and_add_callable_args_to_prov
self._transform_and_add_input_to_prov(
File "/opt/anaconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/qiime2/core/type/signature.py", line 428, in _transform_and_add_input_to_prov
transformed_input = _input._view(spec.view_type, recorder)
File "/opt/anaconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/qiime2/sdk/result.py", line 406, in _view
result = transformation(self._archiver.data_dir)
File "/opt/anaconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/qiime2/core/transform.py", line 70, in transformation
new_view = transformer(view)
File "/opt/anaconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/q2_feature_classifier/_taxonomic_classifier.py", line 59, in _1
raise ValueError('The scikit-learn version (%s) used to generate this'
ValueError: The scikit-learn version (0.24.1) used to generate this artifact does not match the current version of scikit-learn installed (1.3.2). Please retrain your classifier for your current deployment to prevent data-corruption errors.

Plugin error from feature-classifier:

The scikit-learn version (0.24.1) used to generate this artifact does not match the current version of scikit-learn installed (1.3.2). Please retrain your classifier for your current deployment to prevent data-corruption errors.

See above for debug info.

Hi @m_s,

Both issues are related to the scikit-learn version that you have installed. They do not appear to be related to GTDB specifically; I think that is just a coincidence.

Your environment currently has scikit-learn 1.3.2, which is not the version that is installed as part of the amplicon distribution (0.24.1). So it seems that scikit-learn was updated in your environment some time in the past few weeks, and this has broken the environment. I suspect that someone pip-installed scikit-learn or another package (or perhaps this somehow changed when you updated macOS).
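
To confirm which scikit-learn version the active environment actually imports, and whether it came from pip, a quick check along these lines may help. This is only a sketch: the helper name describe_sklearn is made up for illustration, and the expected version (0.24.1) is simply the one mentioned in this thread, so adjust it if your distribution differs.

```shell
#!/usr/bin/env bash
# Sketch only: run inside the activated QIIME 2 conda environment.

# describe_sklearn is a hypothetical helper that compares two version strings.
describe_sklearn() {
    local installed="$1" expected="$2"
    if [ -z "$installed" ]; then
        echo "scikit-learn could not be imported in this environment"
    elif [ "$installed" = "$expected" ]; then
        echo "scikit-learn $installed matches the expected version"
    else
        echo "scikit-learn $installed is installed, but $expected was expected"
    fi
}

# Assumption: 0.24.1 is the version named in this thread.
expected="0.24.1"

# Ask the environment's Python which version it imports (empty on failure).
installed="$(python -c 'import sklearn; print(sklearn.__version__)' 2>/dev/null || true)"
describe_sklearn "$installed" "$expected"

# If conda is on PATH, list the package; a "pypi" entry in the channel
# column indicates it was installed with pip rather than conda.
if command -v conda >/dev/null 2>&1; then
    conda list scikit-learn || echo "conda list failed"
fi
```

If the channel column shows pypi, that would be consistent with a pip install having replaced the version that shipped with the distribution.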

I recommend installing a fresh qiime2-amplicon-2024.2 environment; these errors should then go away (both for training and for using the classifiers).
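
One cautious way to do the reinstall is to build the fresh environment under a new name, so the broken one stays untouched until the new one is confirmed to work. The sketch below makes two assumptions: the environment name is a placeholder, and the environment file is one you have already downloaded from the QIIME 2 install instructions for your platform (the file name here is hypothetical).

```shell
#!/usr/bin/env bash
# Sketch only: create a fresh environment alongside the existing one.

# Assumption: placeholder name for the new environment.
env_name="qiime2-amplicon-2024.2-fresh"

# Assumption: path to the distribution's environment file, downloaded
# beforehand from the QIIME 2 install instructions. Override with ENV_FILE.
env_file="${ENV_FILE:-qiime2-amplicon-2024.2.yml}"

if [ ! -f "$env_file" ]; then
    echo "Environment file not found: $env_file"
elif ! command -v conda >/dev/null 2>&1; then
    echo "conda is not on PATH"
else
    # Build the environment from the distribution file.
    conda env create -n "$env_name" --file "$env_file"
    # Print the scikit-learn version inside the new environment.
    conda run -n "$env_name" python -c 'import sklearn; print(sklearn.__version__)'
fi
```

Once the new environment reports the expected scikit-learn version, activate it and rerun the training and classification commands there.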

Good luck!
