Train the classifier

Something went wrong when I was training a feature classifier with the SILVA database. This is my command:

qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads silva-138-99-seqs-338f-806r-Bac.qza \
  --i-reference-taxonomy silva-138-99-tax-338f-806r-Bac.qza \
  --o-classifier silva-138-99-338f-806r-classifier-Bac.qza

Plugin error from feature-classifier:

The 'ngram_range' parameter of HashingVectorizer must be an instance of 'tuple'. Got [7, 7] instead.

Debug info has been saved to /tmp/qiime2-q2cli-err-pqzhaq0_.log

I suspect the problem comes from an earlier step, so here are the commands that might be responsible:
time qiime rescript dereplicate \
  --i-sequences silva-138-99-seqs-filt.qza \
  --i-taxa silva-138-99-tax.qza \
  --p-rank-handles 'domain' 'phylum' 'class' 'order' 'family' 'genus' 'species' \
  --p-mode 'uniq' \
  --o-dereplicated-sequences silva-138-99-seqs-derep-uniq.qza \
  --o-dereplicated-taxa silva-138-99-tax-derep-uniq.qza
Saved FeatureData[Sequence] to: silva-138-99-seqs-derep-uniq.qza
Saved FeatureData[Taxonomy] to: silva-138-99-tax-derep-uniq.qza
For --p-rank-handles I did not use 'silva', because it is rejected with an error. The help text lists these allowed values:
--p-rank-handles VALUES... List[Str % Choices('disable')] | List[Str %
Choices('domain', 'superkingdom', 'kingdom', 'subkingdom', 'superphylum',
'phylum', 'subphylum', 'infraphylum', 'superclass', 'class', 'subclass',
'infraclass', 'cohort', 'superorder', 'order', 'suborder', 'infraorder',
'parvorder', 'superfamily', 'family', 'subfamily', 'tribe', 'subtribe',
'genus', 'subgenus', 'species group', 'species subgroup', 'species',
'subspecies', 'forma')]
Specifies the set of rank handles used to backfill
missing ranks in the resulting dereplicated
taxonomy. Use 'disable' to prevent applying
'rank-handles'.
[default: ['domain', 'phylum', 'class', 'order', 'family', 'genus', 'species']]

And
time qiime rescript dereplicate \
  --i-sequences silva-138-99-seqs-338f-806r.qza \
  --i-taxa silva-138-99-tax-derep-uniq.qza \
  --p-rank-handles 'domain' 'phylum' 'class' 'order' 'family' 'genus' 'species' \
  --p-mode 'uniq' \
  --o-dereplicated-sequences silva-138-99-seqs-338f-806r-uniq.qza \
  --o-dereplicated-taxa silva-138-99-tax-338f-806r-derep-uniq.qza
Saved FeatureData[Sequence] to: silva-138-99-seqs-338f-806r-uniq.qza
Saved FeatureData[Taxonomy] to: silva-138-99-tax-338f-806r-derep-uniq.qza

Could someone please explain what is going wrong?
Thanks!

The problem seems to be related to this parameter:
--p-feat-ext--ngram-range

Plugin error from feature-classifier:

The 'ngram_range' parameter of HashingVectorizer must be an instance of 'tuple'. Got '(7, 7)' instead.

Debug info has been saved to /tmp/qiime2-q2cli-err-nhurc8rj.log

But setting the parameter explicitly still did not work:
qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads silva-138-99-seqs-338f-806r-Bac.qza \
  --i-reference-taxonomy silva-138-99-tax-338f-806r-Bac.qza \
  --p-feat-ext--ngram-range "(7, 7)" \
  --o-classifier silva-138-99-338f-806r-classifier-Bac.qza
Plugin error from feature-classifier:

The 'ngram_range' parameter of HashingVectorizer must be an instance of 'tuple'. Got '7,7' instead.

Debug info has been saved to /tmp/qiime2-q2cli-err-xd5sdv0j.log

Hello @fjh035, can you please rerun the failing command with the --verbose flag and post the entire output from that here? Thank you.

Here is the full output:
(qiime2-test) fjh@MSI:/mnt/d/qiime_experiment/202309_16S$ qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads silva-138-99-seqs-338f-806r-Bac.qza \
  --i-reference-taxonomy silva-138-99-tax-338f-806r-Bac.qza \
  --p-feat-ext--ngram-range '7,7' \
  --o-classifier silva-138-99-338f-806r-classifier-Bac.qza \
  --verbose

warnings.warn(warning, UserWarning)
Traceback (most recent call last):
File "/home/fjh/miniconda3/envs/qiime2-test/lib/python3.8/site-packages/q2cli/commands.py", line 468, in __call__
results = action(**arguments)
File "", line 2, in fit_classifier_naive_bayes
File "/home/fjh/miniconda3/envs/qiime2-test/lib/python3.8/site-packages/qiime2/sdk/action.py", line 274, in bound_callable
outputs = self.callable_executor(
File "/home/fjh/miniconda3/envs/qiime2-test/lib/python3.8/site-packages/qiime2/sdk/action.py", line 509, in callable_executor
output_views = self._callable(**view_args)
File "/home/fjh/miniconda3/envs/qiime2-test/lib/python3.8/site-packages/q2_feature_classifier/classifier.py", line 330, in generic_fitter
pipeline = fit_pipeline(reference_reads, reference_taxonomy,
File "/home/fjh/miniconda3/envs/qiime2-test/lib/python3.8/site-packages/q2_feature_classifier/_skl.py", line 32, in fit_pipeline
pipeline.fit(X, y)
File "/home/fjh/miniconda3/envs/qiime2-test/lib/python3.8/site-packages/sklearn/base.py", line 1151, in wrapper
return fit_method(estimator, *args, **kwargs)
File "/home/fjh/miniconda3/envs/qiime2-test/lib/python3.8/site-packages/sklearn/pipeline.py", line 416, in fit
Xt = self._fit(X, y, **fit_params_steps)
File "/home/fjh/miniconda3/envs/qiime2-test/lib/python3.8/site-packages/sklearn/pipeline.py", line 370, in _fit
X, fitted_transformer = fit_transform_one_cached(
File "/home/fjh/miniconda3/envs/qiime2-test/lib/python3.8/site-packages/joblib/memory.py", line 353, in __call__
return self.func(*args, **kwargs)
File "/home/fjh/miniconda3/envs/qiime2-test/lib/python3.8/site-packages/sklearn/pipeline.py", line 950, in _fit_transform_one
res = transformer.fit_transform(X, y, **fit_params)
File "/home/fjh/miniconda3/envs/qiime2-test/lib/python3.8/site-packages/sklearn/feature_extraction/text.py", line 904, in fit_transform
return self.fit(X, y).transform(X)
File "/home/fjh/miniconda3/envs/qiime2-test/lib/python3.8/site-packages/sklearn/base.py", line 1144, in wrapper
estimator._validate_params()
File "/home/fjh/miniconda3/envs/qiime2-test/lib/python3.8/site-packages/sklearn/base.py", line 637, in _validate_params
validate_parameter_constraints(
File "/home/fjh/miniconda3/envs/qiime2-test/lib/python3.8/site-packages/sklearn/utils/_param_validation.py", line 95, in validate_parameter_constraints
raise InvalidParameterError(
sklearn.utils._param_validation.InvalidParameterError: The 'ngram_range' parameter of HashingVectorizer must be an instance of 'tuple'. Got '7,7' instead.

Plugin error from feature-classifier:

The 'ngram_range' parameter of HashingVectorizer must be an instance of 'tuple'. Got '7,7' instead.

See above for debug info.

@fjh035 that does look like the parameter is causing the issue. It looks like we give it a list by default but the underlying tool has decided it really wants a tuple; however, I am unable to replicate this on my end, and I'm also not sure how that would slip through our tests. That was a good attempt at making it work with your custom value.
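For anyone wondering why all three attempts in this thread fail in slightly different ways, here is a minimal sketch using only the Python standard library. It assumes (this is an assumption, not confirmed in this thread) that the parameter string is decoded as JSON and kept as a plain string when decoding fails. The names decode_param and as_ngram_range are hypothetical helpers for illustration, not part of QIIME 2 or scikit-learn. JSON has no tuple type, so "[7, 7]" can only ever decode to a list, while "(7, 7)" and "7,7" are not valid JSON at all.

```python
import json


def decode_param(raw):
    """Decode a CLI string as JSON when possible, else keep the raw string.

    Assumption: this mimics how a string parameter might be turned into a
    Python value before being handed to the underlying library.
    """
    try:
        return json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return raw


def as_ngram_range(value):
    """Hypothetical fix: turn a two-element list into the tuple a strict check expects."""
    if isinstance(value, list) and len(value) == 2:
        return tuple(value)
    return value


# The three values tried in this thread.
for raw in ("[7, 7]", "(7, 7)", "7,7"):
    decoded = decode_param(raw)
    fixed = as_ngram_range(decoded)
    print(repr(raw), "->", repr(decoded),
          "| is tuple after coercion:", isinstance(fixed, tuple))
```

Under that assumption, no value typed on the command line can arrive as a tuple, so the conversion would have to happen in the code between the command line and the library, or the library version would need to be one that still accepts a list.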

It looks like you're using QIIME 2 2023.9, correct? If so, what distribution and what OS are you using? Can you also run conda list | grep scikit-learn with your environment active and post the output here? Thank you.

Sorry for the slow reply; the time difference got in the way.
I am currently using QIIME 2 2023.5 on Ubuntu 22.04.3 LTS running under WSL2. Here is the output:

(qiime2-test2) fjh@MSI:~$ conda list | grep scikit-learn
scikit-learn 0.24.1 py38h658cfdd_0 conda-forge

@fjh035 Can you run pip freeze | grep scikit-learn?

I installed QIIME 2 2023.5 and still cannot replicate the error. Furthermore, even though I have the exact same scikit-learn build as you, I don't even have the file /home/fjh/miniconda3/envs/qiime2-test/lib/python3.8/site-packages/sklearn/utils/_param_validation.py that this error is being raised in. The best theory right now is that scikit-learn is also installed through pip and that version is the one being used.
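One way to test this theory is to ask Python itself which copy of scikit-learn it would import. The snippet below is a minimal sketch, not part of QIIME 2: describe_module is a hypothetical helper name, and it relies only on the standard library's importlib (Python 3.8 or newer). Run it with the environment active. If the reported version differs from what conda list shows, or the location is outside the environment's site-packages (for example under ~/.local), a second copy is probably shadowing the conda package.

```python
import importlib.util
from importlib import metadata


def describe_module(module_name, dist_name=None):
    """Report where a module would be imported from and its installed version.

    module_name is the import name (e.g. "sklearn"); dist_name is the
    package name used by pip/conda (e.g. "scikit-learn"). Missing pieces
    are returned as None instead of raising.
    """
    spec = importlib.util.find_spec(module_name)
    location = spec.origin if spec is not None else None
    try:
        version = metadata.version(dist_name or module_name)
    except metadata.PackageNotFoundError:
        version = None
    return {"module": module_name, "location": location, "version": version}


if __name__ == "__main__":
    # scikit-learn is imported as "sklearn" but distributed as "scikit-learn".
    print(describe_module("sklearn", "scikit-learn"))
```

Comparing the printed location with the paths in the traceback shows whether the failing run used the same installation.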

Thank you very much for your guidance. Following your suggestion, I asked a friend to help me install QIIME 2 2023.9, and the command ran successfully on the new version.
