Plugin error from feature-classifier:

hi all, after I run the following on a Linux server (16 GB RAM, 16 cores):

qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads ref-seqs.qza \
  --i-reference-taxonomy ref-taxonomy.qza \
  --o-classifier classifier.qza

the following error output appears:
/opt/conda/envs/qiime2-2018.2/lib/python3.5/site-packages/q2_feature_classifier/ UserWarning: The TaxonomicClassifier artifact that results from this method was trained using scikit-learn version 0.19.1. It cannot be used with other versions of scikit-learn. (While the classifier may complete successfully, the results will be unreliable.)
warnings.warn(warning, UserWarning)
Traceback (most recent call last):
File "/opt/conda/envs/qiime2-2018.2/lib/python3.5/site-packages/q2cli/", line 246, in __call__
results = action(**arguments)
File "", line 2, in fit_classifier_naive_bayes
File "/opt/conda/envs/qiime2-2018.2/lib/python3.5/site-packages/qiime2/sdk/", line 228, in bound_callable
output_types, provenance)
File "/opt/conda/envs/qiime2-2018.2/lib/python3.5/site-packages/qiime2/sdk/", line 363, in callable_executor
output_views = self._callable(**view_args)
File "/opt/conda/envs/qiime2-2018.2/lib/python3.5/site-packages/q2_feature_classifier/", line 310, in generic_fitter
File "/opt/conda/envs/qiime2-2018.2/lib/python3.5/site-packages/q2_feature_classifier/", line 32, in fit_pipeline, y)
File "/opt/conda/envs/qiime2-2018.2/lib/python3.5/site-packages/sklearn/", line 250, in fit, y, **fit_params)
File "/opt/conda/envs/qiime2-2018.2/lib/python3.5/site-packages/q2_feature_classifier/", line 41, in fit
File "/opt/conda/envs/qiime2-2018.2/lib/python3.5/site-packages/sklearn/", line 555, in partial_fit
File "/opt/conda/envs/qiime2-2018.2/lib/python3.5/site-packages/sklearn/", line 718, in _update_feature_log_prob
np.log(smoothed_cc.reshape(-1, 1)))
MemoryError

I would like to know: is my input data wrong, or is the memory not enough? Thanks a lot.

Hi @Yanfei-Geng,

The MemoryError you are seeing is a pretty clear indicator that the issue is the amount of memory, not the data (bad input would cause a different error, earlier in the process). E.g., see this forum post.

Some reference databases take a lot of memory to train a classifier on. E.g., the SILVA database often takes ~32 GB to train. We do have pre-trained 16S rRNA gene classifiers available that let you bypass this step.
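For a rough sense of why training can exceed available RAM: with the default 7-mer features, the naive Bayes fit holds dense (n_classes x n_features) count and log-probability matrices in memory (the np.log(smoothed_cc...) step in your traceback is where one of these is built). A back-of-the-envelope sketch in Python, assuming 4^7 = 16384 k-mer features, float64 values, and a hypothetical SILVA-like 100,000 distinct taxonomy labels (the real footprint is larger, since the feature table and intermediate copies also live in memory):

```python
# Rough peak-memory estimate for ONE dense matrix that sklearn's
# MultinomialNB builds during fitting. This is an approximation for
# illustration, not an exact accounting of QIIME 2's memory use.

def naive_bayes_matrix_bytes(n_classes, k=7, alphabet=4, dtype_bytes=8):
    """Bytes for a dense (n_classes x alphabet**k) float64 matrix."""
    n_features = alphabet ** k          # 4**7 = 16384 k-mer features
    return n_classes * n_features * dtype_bytes

# Hypothetical SILVA-like scale: ~100,000 distinct taxonomy labels.
gb = naive_bayes_matrix_bytes(100_000) / 1e9
print(f"{gb:.1f} GB per matrix")        # ~13 GB for a single matrix
```

Several such arrays (plus the featurized reference reads) easily push past 16 GB, which matches the failure you are seeing.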

Good luck!

Hi, thank you for your prompt reply. I am training on the trnL P6 loop (only 144 bp) of the chloroplast genome. The reference database downloaded from NCBI is trnl.fasta = 647.71 MB; after importing, trnl.qza = 161.6 MB and ref-seqs.qza = 676 kB. To train a classifier on this kind of data, how much memory is needed at minimum? Thanks again.

It is really impossible to give a rule of thumb, since there are many factors involved.

It sounds like your reference database is around the same size as SILVA, so I would aim for 32 GB. You could also reduce the classify--chunk-size parameter to lower memory requirements.

