Hello,
I am trying to train a taxonomic classifier on the Greengenes 13_8 reference data with the following commands, but I keep getting a plugin error:
qiime feature-classifier extract-reads \
  --i-sequences 97_otus.qza \
  --p-f-primer CCTACGGGAGGCAGCAG \
  --p-r-primer ATTACCGCGGCTGCTGG \
  --o-reads ref-seqs.qza
(which runs fine)
qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads ref-seqs.qza \
  --i-reference-taxonomy ref-taxonomy.qza \
  --o-classifier classifier.qza
But then I get this message:
Plugin error from feature-classifier:
Debug info has been saved to /tmp/qiime2-q2cli-err-19qp8cml.log
I ran the command again with --verbose and got this message:
/home/qiime2/miniconda/envs/qiime2-2017.11/lib/python3.5/site-packages/q2_feature_classifier/classifier.py:98: UserWarning: The TaxonomicClassifier artifact that results from this method was trained using scikit-learn version 0.19.1. It cannot be used with other versions of scikit-learn. (While the classifier may complete successfully, the results will be unreliable.)
warnings.warn(warning, UserWarning)
Traceback (most recent call last):
File "/home/qiime2/miniconda/envs/qiime2-2017.11/lib/python3.5/site-packages/q2cli/commands.py", line 218, in call
results = action(**arguments)
File "", line 2, in fit_classifier_naive_bayes
File "/home/qiime2/miniconda/envs/qiime2-2017.11/lib/python3.5/site-packages/qiime2/sdk/action.py", line 220, in bound_callable
output_types, provenance)
File "/home/qiime2/miniconda/envs/qiime2-2017.11/lib/python3.5/site-packages/qiime2/sdk/action.py", line 355, in callable_executor
output_views = self._callable(**view_args)
File "/home/qiime2/miniconda/envs/qiime2-2017.11/lib/python3.5/site-packages/q2_feature_classifier/classifier.py", line 276, in generic_fitter
pipeline)
File "/home/qiime2/miniconda/envs/qiime2-2017.11/lib/python3.5/site-packages/q2_feature_classifier/_skl.py", line 32, in fit_pipeline
pipeline.fit(X, y)
File "/home/qiime2/miniconda/envs/qiime2-2017.11/lib/python3.5/site-packages/sklearn/pipeline.py", line 250, in fit
self._final_estimator.fit(Xt, y, **fit_params)
File "/home/qiime2/miniconda/envs/qiime2-2017.11/lib/python3.5/site-packages/q2_feature_classifier/custom.py", line 41, in fit
classes=classes)
File "/home/qiime2/miniconda/envs/qiime2-2017.11/lib/python3.5/site-packages/sklearn/naive_bayes.py", line 555, in partial_fit
self._update_feature_log_prob(alpha)
File "/home/qiime2/miniconda/envs/qiime2-2017.11/lib/python3.5/site-packages/sklearn/naive_bayes.py", line 718, in _update_feature_log_prob
np.log(smoothed_cc.reshape(-1, 1)))
MemoryError
I am not sure how to interpret this.
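To put a rough number on the MemoryError: the traceback ends inside scikit-learn's naive Bayes code, which works on dense arrays with one row per class (taxonomy label) and one column per feature. The sketch below is only a back-of-the-envelope estimate in Python. The class and feature counts are made-up placeholders, not values measured from my data, and `dense_array_gib` is a hypothetical helper, not part of QIIME 2 or scikit-learn.

```python
def dense_array_gib(n_classes, n_features, bytes_per_value=8):
    """Size in GiB of one dense array of shape (n_classes, n_features).

    bytes_per_value=8 assumes float64 values.
    """
    return n_classes * n_features * bytes_per_value / 2 ** 30


# Placeholder values (assumptions, NOT measured from the Greengenes data).
n_classes = 5000    # hypothetical number of distinct taxonomy labels
n_features = 8192   # hypothetical number of k-mer features

one_array = dense_array_gib(n_classes, n_features)
print("one dense array: {:.2f} GiB".format(one_array))
```

Several arrays of this shape can exist at the same time during fitting, so the peak memory use is probably a multiple of this figure.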
I also had two more questions:
i) Why is a truncation parameter needed when extracting the reference reads? My sequences vary in length because they were generated on an Ion PGM; wouldn't truncating the fragments discard bases that could be used for taxonomic classification? Since I wasn't sure what value to choose, I left it at the default.
ii) What is the difference between the Naive Bayes classifier and the scikit-learn classifier? There also seems to be a warning message about the classifier I chose. I only chose Naive Bayes because it was the one presented in the tutorials.
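To make question i) concrete, here is a toy Python sketch of what I understand truncation to mean. It is plain string slicing, not QIIME 2's implementation; `truncate_reads` and the example reads are hypothetical, and treating a length of 0 as "no truncation" is my assumption.

```python
def truncate_reads(reads, trunc_len):
    """Cut each read to at most trunc_len bases.

    A trunc_len of 0 or less leaves reads untouched (assumed convention).
    """
    if trunc_len <= 0:
        return list(reads)
    return [read[:trunc_len] for read in reads]


# Made-up reads of different lengths, standing in for variable-length data.
reads = ["ACGTACGTAC", "ACGTAC", "ACGTACGTACGTAC"]
truncated = truncate_reads(reads, 8)

# Number of bases lost from each read by truncating at 8.
dropped = [len(r) - len(t) for r, t in zip(reads, truncated)]
print(truncated, dropped)
```

In this toy case the longer reads lose their trailing bases while the short read is unchanged, which is the loss of information I am asking about.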
Many thanks