I am trying to train a classifier on the UNITE ver7 (01.12.2017) release and have two questions about the process. I am following a modified version of the protocol posted by Greg Caporaso on GitHub (https://github.com/gregcaporaso/2017.06.23-q2-fungal-tutorial).
First, why does the “feature-classifier extract-reads” command, which takes the specific primers used for your samples, not seem to be necessary for fungal ITS data? It was not included in the tutorial I followed, and I still got good results (species-level identification for many features), but I know this step is important for 16S data. Is there a difference between the two databases that makes this command necessary for bacterial identification but not for fungal identification?
My second question is about an error I am currently getting when training the UNITE classifier with the “fit-classifier-naive-bayes” command. I have run this command successfully before, but at that time I was using Docker and I am now using VirtualBox, so I wonder whether that is the cause. The command I am using is as follows:
qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads unite-ver7-dynamic-seqs-01.12.2017.qza \
  --i-reference-taxonomy unite-ver7-dynamic-tax-01.12.2017.qza \
  --o-classifier unite-ver7-dynamic-classifier-01.12.2017.qza \
  --p-classify--chunk-size 20000 \
  --verbose
I added --p-classify--chunk-size 20000 based on the recommendations in a thread about a similar error someone received when training the Silva classifier (I also tried a chunk size of 10000), but it did not fix the problem. The error is as follows:
/home/qiime2/miniconda/envs/qiime2-2017.12/lib/python3.5/site-packages/q2_feature_classifier/classifier.py:101: UserWarning: The TaxonomicClassifier artifact that results from this method was trained using scikit-learn version 0.19.1. It cannot be used with other versions of scikit-learn. (While the classifier may complete successfully, the results will be unreliable.)
  warnings.warn(warning, UserWarning)
Traceback (most recent call last):
  File "/home/qiime2/miniconda/envs/qiime2-2017.12/lib/python3.5/site-packages/q2cli/commands.py", line 224, in __call__
    results = action(**arguments)
  File "", line 2, in fit_classifier_naive_bayes
  File "/home/qiime2/miniconda/envs/qiime2-2017.12/lib/python3.5/site-packages/qiime2/sdk/action.py", line 228, in bound_callable
    output_types, provenance)
  File "/home/qiime2/miniconda/envs/qiime2-2017.12/lib/python3.5/site-packages/qiime2/sdk/action.py", line 363, in callable_executor
    output_views = self._callable(**view_args)
  File "/home/qiime2/miniconda/envs/qiime2-2017.12/lib/python3.5/site-packages/q2_feature_classifier/classifier.py", line 310, in generic_fitter
    pipeline)
  File "/home/qiime2/miniconda/envs/qiime2-2017.12/lib/python3.5/site-packages/q2_feature_classifier/_skl.py", line 32, in fit_pipeline
    pipeline.fit(X, y)
  File "/home/qiime2/miniconda/envs/qiime2-2017.12/lib/python3.5/site-packages/sklearn/pipeline.py", line 250, in fit
    self._final_estimator.fit(Xt, y, **fit_params)
  File "/home/qiime2/miniconda/envs/qiime2-2017.12/lib/python3.5/site-packages/q2_feature_classifier/custom.py", line 41, in fit
    classes=classes)
  File "/home/qiime2/miniconda/envs/qiime2-2017.12/lib/python3.5/site-packages/sklearn/naive_bayes.py", line 555, in partial_fit
    self._update_feature_log_prob(alpha)
  File "/home/qiime2/miniconda/envs/qiime2-2017.12/lib/python3.5/site-packages/sklearn/naive_bayes.py", line 717, in _update_feature_log_prob
    self.feature_log_prob_ = (np.log(smoothed_fc) -
MemoryError
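To get a feel for whether the VM simply lacks RAM, here is a rough back-of-envelope sketch in Python. It assumes, from my reading of the traceback above, that the failing line builds dense float64 arrays of shape (number of taxonomy classes, number of hashed k-mer features), which would not shrink with the chunk size. The class count of 30000 is a made-up placeholder and not the real number of UNITE taxa, 8192 is what I believe the default for --p-feat-ext--n-features to be, and the factor of 4 simultaneous copies is a guess.

```python
def dense_array_gib(n_classes, n_features, bytes_per_value=8):
    """Size in GiB of one dense array of shape (n_classes, n_features).

    bytes_per_value=8 corresponds to float64.
    """
    return n_classes * n_features * bytes_per_value / 2**30


def peak_estimate_gib(n_classes, n_features, live_copies=4):
    """Very rough peak estimate, assuming `live_copies` arrays of that
    shape exist at once (the counts plus temporaries). The factor is an
    assumption, not a measured value."""
    return live_copies * dense_array_gib(n_classes, n_features)


if __name__ == "__main__":
    n_classes = 30000   # hypothetical placeholder, not the real UNITE count
    n_features = 8192   # assumed default number of hashed features
    print("one array : %.2f GiB" % dense_array_gib(n_classes, n_features))
    print("rough peak: %.2f GiB" % peak_estimate_gib(n_classes, n_features))
```

If the real numbers come out anywhere near the RAM assigned to the virtual machine, that alone might explain the MemoryError.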
Also based on a recommendation from the same Silva thread, I ran the command ulimit -a, which produced the following:
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 15632
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 15632
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
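Since the ulimit output shows no per-process cap on memory, the remaining suspect seems to be the amount of RAM the virtual machine itself has. A minimal sketch for checking that from inside the VM, using only the Python standard library on Linux (the function names are my own and purely illustrative):

```python
import os
import resource


def total_ram_bytes():
    """Physical memory visible to this (virtual) machine, in bytes."""
    return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")


def address_space_limit():
    """Soft per-process address-space limit in bytes, or None if unlimited.

    This is the same limit that `ulimit -v` reports (there in kbytes).
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_AS)
    return None if soft == resource.RLIM_INFINITY else soft


if __name__ == "__main__":
    print("total RAM: %.2f GiB" % (total_ram_bytes() / 2**30))
    limit = address_space_limit()
    print("address-space limit:", "unlimited" if limit is None else limit)
```

If the total reported here is small, raising the memory allocated to the VM in the VirtualBox settings would be the first thing I try.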
Is there anything else I can try, or do I need to find a new machine to run this command?