MemoryError when Training Silva Classifier

I am trying to train a SILVA 128 classifier on the 341F/806R primer region for my current data set. The command is as follows:

qiime feature-classifier fit-classifier-naive-bayes --i-reference-reads ref-seqs_silva.qza --i-reference-taxonomy 99_otu_taxonomy_silva.qza --o-classifier silva128_341_806_classifier --verbose

The command is erroring out with the following:

/home/fgplab/miniconda3/envs/qiime2-2017.2/lib/python3.5/site-packages/q2_feature_classifier-2017.2.0-py3.5.egg/q2_feature_classifier/classifier.py:94: UserWarning: The TaxonomicClassifier artifact that results from this method was trained using scikit-learn version 0.18.1. It cannot be used with other versions of scikit-learn. (While the classifier may complete successfully, the results will be unreliable.)
Traceback (most recent call last):
  File "/home/fgplab/miniconda3/envs/qiime2-2017.2/lib/python3.5/site-packages/q2cli-2017.2.0-py3.5.egg/q2cli/commands.py", line 217, in __call__
    results = action(**arguments)
  File "<decorator-gen-...>", line 2, in fit_classifier_naive_bayes
  File "/home/fgplab/miniconda3/envs/qiime2-2017.2/lib/python3.5/site-packages/qiime2-2017.2.0-py3.5.egg/qiime2/sdk/action.py", line 171, in callable_wrapper
    output_types, provenance)
  File "/home/fgplab/miniconda3/envs/qiime2-2017.2/lib/python3.5/site-packages/qiime2-2017.2.0-py3.5.egg/qiime2/sdk/action.py", line 248, in _callable_executor_
    output_views = callable(**view_args)
  File "/home/fgplab/miniconda3/envs/qiime2-2017.2/lib/python3.5/site-packages/q2_feature_classifier-2017.2.0-py3.5.egg/q2_feature_classifier/classifier.py", line 191, in generic_fitter
    pipeline)
  File "/home/fgplab/miniconda3/envs/qiime2-2017.2/lib/python3.5/site-packages/q2_feature_classifier-2017.2.0-py3.5.egg/q2_feature_classifier/_skl.py", line 31, in fit_pipeline
    pipeline.fit(X, y)
  File "/home/fgplab/miniconda3/envs/qiime2-2017.2/lib/python3.5/site-packages/sklearn/pipeline.py", line 270, in fit
    self._final_estimator.fit(Xt, y, **fit_params)
  File "/home/fgplab/miniconda3/envs/qiime2-2017.2/lib/python3.5/site-packages/q2_feature_classifier-2017.2.0-py3.5.egg/q2_feature_classifier/custom.py", line 25, in fit
    return super().fit(X, y, sample_weight=sample_weight)
  File "/home/fgplab/miniconda3/envs/qiime2-2017.2/lib/python3.5/site-packages/sklearn/naive_bayes.py", line 566, in fit
    Y = labelbin.fit_transform(y)
  File "/home/fgplab/miniconda3/envs/qiime2-2017.2/lib/python3.5/site-packages/sklearn/base.py", line 494, in fit_transform
    return self.fit(X, **fit_params).transform(X)
  File "/home/fgplab/miniconda3/envs/qiime2-2017.2/lib/python3.5/site-packages/sklearn/preprocessing/label.py", line 335, in transform
    sparse_output=self.sparse_output)
  File "/home/fgplab/miniconda3/envs/qiime2-2017.2/lib/python3.5/site-packages/sklearn/preprocessing/label.py", line 520, in label_binarize
    Y = Y.toarray()
  File "/home/fgplab/miniconda3/envs/qiime2-2017.2/lib/python3.5/site-packages/scipy/sparse/compressed.py", line 920, in toarray
    return self.tocoo(copy=False).toarray(order=order, out=out)
  File "/home/fgplab/miniconda3/envs/qiime2-2017.2/lib/python3.5/site-packages/scipy/sparse/coo.py", line 252, in toarray
    B = self._process_toarray_args(order, out)
  File "/home/fgplab/miniconda3/envs/qiime2-2017.2/lib/python3.5/site-packages/scipy/sparse/base.py", line 1009, in _process_toarray_args
    return np.zeros(self.shape, dtype=self.dtype, order=order)
MemoryError

The workstation I am using has 125 GB of RAM available, and watching system activity while the training runs, it fails as it approaches about 8 GB of RAM usage. I'm not entirely sure how a MemoryError is being thrown here.

Any help is appreciated!

Hi @Droush,
Sorry for the trouble you’re having with this. Could you try re-running your command with the parameter --p-classify--chunk-size 20000 added, and follow up to let me know if that works? That should reduce the memory required for training.
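For reference, a sketch of the full command with that parameter added (assuming the same artifact names as in your original post):

qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads ref-seqs_silva.qza \
  --i-reference-taxonomy 99_otu_taxonomy_silva.qza \
  --p-classify--chunk-size 20000 \
  --o-classifier silva128_341_806_classifier \
  --verbose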

It’s strange that it fails at 8 GB when there should be 125 GB available. @Droush, what does the command ulimit -a produce? It’s possible your user account has a memory limit that doesn’t reflect the total available RAM.
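(If you want to check just the memory-related limits, the ones most likely to matter for a Python MemoryError are the virtual memory and data segment sizes; in a bash shell you can query them directly:

ulimit -v   # max virtual memory (address space), in KB
ulimit -d   # max data segment size, in KB

Python raises MemoryError when an allocation fails, which can happen at an address-space cap long before physical RAM is exhausted.)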

@gregcaporaso The chunk size modification seems to be working; it is now approaching 20 GB while it’s running and has not failed yet.

@ebolyen

The output of ulimit -a doesn’t appear to show a memory limit.

core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 514987
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 514987
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

I’m getting a MemoryError while training Silva v128 (16S only, 99% OTUs) with the 2016 EMP primers (515F/806R).

qiime feature-classifier fit-classifier-naive-bayes --verbose --p-classify--chunk-size 20000 --i-reference-reads SILVA_128_16S_99_515-806_seqs.qza --i-reference-taxonomy SILVA_128_16S_99_consensus_taxonomy_7levels.qza --o-classifier SILVA_128_16S_99_515-806_classifier.qza
/miniconda3/lib/python3.5/site-packages/q2_feature_classifier/classifier.py:96: UserWarning: The TaxonomicClassifier artifact that results from this method was trained using scikit-learn version 0.18.1. It cannot be used with other versions of scikit-learn. (While the classifier may complete successfully, the results will be unreliable.)
  warnings.warn(warning, UserWarning)
Traceback (most recent call last):
  File "/miniconda3/lib/python3.5/site-packages/q2cli/commands.py", line 218, in __call__
    results = action(**arguments)
  File "<decorator-gen-235>", line 2, in fit_classifier_naive_bayes
  File "/miniconda3/lib/python3.5/site-packages/qiime2/sdk/action.py", line 171, in callable_wrapper
    output_types, provenance)
  File "/miniconda3/lib/python3.5/site-packages/qiime2/sdk/action.py", line 248, in _callable_executor_
    output_views = callable(**view_args)
  File "/miniconda3/lib/python3.5/site-packages/q2_feature_classifier/classifier.py", line 224, in generic_fitter
    pipeline)
  File "/miniconda3/lib/python3.5/site-packages/q2_feature_classifier/_skl.py", line 31, in fit_pipeline
    pipeline.fit(X, y)
  File "/miniconda3/lib/python3.5/site-packages/sklearn/pipeline.py", line 270, in fit
    self._final_estimator.fit(Xt, y, **fit_params)
  File "/miniconda3/lib/python3.5/site-packages/q2_feature_classifier/custom.py", line 37, in fit
    classes=classes)
  File "/miniconda3/lib/python3.5/site-packages/sklearn/naive_bayes.py", line 523, in partial_fit
    Y = Y.astype(np.float64)
MemoryError

Do I need to adjust the chunk size for a database this size?

Hi @ChristianEdwardson! Yes, you’ll likely need to reduce the chunk size further and/or run the command on a machine with more memory.
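For example, a minimal sketch with a smaller chunk size (the value 5000 is just a starting guess, not a tested recommendation; smaller chunks lower the peak memory footprint at the cost of a longer run):

qiime feature-classifier fit-classifier-naive-bayes \
  --verbose \
  --p-classify--chunk-size 5000 \
  --i-reference-reads SILVA_128_16S_99_515-806_seqs.qza \
  --i-reference-taxonomy SILVA_128_16S_99_consensus_taxonomy_7levels.qza \
  --o-classifier SILVA_128_16S_99_515-806_classifier.qza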