MemoryError when running feature classifier with pre-trained classifier

ChristianEdwardson · May 5, 2017, 5:53pm

I'm running the feature classifier on DADA2 output from a full MiSeq run and getting the following error:

qiime feature-classifier classify-sklearn \
  --verbose \
  --i-classifier ~/silva-119-99-515-806-nb-classifier.qza \
  --i-reads rep-seqs-dada2.qza \
  --p-n-jobs 4 \
  --p-chunk-size 20000 \
  --o-classification taxonomy-silva119-dada2.qza
Traceback (most recent call last):
  File "/miniconda3/lib/python3.5/site-packages/q2cli/commands.py", line 218, in __call__
    results = action(**arguments)
  File "<decorator-gen-233>", line 2, in classify_sklearn
  File "/miniconda3/lib/python3.5/site-packages/qiime2/sdk/action.py", line 171, in callable_wrapper
    output_types, provenance)
  File "/miniconda3/lib/python3.5/site-packages/qiime2/sdk/action.py", line 248, in _callable_executor_
    output_views = callable(**view_args)
  File "/miniconda3/lib/python3.5/site-packages/q2_feature_classifier/classifier.py", line 144, in classify_sklearn
    reads, classifier, read_orientation=read_orientation)
  File "/miniconda3/lib/python3.5/site-packages/q2_feature_classifier/classifier.py", line 128, in _autodetect_orientation
    result = list(zip(*predict(first_n_reads, classifier, confidence=0.)))
  File "/miniconda3/lib/python3.5/site-packages/q2_feature_classifier/_skl.py", line 44, in predict
    for chunk in _chunks(reads, chunk_size)) for m in c)
  File "/miniconda3/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py", line 758, in __call__
    while self.dispatch_one_batch(iterator):
  File "/miniconda3/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py", line 608, in dispatch_one_batch
    self._dispatch(tasks)
  File "/miniconda3/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py", line 571, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/miniconda3/lib/python3.5/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 109, in apply_async
    result = ImmediateResult(func)
  File "/miniconda3/lib/python3.5/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 326, in __init__
    self.results = batch()
  File "/miniconda3/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py", line 131, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/miniconda3/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py", line 131, in <listcomp>
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/miniconda3/lib/python3.5/site-packages/q2_feature_classifier/_skl.py", line 51, in _predict_chunk
    return _predict_chunk_with_conf(pipeline, separator, confidence, chunk)
  File "/miniconda3/lib/python3.5/site-packages/q2_feature_classifier/_skl.py", line 65, in _predict_chunk_with_conf
    prob_pos = pipeline.predict_proba(X)
  File "/miniconda3/lib/python3.5/site-packages/sklearn/utils/metaestimators.py", line 54, in <lambda>
    out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
  File "/miniconda3/lib/python3.5/site-packages/sklearn/pipeline.py", line 377, in predict_proba
    return self.steps[-1][-1].predict_proba(Xt)
  File "/miniconda3/lib/python3.5/site-packages/sklearn/naive_bayes.py", line 103, in predict_proba
    return np.exp(self.predict_log_proba(X))
  File "/miniconda3/lib/python3.5/site-packages/sklearn/naive_bayes.py", line 83, in predict_log_proba
    jll = self._joint_log_likelihood(X)
  File "/miniconda3/lib/python3.5/site-packages/sklearn/naive_bayes.py", line 707, in _joint_log_likelihood
    return (safe_sparse_dot(X, self.feature_log_prob_.T) +
  File "/miniconda3/lib/python3.5/site-packages/sklearn/utils/extmath.py", line 184, in safe_sparse_dot
    ret = a * b
  File "/miniconda3/lib/python3.5/site-packages/scipy/sparse/base.py", line 360, in __mul__
    return self._mul_multivector(other)
  File "/miniconda3/lib/python3.5/site-packages/scipy/sparse/compressed.py", line 511, in _mul_multivector
    fn(M, N, n_vecs, self.indptr, self.indices, self.data, other.ravel(), result.ravel())
MemoryError

I tried to train my own classifier on an EMP-primer-trimmed Silva 128 database, but got a memory error similar to this:

But reading further, I probably did not give it enough memory (my VBox currently has only 12 GB of RAM). Could that be the case here as well? I also got the error with the default chunk size setting; I changed it based on the recommendation for the classifier trainer in the linked forum post, but maybe that isn't the best setting for classification?

Thanks,
Christian

jairideout · May 5, 2017, 8:47pm

Hi @ChristianEdwardson! In the post you linked to, it sounds like the user saw the process approach 20 GB of memory before it had finished. Can you try increasing the amount of memory available to your virtual machine? I'd try something above 20 GB.

@BenKaehler @gregcaporaso do you have any recommendations on chunk size / memory when using the Silva classifiers?

BenKaehler · May 5, 2017, 9:19pm

Hi all, the last time I trained a classifier on a Silva data set I used up to around 36 GB of memory. To some extent, Silva data sets will always be memory-hungry because they have a massive number of taxonomic classes, but it may be possible to reduce this requirement by reducing the chunk size. I will do some experimentation with that when I get time, probably by Sunday your time.

BenKaehler · May 11, 2017, 12:14am

Hi @ChristianEdwardson, I just re-read your post and realised that I was talking about the training step (fit-classifier-naive-bayes), whereas you are talking about the classification step (classify-sklearn).

For the classification step, try leaving --p-n-jobs at its default value of 1. There is a trade-off here between memory usage and speed, so if you're running out of memory you have to sacrifice speed to fit it in memory. The amount of memory you use should scale roughly linearly with the value of --p-n-jobs.
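
To put numbers on this trade-off, one option is to measure the peak memory of a run with --p-n-jobs 1 and scale up from there. Below is a minimal, Unix-only sketch using Python's standard subprocess and resource modules; the helper name peak_rss_mb is hypothetical and not part of QIIME 2. It assumes that with a single job the classification runs inside the one qiime process, because ru_maxrss for RUSAGE_CHILDREN reports the largest single child process, not the sum over several worker processes.

```python
import resource
import subprocess
import sys


def peak_rss_mb(cmd):
    """Run cmd to completion and return a peak resident set size in MB.

    Hypothetical helper. RUSAGE_CHILDREN covers all children this Python
    process has waited for, and ru_maxrss is the RSS of the largest one,
    so call this from a fresh interpreter for a clean measurement.
    """
    subprocess.run(cmd, check=True)
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    # Linux reports ru_maxrss in kilobytes; macOS reports it in bytes.
    scale = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / scale


# Example, reusing the command from this thread with a single job
# (requires QIIME 2; "~" is not expanded here because no shell is used):
# peak_rss_mb(["qiime", "feature-classifier", "classify-sklearn",
#              "--i-classifier", "silva-119-99-515-806-nb-classifier.qza",
#              "--i-reads", "rep-seqs-dada2.qza",
#              "--p-n-jobs", "1",
#              "--o-classification", "taxonomy-silva119-dada2.qza"])
```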

The --p-chunk-size parameter has different meanings for the classification and training steps. For the training step, the --p-chunk-size parameter affects how the training data is split up and fed to the classifier to reduce memory consumption. For the classification step, --p-chunk-size affects how many reads the classifier sends to each parallel worker in each iteration.
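
A rough picture of what the classification-step chunk size controls: reads are cut into batches of at most chunk-size reads, and each batch is handed to one of n_jobs workers. The traceback above shows the plugin doing this with joblib's Parallel; the sketch below swaps in the standard library's concurrent.futures so it runs without extra dependencies, and classify_chunk is a hypothetical placeholder rather than the real Naive Bayes prediction.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def chunks(reads, chunk_size):
    """Yield successive lists of at most chunk_size reads."""
    it = iter(reads)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def classify_chunk(chunk):
    # Hypothetical stand-in for the per-chunk prediction step.
    return [(read_id, "Unassigned") for read_id, _seq in chunk]


def classify(reads, n_jobs, chunk_size):
    """Send each chunk of reads to one of n_jobs workers."""
    results = []
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        # Executor.map submits every chunk up front and yields the
        # per-chunk results back in input order.
        for chunk_result in pool.map(classify_chunk, chunks(reads, chunk_size)):
            results.extend(chunk_result)
    return results
```

With 10 reads and chunk_size=4, chunks() yields batches of 4, 4 and 2 reads, so three tasks are shared among the workers.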

Hope that helps, sorry for the slow realisation.

ChristianEdwardson · May 11, 2017, 9:41pm

This worked with the default settings once I increased my VBox RAM to 28 GB (of the host's 32 GB), but that setting caused some host memory errors (VirtualBox automatically paused the VM, though it continued to run once I unpaused it).

I'd rather have it run slower than eat up all of my RAM. Maybe a setting similar to the RDP classifier in QIIME 1, where you can set the maximum RAM usage? It would also help to have better guidance on how chunk size and number of jobs translate into RAM usage, so that the average user doesn't have to guess what to adjust when the command crashes with an ambiguous "MemoryError."

Thanks again for your help!
Christian

BenKaehler · May 11, 2017, 11:09pm

Hi Christian, thanks for describing your resolution.

Yes, the usage notes throughout q2-feature-classifier could be improved, and that's on the to-do list.

For this particular case, it's not complicated. Chunk size doesn't significantly affect memory usage in the assignment step, so the guidance is to use as many jobs as you have cores and will fit into memory. Unfortunately it's hard to prescribe an exact number of jobs beforehand because memory usage will depend on the data set and parameters used in the training step.
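
That rule of thumb can be written down as a small helper, assuming memory scales roughly linearly with the number of jobs and that you have measured (or estimated) the peak for a single job on your own classifier and data. choose_n_jobs is a hypothetical name; nothing like it ships with q2-feature-classifier.

```python
import os


def choose_n_jobs(available_gb, gb_per_job, cores=None):
    """Suggest as many jobs as there are cores, but no more than fit in memory.

    Assumes peak memory ~= n_jobs * gb_per_job, where gb_per_job is the
    peak seen with a single job on the same classifier and data. This is
    a rough linear model, not a guarantee.
    """
    if gb_per_job <= 0:
        raise ValueError("gb_per_job must be positive")
    if cores is None:
        cores = os.cpu_count() or 1
    jobs_that_fit = int(available_gb // gb_per_job)
    # 0 means even a single job is not expected to fit in available_gb.
    return min(cores, jobs_that_fit)
```

For example, with the 28 GB VM from this thread, an assumed 4 cores and a hypothetical 10 GB peak per job, choose_n_jobs(28, 10, cores=4) returns 2. Passing a bit less than the VM's total RAM as available_gb leaves headroom for everything else running on the machine.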

system · June 12, 2017, 5:16am

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.