I'm running the feature classifier on DADA2 output from a full MiSeq run and am getting the following error:
qiime feature-classifier classify-sklearn \
> --verbose \
> --i-classifier ~/silva-119-99-515-806-nb-classifier.qza \
> --i-reads rep-seqs-dada2.qza \
> --p-n-jobs 4 \
> --p-chunk-size 20000 \
> --o-classification taxonomy-silva119-dada2.qza
Traceback (most recent call last):
File "/miniconda3/lib/python3.5/site-packages/q2cli/commands.py", line 218, in __call__
results = action(**arguments)
File "<decorator-gen-233>", line 2, in classify_sklearn
File "/miniconda3/lib/python3.5/site-packages/qiime2/sdk/action.py", line 171, in callable_wrapper
output_types, provenance)
File "/miniconda3/lib/python3.5/site-packages/qiime2/sdk/action.py", line 248, in _callable_executor_
output_views = callable(**view_args)
File "/miniconda3/lib/python3.5/site-packages/q2_feature_classifier/classifier.py", line 144, in classify_sklearn
reads, classifier, read_orientation=read_orientation)
File "/miniconda3/lib/python3.5/site-packages/q2_feature_classifier/classifier.py", line 128, in _autodetect_orientation
result = list(zip(*predict(first_n_reads, classifier, confidence=0.)))
File "/miniconda3/lib/python3.5/site-packages/q2_feature_classifier/_skl.py", line 44, in predict
for chunk in _chunks(reads, chunk_size)) for m in c)
File "/miniconda3/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py", line 758, in __call__
while self.dispatch_one_batch(iterator):
File "/miniconda3/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py", line 608, in dispatch_one_batch
self._dispatch(tasks)
File "/miniconda3/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py", line 571, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "/miniconda3/lib/python3.5/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 109, in apply_async
result = ImmediateResult(func)
File "/miniconda3/lib/python3.5/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 326, in __init__
self.results = batch()
File "/miniconda3/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py", line 131, in __call__
return [func(*args, **kwargs) for func, args, kwargs in self.items]
File "/miniconda3/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py", line 131, in <listcomp>
return [func(*args, **kwargs) for func, args, kwargs in self.items]
File "/miniconda3/lib/python3.5/site-packages/q2_feature_classifier/_skl.py", line 51, in _predict_chunk
return _predict_chunk_with_conf(pipeline, separator, confidence, chunk)
File "/miniconda3/lib/python3.5/site-packages/q2_feature_classifier/_skl.py", line 65, in _predict_chunk_with_conf
prob_pos = pipeline.predict_proba(X)
File "/miniconda3/lib/python3.5/site-packages/sklearn/utils/metaestimators.py", line 54, in <lambda>
out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
File "/miniconda3/lib/python3.5/site-packages/sklearn/pipeline.py", line 377, in predict_proba
return self.steps[-1][-1].predict_proba(Xt)
File "/miniconda3/lib/python3.5/site-packages/sklearn/naive_bayes.py", line 103, in predict_proba
return np.exp(self.predict_log_proba(X))
File "/miniconda3/lib/python3.5/site-packages/sklearn/naive_bayes.py", line 83, in predict_log_proba
jll = self._joint_log_likelihood(X)
File "/miniconda3/lib/python3.5/site-packages/sklearn/naive_bayes.py", line 707, in _joint_log_likelihood
return (safe_sparse_dot(X, self.feature_log_prob_.T) +
File "/miniconda3/lib/python3.5/site-packages/sklearn/utils/extmath.py", line 184, in safe_sparse_dot
ret = a * b
File "/miniconda3/lib/python3.5/site-packages/scipy/sparse/base.py", line 360, in __mul__
return self._mul_multivector(other)
File "/miniconda3/lib/python3.5/site-packages/scipy/sparse/compressed.py", line 511, in _mul_multivector
fn(M, N, n_vecs, self.indptr, self.indices, self.data, other.ravel(), result.ravel())
MemoryError
I previously tried to train my own classifier on an EMP-primer-trimmed Silva 128 database and got a memory error similar to this one. Reading further, I probably did not give that job enough memory (my VirtualBox VM currently has only 12 GB of RAM), and I'm assuming that might be the problem here as well. I also tried with the default chunk size and got the same error; I only changed it to 20000 based on the recommendation for the classifier trainer in the linked forum post, so maybe that isn't the best setting for classification?
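For what it's worth, here is my rough reasoning for why chunk size matters: the traceback dies in predict_proba, which (as far as I can tell) materializes a dense float64 matrix of shape (chunk_size, n_classes) per chunk. The class count below is a placeholder guess, since I haven't checked how many taxa the Silva 119 classifier actually contains, and whether all 4 jobs hold a chunk at once is a worst-case assumption:

```python
# Back-of-the-envelope memory estimate for the dense probability
# matrix produced by predict_proba for each chunk of reads.
# float64 = 8 bytes per entry.
chunk_size = 20000   # my --p-chunk-size setting
n_classes = 50000    # hypothetical taxa count; not verified for Silva 119
n_jobs = 4           # worst case: all workers hold a chunk simultaneously

bytes_per_chunk = chunk_size * n_classes * 8
total_gib = bytes_per_chunk * n_jobs / 2**30
print(f"per chunk: {bytes_per_chunk / 2**30:.1f} GiB, "
      f"worst case total: {total_gib:.1f} GiB")
# -> per chunk: 7.5 GiB, worst case total: 29.8 GiB
```

If numbers of that order are right, 12 GB of RAM would be exceeded easily, which would explain the MemoryError; lowering --p-chunk-size and/or --p-n-jobs should shrink the footprint proportionally.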
Thanks,
Christian