Dear All,
I am running into an error when using the sklearn classifier with the SILVA database (silva-132-99-515-806-nb-classifier.qza). I am using QIIME 2 2018.4 and have run the following command:
qiime feature-classifier classify-sklearn --i-classifier /home/denir/anaconda3/envs/qiime2-2018.2/Silva/silva-132-99-515-806-nb-classifier.qza --i-reads 3_rep-seqs.qza --o-classification 9_taxonomy.qza
The error was as follows:
Plugin error from feature-classifier:
Debug info has been saved to /tmp/qiime2-q2cli-err-0cjste5b.log
(qiime2-2018.4) [email protected]:~/Projects/MonMic/Round_Updated/qiime2$ less /tmp/qiime2-q2cli-err-0cjste5b.log
Traceback (most recent call last):
File "/home/denir/.conda/envs/qiime2-2018.4/lib/python3.5/site-packages/q2cli/commands.py", line 274, in __call__
results = action(**arguments)
File "", line 2, in classify_sklearn
File "/home/denir/.conda/envs/qiime2-2018.4/lib/python3.5/site-packages/qiime2/sdk/action.py", line 231, in bound_callable
output_types, provenance)
File "/home/denir/.conda/envs/qiime2-2018.4/lib/python3.5/site-packages/qiime2/sdk/action.py", line 366, in callable_executor
output_views = self._callable(**view_args)
File "/home/denir/.conda/envs/qiime2-2018.4/lib/python3.5/site-packages/q2_feature_classifier/classifier.py", line 215, in classify_sklearn
confidence=confidence)
File "/home/denir/.conda/envs/qiime2-2018.4/lib/python3.5/site-packages/q2_feature_classifier/_skl.py", line 45, in predict
for chunk in _chunks(reads, chunk_size)) for m in c)
File "/home/denir/.conda/envs/qiime2-2018.4/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py", line 779, in __call__
while self.dispatch_one_batch(iterator):
File "/home/denir/.conda/envs/qiime2-2018.4/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py", line 625, in dispatch_one_batch
self._dispatch(tasks)
File "/home/denir/.conda/envs/qiime2-2018.4/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py", line 588, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "/home/denir/.conda/envs/qiime2-2018.4/lib/python3.5/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 111, in apply_async
result = ImmediateResult(func)
File "/home/denir/.conda/envs/qiime2-2018.4/lib/python3.5/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 332, in __init__
self.results = batch()
File "/home/denir/.conda/envs/qiime2-2018.4/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py", line 131, in __call__
I am fairly sure I have pinpointed the problem: it is a memory issue. While the job was running, I watched the free portion of RAM drop to 2 GB, at which point the process stopped.
Coming to my question now. We have just bought a machine with 64 GB of RAM in the hope of being able to run large datasets such as this one (> 800 samples). It seems we have already hit the ceiling here.
Is there any way to run this process more efficiently so as to conserve RAM?
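For what it is worth, one thing I am considering trying is the --p-reads-per-batch parameter of classify-sklearn, which (as I understand it) limits how many sequences are held in memory and classified at once; smaller batches should lower peak RAM at the cost of a longer runtime. I have not yet confirmed this helps in my case, and the batch size of 10000 below is just a guess:

```shell
# Retry the same classification, but cap the number of reads
# processed per batch to reduce peak memory use.
qiime feature-classifier classify-sklearn \
  --i-classifier /home/denir/anaconda3/envs/qiime2-2018.2/Silva/silva-132-99-515-806-nb-classifier.qza \
  --i-reads 3_rep-seqs.qza \
  --p-reads-per-batch 10000 \
  --o-classification 9_taxonomy.qza
```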
Any comments, hints or suggestions are welcome.
Sincerely,
Deni