Error using Naive Bayes classifiers trained on Silva 132 99% OTUs full-length sequences

Hi, I have to thank the developers of this software; it has been really helpful. However, I am running into a problem when trying to do taxonomy classification and taxonomic analyses using the Naive Bayes classifier trained on Silva 132 99% OTUs full-length sequences.

I tried this command:

qiime feature-classifier classify-sklearn \
  --i-classifier silva-132-99-nb-classifier.qza \
  --i-reads rep-seqs.qza \
  --o-classification taxonomy2.qza

And this is the error located in the log file:

Traceback (most recent call last):
  File "/home/fagolab/anaconda2/envs/qiime2-2018.8/lib/python3.5/site-packages/q2cli/commands.py", line 274, in __call__
    results = action(**arguments)
  File "", line 2, in classify_sklearn
  File "/home/fagolab/anaconda2/envs/qiime2-2018.8/lib/python3.5/site-packages/qiime2/sdk/action.py", line 225, in bound_callable
    spec.view_type, recorder)
  File "/home/fagolab/anaconda2/envs/qiime2-2018.8/lib/python3.5/site-packages/qiime2/sdk/result.py", line 266, in _view
    result = transformation(self._archiver.data_dir)
  File "/home/fagolab/anaconda2/envs/qiime2-2018.8/lib/python3.5/site-packages/qiime2/core/transform.py", line 70, in transformation
    new_view = transformer(view)
  File "/home/fagolab/anaconda2/envs/qiime2-2018.8/lib/python3.5/site-packages/q2_feature_classifier/_taxonomic_classifier.py", line 72, in _1
    pipeline = joblib.load(os.path.join(dirname, 'sklearn_pipeline.pkl'))
  File "/home/fagolab/anaconda2/envs/qiime2-2018.8/lib/python3.5/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 578, in load
    obj = _unpickle(fobj, filename, mmap_mode)
  File "/home/fagolab/anaconda2/envs/qiime2-2018.8/lib/python3.5/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 508, in _unpickle
    obj = unpickler.load()
  File "/home/fagolab/anaconda2/envs/qiime2-2018.8/lib/python3.5/pickle.py", line 1043, in load
    dispatch[key[0]](self)
  File "/home/fagolab/anaconda2/envs/qiime2-2018.8/lib/python3.5/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 341, in load_build
    self.stack.append(array_wrapper.read(self))
  File "/home/fagolab/anaconda2/envs/qiime2-2018.8/lib/python3.5/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 184, in read
    array = self.read_array(unpickler)
  File "/home/fagolab/anaconda2/envs/qiime2-2018.8/lib/python3.5/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 130, in read_array
    array = unpickler.np.empty(count, dtype=self.dtype)
MemoryError

I did not have this problem using the Naive Bayes classifier trained on Greengenes 13_8 99% OTUs from the 515F/806R region.

Did I just run out of memory for this analysis? If so, I read in other topics that you can reduce the number of "jobs" or the "reads per batch". Do you think I can do this analysis on a computer with 4 GB of RAM?
Thank you in advance.


Welcome @Daniel_Tichy,

Yep!

By default, this step runs only a single job (you have to ask for more if you want more), but you can tune the reads-per-batch to something very low (like a few thousand; the default is 500k). Keep in mind that 4 GB just isn't very much to work with, so it could take a very long time (if you're able to do it without a MemoryError at all). Silva definitely requires more memory than Greengenes in general, so this is pretty typical to see 🙂
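Concretely, the tuning above would look something like this re-run of your command, using the classify-sklearn parameters --p-n-jobs and --p-reads-per-batch (the batch size of 1000 here is just an illustrative low value, not a tuned recommendation; smaller batches mean lower peak memory but a longer run):

qiime feature-classifier classify-sklearn \
  --i-classifier silva-132-99-nb-classifier.qza \
  --i-reads rep-seqs.qza \
  --p-n-jobs 1 \
  --p-reads-per-batch 1000 \
  --o-classification taxonomy2.qza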

Hopefully that helps, good luck!


Thanks Evan, that's what I thought. Time to ask my boss for a better machine; I have to do about 200 of these analyses, hahaha.

