Hi - I’m trying to use the latest SILVA database to classify my microbiome reads. Previously I’ve used Greengenes, and trained my own classifier following the QIIME2 “training feature classifiers” tutorial.
For SILVA, I downloaded the classifier directly from here: Silva 132 classifiers (the 515-806 one, which matches the primers I’ve used).
I then assumed that I could simply use that file directly to generate taxonomy from my rep-seqs.qza file, without any further manipulation of the SILVA classifier file. However, when I run the command below I get “Plugin error from feature-classifier.”
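The command is essentially the following (the classifier filename here is just a placeholder for whatever the downloaded SILVA 132 515-806 file is named):

```bash
qiime feature-classifier classify-sklearn \
  --i-classifier silva-132-99-515-806-nb-classifier.qza \
  --i-reads rep-seqs.qza \
  --o-classification taxonomy.qza
```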
Hi @mihcir,
What does the error message say? It should report a log file that you can read, which will contain the full error message. Please share that.
My guess is that this is a memory error; if so, you will see MemoryError at the bottom of the error message. See here for some tips on solving this issue.
Here’s the error log; it’s quite long, but you’re right that it ends with MemoryError:
```
Traceback (most recent call last):
  File "/home/mihcir/miniconda3/envs/qiime2-2018.6/lib/python3.5/site-packages/q2cli/commands.py", line 274, in __call__
    results = action(**arguments)
  File "<decorator-gen-...>", line 2, in classify_sklearn
  File "/home/mihcir/miniconda3/envs/qiime2-2018.6/lib/python3.5/site-packages/qiime2/sdk/action.py", line 226, in bound_callable
    spec.view_type, recorder)
  File "/home/mihcir/miniconda3/envs/qiime2-2018.6/lib/python3.5/site-packages/qiime2/sdk/result.py", line 266, in _view
    result = transformation(self._archiver.data_dir)
  File "/home/mihcir/miniconda3/envs/qiime2-2018.6/lib/python3.5/site-packages/qiime2/core/transform.py", line 70, in transformation
    new_view = transformer(view)
  File "/home/mihcir/miniconda3/envs/qiime2-2018.6/lib/python3.5/site-packages/q2_feature_classifier/taxonomic_classifier.py", line 72, in _1
    pipeline = joblib.load(os.path.join(dirname, 'sklearn_pipeline.pkl'))
  File "/home/mihcir/miniconda3/envs/qiime2-2018.6/lib/python3.5/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 578, in load
    obj = _unpickle(fobj, filename, mmap_mode)
  File "/home/mihcir/miniconda3/envs/qiime2-2018.6/lib/python3.5/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 508, in _unpickle
    obj = unpickler.load()
  File "/home/mihcir/miniconda3/envs/qiime2-2018.6/lib/python3.5/pickle.py", line 1043, in load
    dispatch[key[0]](self)
  File "/home/mihcir/miniconda3/envs/qiime2-2018.6/lib/python3.5/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 341, in load_build
    self.stack.append(array_wrapper.read(self))
  File "/home/mihcir/miniconda3/envs/qiime2-2018.6/lib/python3.5/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 184, in read
    array = self.read_array(unpickler)
  File "/home/mihcir/miniconda3/envs/qiime2-2018.6/lib/python3.5/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 130, in read_array
    array = unpickler.np.empty(count, dtype=self.dtype)
MemoryError
```
I checked the link you mentioned, but I already have --p-n-jobs at its minimum setting of 1. Is there a way to reduce the memory requirement further, or does the full error log suggest I have a different problem?
Using Greengenes instead of SILVA should help a lot, since the Greengenes classifier is much smaller. You can also lower the --p-reads-per-batch parameter (e.g. try 2000) for a longer-running but lower-memory job.
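For example, something along these lines should work (the classifier filename is again assumed to match whatever you downloaded):

```bash
qiime feature-classifier classify-sklearn \
  --i-classifier silva-132-99-515-806-nb-classifier.qza \
  --i-reads rep-seqs.qza \
  --p-reads-per-batch 2000 \
  --p-n-jobs 1 \
  --o-classification taxonomy.qza
# smaller reads-per-batch values trade a longer runtime for lower peak memory
```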