Classifier training for SILVA 128 - MemoryError

Hi,

I am trying to train a SILVA 128 16S classifier for the V3-V4 (341F/805R) region, but I keep getting a MemoryError. I tried the suggestion from this topic to use --p-classify--chunk-size, with values of 20000, 10000, 5000, and 1000, but all failed. I am using a server running Red Hat Enterprise Linux 7.5 and requested 512 G of RAM, but still no success.

The QIIME 2 version is 2019.4.

The SILVA 128 files I am using are rep_set_16S 99 and the 16S-only 99 consensus_taxonomy_7_levels.

Here is my extract-reads command:

qiime feature-classifier extract-reads \
  --i-sequences SILVA_128_16S_99_rep_seq.qza \
  --p-f-primer CCTACGGGNGGCWGCAG \
  --p-r-primer GACTACHVGGGTATCTAATCC \
  --p-min-length 250 \
  --o-reads silva128_16S_99_v3v4_ref_seq.qza

and here is my script for feature-classifier fit-classifier-naive-bayes:

#!/bin/sh
#$ -S /bin/sh
#$ -N train-classifier-silva128-99
#$ -l mem_req=128G
module load python/3.7.2
conda activate qiime2-2019.4
qiime feature-classifier fit-classifier-naive-bayes \
  --p-classify--chunk-size 10000 \
  --verbose \
  --i-reference-reads /home/qiime_data/reference/silva128_16S_99_v3v4_ref_seq.qza \
  --i-reference-taxonomy SILVA_128_16S_99_consensus_taxa_7.qza \
  --o-classifier /home/qiime_data/reference/naivebayes_silva99_v3v4classifier.qza

The error I got is:

/home/miniconda3/envs/qiime2-2019.4/lib/python3.6/site-packages/q2_feature_classifier/classifier.py:101: UserWarning: The TaxonomicClassifier artifact that results from this method was trained using scikit-learn version 0.20.2. It cannot be used with other versions of scikit-learn. (While the classifier may complete successfully, the results will be unreliable.)
  warnings.warn(warning, UserWarning)
Traceback (most recent call last):
  File "/home/miniconda3/envs/qiime2-2019.4/lib/python3.6/site-packages/q2cli/commands.py", line 311, in __call__
    results = action(**arguments)
  File "</home/miniconda3/envs/qiime2-2019.4/lib/python3.6/site-packages/decorator.py:decorator-gen-349>", line 2, in fit_classifier_naive_bayes
  File "/home/miniconda3/envs/qiime2-2019.4/lib/python3.6/site-packages/qiime2/sdk/action.py", line 231, in bound_callable
    output_types, provenance)
  File "/home/miniconda3/envs/qiime2-2019.4/lib/python3.6/site-packages/qiime2/sdk/action.py", line 365, in callable_executor
    output_views = self._callable(**view_args)
  File "/home/miniconda3/envs/qiime2-2019.4/lib/python3.6/site-packages/q2_feature_classifier/classifier.py", line 318, in generic_fitter
    pipeline)
  File "/home/miniconda3/envs/qiime2-2019.4/lib/python3.6/site-packages/q2_feature_classifier/_skl.py", line 32, in fit_pipeline
    pipeline.fit(X, y)
  File "/home/miniconda3/envs/qiime2-2019.4/lib/python3.6/site-packages/sklearn/pipeline.py", line 267, in fit
    self._final_estimator.fit(Xt, y, **fit_params)
  File "/home/miniconda3/envs/qiime2-2019.4/lib/python3.6/site-packages/q2_feature_classifier/custom.py", line 41, in fit
    classes=classes)
  File "/home/miniconda3/envs/qiime2-2019.4/lib/python3.6/site-packages/sklearn/naive_bayes.py", line 529, in partial_fit
    dtype=np.float64)
MemoryError

Plugin error from feature-classifier:
See above for debug info.

No matter what --p-classify--chunk-size value I use, the traceback stops at the same line.
I haven't tried other SILVA versions, but 128 is the one I want to use.

Any suggestion is appreciated!!
SY

Hi @SY_Yang,
512 G of RAM should certainly be enough, though in your bash script I see you are only requesting 128 G, which should still be fine... I'm guessing something is going wrong with how you are actually accessing that available RAM on the server. While someone with more experience with these setups helps you out, could you instead use this pre-trained classifier of the V3-V4 region with SILVA 132 (newer than 128)?
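For reference, applying a pre-trained classifier skips the memory-hungry training step entirely; a minimal sketch, assuming the downloaded artifact is named silva-132-99-nb-classifier.qza and your representative sequences are in rep-seqs.qza (both filenames are placeholders for your own files):

```shell
qiime feature-classifier classify-sklearn \
  --i-classifier silva-132-99-nb-classifier.qza \
  --i-reads rep-seqs.qza \
  --o-classification taxonomy.qza
```

Note that a pre-trained classifier must match the scikit-learn version in your QIIME 2 environment, as the warning in your traceback indicates.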


Dear @Mehrbod_Estaki,

Yes, thank you for pointing that out. I was using that setting for Greengenes and forgot to change it to 512G. I did try running with 512G, but it still did not work.

I am thinking the same thing, but I don't know how to check it.

Thank you! I'll give it a try.
SY


You should contact your server administrator to:

  1. Make sure that you are requesting nodes/memory in the proper way.
  2. Evaluate how much memory the job is actually using.

How long does it take before the job crashes?
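The mem_req directive in the script above suggests a Grid Engine-style scheduler. On such systems, one way to check what the job actually used is to query the scheduler's own records; a sketch, assuming SGE/UGE and a known job ID (replace <job_id> with yours):

```shell
# While the job is running: the "usage" line reports peak virtual memory so far
qstat -j <job_id> | grep usage

# After the job finishes: the accounting record, including peak memory and exit status
qacct -j <job_id> | grep -E 'maxvmem|failed|exit_status'
```

If maxvmem is far below what you requested when the job dies, the memory request likely never reached the scheduler.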

Dear @Nicholas_Bokulich
The job normally crashed after about 5 minutes, so I was indeed not requesting the memory correctly.
After checking with the administrator, I learned I should not request memory in the script but pass it with the qsub command instead. It ran well and I got my result!
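For anyone hitting the same issue, passing the resource request at submission time looks like this on Grid Engine-style schedulers (a sketch; train_classifier.sh is a placeholder for your script, and the resource name mem_req matches the script earlier in this thread, but your site may use h_vmem or another name):

```shell
qsub -l mem_req=512G train_classifier.sh
```

A command-line -l request overrides the corresponding #$ directive embedded in the script, which is why this worked when the in-script directive did not take effect.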

Thank you!
SY


This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.