Error no. 28 - out of memory

1115 · August 25, 2018, 2:44pm

Hi, all.

I'm trying to run classify-sklearn with both the pre-trained SILVA classifier and a classifier I trained myself. But whenever I run either of them, "Error 28: No space left on device" comes up.

This is my command:

qiime feature-classifier classify-sklearn --i-classifier Reference/silva-132-99-nb-classifier.qza --i-reads clustering/close/rep-seqs-cr-97.qza --o-classification taxonomy/taxonomy_close_99.qza --verbose --p-n-jobs -1

I don't understand why, since the computer I'm using has 48 GB of RAM and a 2 TB HDD, of which I allocated 1 TB to the working directory. I created another directory and tried:

export TMPDIR=/home/user/tempo

but I get the same error. Can anyone help me? Here is the information on the clustered table that I want to run taxonomy analysis on.

Is it because I have too many samples (493)?

Thank you

timanix · August 25, 2018, 7:11pm

You are using all the CPUs you have. Try removing --p-n-jobs from your command, or set it to 1 instead of -1. It will slow down the process but may help you avoid running out of memory.

timanix · August 25, 2018, 7:11pm

Oh, I forgot: you can also specify something like -2, which uses fewer than all of your threads. It may also work without slowing you down too much.
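For reference, here is a rough sketch of how a negative value turns into a worker count. It assumes --p-n-jobs is handed straight to joblib, whose documented rule for negative values is n_cpus + 1 + n_jobs (with a floor of 1). The helper name effective_jobs and the 32-core figure are only illustrative.

```shell
# effective_jobs is a hypothetical helper, not part of QIIME 2.
# It mirrors joblib's documented rule for negative n_jobs values:
#   workers = max(n_cpus + 1 + n_jobs, 1)
effective_jobs() {
    local n_jobs="$1"
    local n_cpus="${2:-$(nproc)}"   # default: CPUs visible to this shell
    if [ "$n_jobs" -eq 0 ]; then
        echo "n_jobs must not be 0" >&2
        return 1
    fi
    if [ "$n_jobs" -gt 0 ]; then
        echo "$n_jobs"              # positive values are used as given
        return 0
    fi
    local workers=$(( n_cpus + 1 + n_jobs ))
    if [ "$workers" -lt 1 ]; then
        workers=1
    fi
    echo "$workers"
}

# Illustrative machine with 32 cores.
for n in -1 -2 -5 -10; do
    echo "--p-n-jobs $n on 32 cores: $(effective_jobs "$n" 32) workers"
done
```

Under that assumption, small negative values still start almost as many workers as -1 does, so they are unlikely to lower memory use by much.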

1115 · August 26, 2018, 5:44am

Hi, @timanix. Thank you for your reply.

I already tried that, but it doesn't work. I tried --p-n-jobs -2, -5, and -10, but it always says out of memory. I haven't tried the default yet, but I think it would take too long to get a result...

Thank you!

timanix · August 26, 2018, 6:07am

Oh, so that's not the problem. Sorry, I am a new user here and don't have much experience. You should check out this thread

1115 · August 26, 2018, 6:48am

Oh, sorry, I mistyped the error.

Sometimes, it says

errno 12 : cannot allocate memory

And it stops working...

timanix · August 26, 2018, 7:16am

Sorry man, I am running out of ideas. Just try this:

mkdir -p /home/user/tempo
export TMPDIR=/home/user/tempo/
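It may be worth confirming that the setting actually took effect, since export only applies to the current shell and the processes started from it. The following is a minimal sketch; check_tmpdir is a hypothetical helper name, and it falls back to /tmp when TMPDIR is unset.

```shell
# check_tmpdir is a hypothetical helper: it confirms that the directory
# given as $1 (default: $TMPDIR, else /tmp) exists and is writable.
check_tmpdir() {
    local dir="${1:-${TMPDIR:-/tmp}}"
    if [ ! -d "$dir" ]; then
        echo "missing: $dir" >&2
        return 1
    fi
    if [ ! -w "$dir" ]; then
        echo "not writable: $dir" >&2
        return 1
    fi
    echo "ok: $dir"
}

# Run this in the same shell session that will launch qiime.
check_tmpdir || echo "TMPDIR is not usable; fix it before running qiime" >&2
```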

1115 · August 26, 2018, 12:55pm

It's alright, buddy. I really appreciate your reply!

I'll wait for replies from other members.

thermokarst · August 26, 2018, 1:48pm

Hey there @1115 - you have shared the following errors with us, and all of them are a bit different from each other:

Error no. 28 - out of memory
Error 28: No space left on device is come up
errno 12 : cannot allocate memory

Please copy and paste the exact error message you are observing. If you are unable to do that, please take a screenshot and send that along.

1115 · August 26, 2018, 2:12pm

Hi, @thermokarst

Sorry for confusing you. Here are the error messages. This is the first error I got,

(qiime2) [~/test_amplicon]$ qiime feature-classifier classify-sklearn --i-classifier Reference/silva-132-99-nb-classifier.qza --i-reads clustering/close/rep-seqs-cr-97.qza --o-classification taxonomy/taxonomy_close_99.qza --verbose --p-n-jobs -10
Process ForkPoolWorker-1:
Traceback (most recent call last):
File "/home/seokwon/miniconda3/envs/qiime2/lib/python3.5/site-packages/q2cli/commands.py", line 274, in call
results = action(**arguments)
File "", line 2, in classify_sklearn
File "/home/seokwon/miniconda3/envs/qiime2/lib/python3.5/site-packages/qiime2/sdk/action.py", line 232, in bound_callable
output_types, provenance)
File "/home/seokwon/miniconda3/envs/qiime2/lib/python3.5/site-packages/qiime2/sdk/action.py", line 367, in callable_executor
output_views = self._callable(**view_args)
File "/home/seokwon/miniconda3/envs/qiime2/lib/python3.5/site-packages/q2_feature_classifier/classifier.py", line 215, in classify_sklearn
confidence=confidence)
File "/home/seokwon/miniconda3/envs/qiime2/lib/python3.5/site-packages/q2_feature_classifier/_skl.py", line 45, in predict
for chunk in _chunks(reads, chunk_size)) for m in c)
File "/home/seokwon/miniconda3/envs/qiime2/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py", line 789, in call
self.retrieve()
File "/home/seokwon/miniconda3/envs/qiime2/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py", line 699, in retrieve
self._output.extend(job.get(timeout=self.timeout))
File "/home/seokwon/miniconda3/envs/qiime2/lib/python3.5/multiprocessing/pool.py", line 644, in get
raise self._value
File "/home/seokwon/miniconda3/envs/qiime2/lib/python3.5/multiprocessing/pool.py", line 424, in _handle_tasks
put(task)
File "/home/seokwon/miniconda3/envs/qiime2/lib/python3.5/site-packages/sklearn/externals/joblib/pool.py", line 371, in send
CustomizablePickler(buffer, self._reducers).dump(obj)
File "/home/seokwon/miniconda3/envs/qiime2/lib/python3.5/site-packages/sklearn/externals/joblib/pool.py", line 240, in call
for dumped_filename in dump(a, filename):
File "/home/seokwon/miniconda3/envs/qiime2/lib/python3.5/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 484, in dump
NumpyPickler(f, protocol=protocol).dump(value)
File "/home/seokwon/miniconda3/envs/qiime2/lib/python3.5/pickle.py", line 408, in dump
self.save(obj)
File "/home/seokwon/miniconda3/envs/qiime2/lib/python3.5/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 278, in save
wrapper.write_array(obj, self)
File "/home/seokwon/miniconda3/envs/qiime2/lib/python3.5/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 93, in write_array
pickler.file_handle.write(chunk.tostring('C'))
OSError: [Errno 28] No space left on device

Plugin error from feature-classifier:

[Errno 28] No space left on device

See above for debug info.
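The traceback ends in joblib writing an array to a temporary file, so the "device" that filled up is wherever joblib puts its temporary folder. According to the joblib documentation, that is the JOBLIB_TEMP_FOLDER environment variable if set, then /dev/shm when it exists and is writable, then the system temporary directory (which TMPDIR overrides). Whether the joblib copy bundled in this environment follows exactly that order is an assumption. A quick sketch to see how much room each candidate has (free_kb is a hypothetical helper):

```shell
# free_kb is a hypothetical helper: available space, in KiB, on the
# filesystem holding the given directory (POSIX `df -P` output format,
# where the fourth column of the data line is "Available").
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Candidate locations, in the order joblib is assumed to try them.
for dir in "${JOBLIB_TEMP_FOLDER:-}" /dev/shm "${TMPDIR:-/tmp}"; do
    # Skip unset or non-existent candidates.
    [ -n "$dir" ] && [ -d "$dir" ] || continue
    echo "$dir: $(free_kb "$dir") KiB free"
done
```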

And below is the next error message, which I got after I set TMPDIR to a directory on the main 1 TB HDD:

Thank you!

Nicholas_Bokulich · August 27, 2018, 1:49pm

Hi @1115,
The SILVA database takes a lot of memory to run (with default parameters) — as much as ~32 GB! You can reduce the memory demand by adjusting the --p-reads-per-batch parameter, though this is already auto-adjusted when running in parallel.

This error, and many solutions, have been discussed a lot on this forum. If you run N parallel jobs, N copies of the database are opened in memory. So 48 GB RAM is not enough if you are classifying with SILVA with parallel jobs.
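A back-of-the-envelope version of that arithmetic, assuming the worst case described here, in which every parallel job holds its own full ~32 GB copy (actual usage depends on the classifier and batch size). The helper name max_jobs_for_ram is hypothetical.

```shell
# max_jobs_for_ram is a hypothetical helper: how many jobs fit in RAM if
# each job needs its own copy of the classifier (integer division).
max_jobs_for_ram() {
    local ram_gb="$1"
    local per_job_gb="$2"
    echo $(( ram_gb / per_job_gb ))
}

# Figures from this thread: 48 GB of RAM, ~32 GB per copy, 32 cores.
echo "48 GB RAM at ~32 GB per job: $(max_jobs_for_ram 48 32) job(s) fit"
echo "32 jobs at ~32 GB each would need about $(( 32 * 32 )) GB"
```

With these numbers only a single job fits, which is why lowering --p-n-jobs matters more than freeing disk space for the memory error.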

That error is occurring outside of QIIME 2; it looks like you broke something when you designated a different TMPDIR. If you cannot restore the defaults, you should speak to your system admin to get help!

Have you tried the default (a single job)? How much is too much time? A few hours may be better than a few days of troubleshooting the memory error.

How many input sequences do you have? Usually runtimes are reasonable (minutes to < 2hr) for average-sized datasets that have been denoised. Large, multi-run datasets, high-diversity samples, and OTU-picked undenoised datasets will take longer (I see it looks like you are using closed-ref OTU picking). See also the link to related posts above — several workarounds to reduce runtime and memory requirements have been discussed.
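One rough way to get that number, assuming the representative sequences have already been exported from the .qza artifact to a plain FASTA file. The helper name count_seqs and the example path are hypothetical.

```shell
# count_seqs is a hypothetical helper: every FASTA record starts with a
# '>' header line, so counting those lines counts the sequences.
count_seqs() {
    # grep -c exits non-zero when the count is 0, so mask that status.
    grep -c '^>' "$1" || true
}

# Example (hypothetical path to an already exported FASTA file):
# count_seqs exported/dna-sequences.fasta
```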

I hope that helps!

1115 · August 28, 2018, 12:38pm

Hello, @Nicholas_Bokulich! Your reply helps me a lot!! I really appreciate it.

As for Errno 28, it was solved after setting a new TMPDIR. I hadn't noticed that my earlier command to designate a new temporary directory didn't work, so that part is alright now.

By running N parallel jobs, do you mean running QIIME 2 on two or more projects at once? That would make sense, since I usually analyze 2-3 projects at the same time, giving each one --p-n-jobs 16 (I have 32 cores).
My other guess for errno 12 is that I have quite a lot of samples, 432. It doesn't cause trouble when I run a small number of samples, like 40-50.

So I tried training my own classifier instead of using the pre-trained SILVA classifier.

Nope, I haven't tried it. That was just a guess: denoising took a whole night with 32 cores, so I assumed classification would take even longer with only one core, the default.

I'll let you know whether it works... I really hope it does, though.

Thank you

Nicholas_Bokulich · August 30, 2018, 1:59am

No, it means using the n-jobs parameter. You are setting this to -1, meaning it will run your job in parallel on all available cores.

Denoising is a different method, and one that is often even more time-consuming. Feature classifier is a whole different animal.

Give it a try running 1-2 parallel jobs; the wait should not be too long (though this all depends on the number and length of sequences). Using denoised ASVs it should be very quick, since there are usually not too many ASVs.
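As a sketch of what that could look like with the paths from the original command: the --p-reads-per-batch value below is purely illustrative and not a recommendation, and run_classifier is a hypothetical wrapper that lets you preview the command before running it.

```shell
N_JOBS=1                 # 1-2 parallel jobs, as suggested above
READS_PER_BATCH=10000    # illustrative value only; tune for your data

# run_classifier is a hypothetical wrapper. Any arguments are placed in
# front of the command, so `run_classifier echo` prints it (a dry run)
# and `run_classifier` with no arguments actually runs it.
run_classifier() {
    "$@" qiime feature-classifier classify-sklearn \
        --i-classifier Reference/silva-132-99-nb-classifier.qza \
        --i-reads clustering/close/rep-seqs-cr-97.qza \
        --o-classification taxonomy/taxonomy_close_99.qza \
        --p-n-jobs "$N_JOBS" \
        --p-reads-per-batch "$READS_PER_BATCH" \
        --verbose
}

# Dry run: print the command for review.
run_classifier echo
# Real run, timed, once the printed command looks right:
# time run_classifier
```

Timing a single-job run first gives a concrete answer to how long the default takes before any further tuning.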
