feature-classifier memory error :-(

shira · July 15, 2019, 4:56am

I am running the following command on qiime2-2019.4 (Linux terminal):

qiime feature-classifier fit-classifier-naive-bayes --i-reference-reads ref-seqs.qza --i-reference-taxonomy ref-taxonomy.qza --o-classifier SILVA_classifier99.qza --verbose

The process stops midway and I get the following message, suggesting a memory error. My ulimit parameters are appended at the end. The machine is quite powerful. Any idea what's going on and how to solve it?

/home/userx/miniconda3/envs/qiime2-2019.4/lib/python3.6/site-packages/q2_feature_classifier/classifier.py:101: UserWarning: The TaxonomicClassifier artifact that results from this method was trained using scikit-learn version 0.20.2. It cannot be used with other versions of scikit-learn. (While the classifier may complete successfully, the results will be unreliable.)
warnings.warn(warning, UserWarning)
Traceback (most recent call last):
File "/home/userx/miniconda3/envs/qiime2-2019.4/lib/python3.6/site-packages/q2cli/commands.py", line 311, in __call__
results = action(**arguments)
File "</home/userx/miniconda3/envs/qiime2-2019.4/lib/python3.6/site-packages/decorator.py:decorator-gen-349>", line 2, in fit_classifier_naive_bayes
File "/home/userx/miniconda3/envs/qiime2-2019.4/lib/python3.6/site-packages/qiime2/sdk/action.py", line 231, in bound_callable
output_types, provenance)
File "/home/userx/miniconda3/envs/qiime2-2019.4/lib/python3.6/site-packages/qiime2/sdk/action.py", line 365, in callable_executor
output_views = self._callable(**view_args)
File "/home/userx/miniconda3/envs/qiime2-2019.4/lib/python3.6/site-packages/q2_feature_classifier/classifier.py", line 318, in generic_fitter
pipeline)
File "/home/userx/miniconda3/envs/qiime2-2019.4/lib/python3.6/site-packages/q2_feature_classifier/_skl.py", line 32, in fit_pipeline
pipeline.fit(X, y)
File "/home/userx/miniconda3/envs/qiime2-2019.4/lib/python3.6/site-packages/sklearn/pipeline.py", line 267, in fit
self._final_estimator.fit(Xt, y, **fit_params)
File "/home/userx/miniconda3/envs/qiime2-2019.4/lib/python3.6/site-packages/q2_feature_classifier/custom.py", line 41, in fit
classes=classes)
File "/home/userx/miniconda3/envs/qiime2-2019.4/lib/python3.6/site-packages/sklearn/naive_bayes.py", line 562, in partial_fit
self._update_feature_log_prob(alpha)
File "/home/userx/miniconda3/envs/qiime2-2019.4/lib/python3.6/site-packages/sklearn/naive_bayes.py", line 723, in _update_feature_log_prob
self.feature_log_prob_ = (np.log(smoothed_fc) -
MemoryError

Plugin error from feature-classifier:

See above for debug info.

My ulimit is set to:

core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 127281
max locked memory (kbytes, -l) 16384
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 127281
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

Mehrbod_Estaki · July 15, 2019, 6:01pm

Hi @shira,
I believe the error, as you mentioned, is memory related. Training a classifier on the SILVA database is notoriously memory-intensive. If I'm not mistaken, you have 16 GB of memory available here, which is not enough for this task. I would raise that to at least 32 GB if you can; more is better. You can also search the forum for examples of people reducing --p-classify--chunk-size, which may help with the memory requirements.

shira · July 15, 2019, 6:07pm

Thanks @Mehrbod_Estaki

I am not sure how to decipher the ulimit output. Where do you see this:

Mehrbod_Estaki · July 15, 2019, 7:05pm

Hi @shira,
Actually in re-checking your ulimit specs I seem to have been looking at the wrong parameter. You do have unlimited memory allocation:

max memory size (kbytes, -m) unlimited

I was looking at the max locked memory above it by accident.

I'm not familiar with this setup, so unfortunately I'm not sure what to recommend. Perhaps one of the devs can, but this seems like more of an issue with your network/cluster, so your best bet is to discuss it with your cluster admin and explain your problem and situation (i.e. the need for more memory). There may be some other memory limits we aren't aware of; for example, what counts as 'unlimited' may differ between systems.
If all that fails for some reason, you can always use the Greengenes database instead of SILVA, which requires much less memory.
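
For anyone else reading ulimit output: the memory-related lines are easy to mix up, as happened above. A minimal sketch that prints only those three limits for the current shell (values are in kbytes, or "unlimited"). As far as I know, "max locked memory" only caps memory pinned with mlock and says nothing about how much RAM a process may use, and the -m limit is not enforced by current Linux kernels, so -v is the one most likely to matter in practice.

```shell
# Print the three memory-related limits for the current shell.
# Each value is in kbytes, or the word "unlimited".
echo "max locked memory (-l): $(ulimit -l)"
echo "max memory size   (-m): $(ulimit -m)"
echo "virtual memory    (-v): $(ulimit -v)"
```

If all three are generous and the job still fails, the limit is probably the physical RAM (plus swap) on the machine rather than a ulimit setting.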

shira · July 15, 2019, 7:11pm

Thanks. I will try to figure it out, but it seems a bit strange, as I was able to use the same setup to run dada2 on a large set of samples and it only took a couple of minutes, so it looks like I do have a good amount of memory available to me. I will update here if I find a solution, for the benefit of other Linux users...

Nicholas_Bokulich · July 17, 2019, 2:01pm

dada2 can be memory-intensive, but less so than feature classifier (which is essentially reading the entire sequence database into memory). So large reference databases (e.g., SILVA) will take large amounts of memory.

Otherwise, @Mehrbod_Estaki's advice above will fix this problem, no matter how much memory you actually have available (within reason):

shira · July 17, 2019, 3:24pm

Thanks for the advice, I will try the chunk size option. What puzzles me is that I have run this process before, using the same database. The only difference was that I was using a Mac that had half the RAM and an inferior processor, and it was running qiime2-2018. It took a long time, but completed. Now, using a much stronger computer (a Linux PC, not a cluster) and qiime2-2019.4, it fails after a minute or two.

Unfortunately I can't switch to Greengenes mid-project, so I really need to find a solution. I already have the classifier made for qiime2-2018, but I can't use it with qiime2-2019, and I can't seem to get qiime2-2018 installed because it conflicts with the updated version of miniconda3. Is there any way to work around that? Thanks.

Nicholas_Bokulich · July 17, 2019, 3:33pm

Please do. Given your other information, I think this is the best option for getting this to work on your current machine. By the way, this parameter is called --p-reads-per-batch (I think @Mehrbod_Estaki may have been referencing some old documentation when he mentioned chunk size).

Perhaps it is not configured correctly? You should check how much memory the job is actually using, and the maximum amount available. It is a little suspicious that this worked for you on a less powerful machine.

Do you have access to the old machine that already has that release installed? You could also downgrade conda if needed but that is probably not a good idea — it sounds like this is an issue with the hardware, not with the QIIME 2 version or with the classifier itself.
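
One rough way to check how much memory is available on a Linux machine is to read /proc/meminfo directly. This is only a sketch: mem_gib is a hypothetical helper name, and the MemAvailable field assumes a reasonably recent kernel.

```shell
# Read a field from /proc/meminfo (reported in kB) and print it in GiB.
# mem_gib is a hypothetical helper name; Linux only.
mem_gib() {
    awk -v key="$1:" '$1 == key { printf "%.1f\n", $2 / 1048576 }' /proc/meminfo
}

echo "Total RAM:     $(mem_gib MemTotal) GiB"
echo "Available RAM: $(mem_gib MemAvailable) GiB"

# To record the peak memory of a job, GNU time can wrap the command and
# report "Maximum resident set size" when it exits:
#   /usr/bin/time -v <your qiime command>
```

Watching the process in top or htop while it runs gives a similar picture without waiting for the job to finish.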

shira · July 18, 2019, 4:22pm

Update - Success!
Thanks @Mehrbod_Estaki and @Nicholas_Bokulich

I had to bring --p-classify--chunk-size down to 5000, and it finally worked. It took a couple of hours.

With larger chunk sizes (20,000 is the default) it would run out of memory and crash. I followed the process while it was running, and it consumed around 20 GB of memory.
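
For reference, the original command with the smaller chunk size would look roughly like the sketch below. The file names are the ones from the first post; the function name fit_silva_classifier and the DRY_RUN switch are hypothetical conveniences for this example, not part of QIIME 2.

```shell
# fit_silva_classifier and DRY_RUN are hypothetical names for this sketch.
# $1: value for --p-classify--chunk-size (defaults to 5000 here, the value
#     reported to work above; the QIIME 2 default is 20000).
fit_silva_classifier() {
    ${DRY_RUN:+echo} qiime feature-classifier fit-classifier-naive-bayes \
        --i-reference-reads ref-seqs.qza \
        --i-reference-taxonomy ref-taxonomy.qza \
        --p-classify--chunk-size "${1:-5000}" \
        --o-classifier SILVA_classifier99.qza \
        --verbose
}

# With DRY_RUN set, the command is only printed, not executed.
DRY_RUN=1 fit_silva_classifier 5000
```

Dropping DRY_RUN=1 runs the training for real. If 5000 still runs out of memory, trying a smaller value such as 2000 is the obvious next step.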

Nicholas_Bokulich · July 18, 2019, 4:29pm

Thanks for confirming! That runtime sounds much more reasonable. (I take the chunk size down to 2000 and it runs on an 8 GB laptop.)

Indeed, we have seen that the SILVA classifiers will often take around 32GB of memory at max capacity.

Glad you were able to resolve this! Good luck with your downstream analysis!

jordanp · June 15, 2022, 2:54pm

I solved this problem by allocating a larger swap file on my system as well as changing the chunk size.

Make the swap file (just make sure you have enough hard disk space for this):
sudo fallocate -l 10G /swapfile
Change the permissions of the swap file:
sudo chmod 600 /swapfile
Tell the system what it is:
sudo mkswap /swapfile
And lastly, tell the system to use it:
sudo swapon /swapfile

Finally, make the change permanent if you're happy with it:

sudo cp /etc/fstab /etc/fstab.bak
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
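
To confirm that the swap file is actually in use, something like the following should work on Linux. It is a sketch only: swap_gib is a hypothetical helper name, and it reads the SwapTotal field from /proc/meminfo rather than calling any swap tool.

```shell
# swap_gib is a hypothetical helper; SwapTotal in /proc/meminfo is in kB.
swap_gib() {
    awk '$1 == "SwapTotal:" { printf "%.1f\n", $2 / 1048576 }' /proc/meminfo
}

echo "Swap configured: $(swap_gib) GiB"

# /proc/swaps lists each active swap area, similar to `swapon --show`.
if [ -r /proc/swaps ]; then cat /proc/swaps; fi
```

Keep in mind that swap is far slower than RAM, so a job that relies heavily on it may finish but take much longer.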