A solution to the MemoryError when training a classifier with SILVA 132


A few days ago, I used the SILVA 132 99% database to train my classifier with qiime feature-classifier fit-classifier-naive-bayes. Unfortunately, I ran into the same MemoryError problems as everyone else. There are two types of MemoryError; the key error messages are as follows:

1) numpy.core._exceptions.MemoryError: Unable to allocate array with shape (134217728,) and data type float64

2) numpy.core._exceptions.MemoryError: Unable to allocate array with shape (28624, 8192) and data type float64
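As a sanity check, the shapes in those tracebacks translate directly into gigabytes (a float64 is 8 bytes), which is why a small virtual machine cannot allocate them. This is plain arithmetic, not QIIME code:

```python
# Size of the arrays the two tracebacks failed to allocate,
# assuming 8 bytes per float64 element.
def array_bytes(shape, itemsize=8):
    n = 1
    for dim in shape:
        n *= dim
    return n * itemsize

gib = 1024 ** 3
print(f"(134217728,)  -> {array_bytes((134217728,)) / gib:.2f} GiB")  # exactly 1.00 GiB
print(f"(28624, 8192) -> {array_bytes((28624, 8192)) / gib:.2f} GiB")
```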

Many members of the forum think this is caused by a lack of computer memory, and in fact it is. But using a higher-performance computer is not the only way to solve the problem. My friend and I tried to find another way, and I will share with you how I found and solved this problem:

  1. For the first MemoryError, the solution is very simple: setting 4 GB of RAM for your virtual machine is enough to deal with it. Reducing --p-classify--chunk-size is also a good choice for low-memory operation.

  2. For the second kind of MemoryError, adding RAM alone is not enough, unless your computer's RAM is very large (if so, I envy you for having such a high-performance computer). If you look at the --p-feat-ext--n-features parameter of qiime feature-classifier fit-classifier-naive-bayes, the default value is 8192. Is this a coincidence? So I set it to 4096, and this time I got another error message.
    I don't think it's a coincidence, so I changed it to 1024. After about 4 hours of running, I got the classifier I wanted. So, if your RAM is small, set the value of --p-feat-ext--n-features small enough. It may take several attempts, but good luck to you.
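My understanding of why lowering that parameter helps (an assumption about the internals, not something I verified in the source): the k-mer counts are stored in a dense matrix of n_sequences rows by n-features columns, so the second traceback's (28624, 8192) shape is exactly that matrix, and halving n-features halves its memory footprint:

```python
# Rough memory model (an assumption, not QIIME internals): a dense
# float64 matrix of n_sequences rows by n_features columns, 8 bytes
# per cell, so memory scales linearly with --p-feat-ext--n-features.
n_sequences = 28624  # row count taken from the second traceback

for n_features in (8192, 4096, 1024):
    gib = n_sequences * n_features * 8 / 1024**3
    print(f"--p-feat-ext--n-features {n_features:5d} -> ~{gib:.2f} GiB")
```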

Here are my recommended memory configurations and commands; you can modify them according to your actual situation. For a 97% SILVA classifier, 4 GB of RAM is enough. For a 99% SILVA classifier, you can try 6 GB of RAM. Commands:

qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads 99-ref-seqs.qza \
  --p-classify--chunk-size 100 \
  --p-feat-ext--n-features 1024 \
  --i-reference-taxonomy ref-taxonomy.qza \
  --o-classifier 99-gg13_8classifier.qza


Thanks @Daryl! Adjusting chunk-size when fitting a classifier has been recommended a few times on the forum for fixing this, e.g., MemoryError when Training Silva Classifier

I discourage using --p-feat-ext--n-features to solve the memory error, though. This will actually adjust how many features (k-mers) are being considered when fitting the classifier, and hence could alter classifier performance. If you have memory constraints on your machine, chunk-size should be all you need.
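To make the collision point concrete, here is a toy illustration. It uses zlib.crc32 as a stand-in hash (an assumption; the real feature extractor uses a different hash, but the effect is the same): once the feature table is smaller than the k-mer vocabulary, distinct k-mers are forced to share buckets and become indistinguishable to the classifier.

```python
# Toy demonstration of hash collisions when the feature table shrinks.
# crc32 is only a stand-in for whatever hash the real extractor uses.
from itertools import product
from zlib import crc32

kmers = ["".join(p) for p in product("ACGT", repeat=8)]  # all 65536 8-mers

for n_features in (8192, 1024):
    buckets = {crc32(k.encode()) % n_features for k in kmers}
    print(f"n-features={n_features}: {len(kmers)} k-mers squeezed into "
          f"{len(buckets)} buckets")
```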


I got it, thanks. If that value defines the number of features used to fit the classifier, a low one could indeed cause some trouble. I think I need to learn more about k-mers to make sure I understand this problem correctly.
(PS: Excuse me for replying to you so late. :rofl:)