Memory Error using the pretrained silva 119 classifier

Vixer · February 9, 2018, 2:31pm

Hello again!
I'm testing my reads using the pre-trained classifier "Silva 119 99% OTUs from 515F/806R region of sequences", since my primers seem to work with it (f: GTGCCAGCMGCCGCGGTAA & r: GGACTACVSGGGTATCTAAT), and a few minutes into the process I get an error saying Memory Error. When I use the "Greengenes 13_8 99% OTUs from 515F/806R region of sequences" classifier instead, the process runs just fine.

Is the error caused by the size of the Silva file, and is there a way to fix it? I'm running QIIME 2 2017.11 in a virtual machine with 3 GB of RAM assigned to it; the computer has 8 GB of RAM.

jairideout · February 9, 2018, 5:35pm

Hi @Vixer! Take a look at @BenKaehler's post for details about pre-trained Silva classifier memory requirements, as well as some options you can try to reduce the memory requirements. It appears that the pre-trained Silva classifiers use a maximum of 11GB memory.

My hunch is that you'll need to either allocate more memory to your virtual machine, or run the command on a computer with more memory -- different chunk sizes and/or single-threaded mode will likely not work if you're limited to 3GB RAM. Thanks!
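
As a quick sanity check before rerunning, the memory visible to the machine (or to the virtual machine, if run inside it) can be compared against the roughly 11 GB figure quoted above. This is only a minimal sketch, assuming a Linux system where `os.sysconf` exposes `SC_PHYS_PAGES`; the helper names are made up for illustration and are not part of QIIME 2.

```python
import os

GIB = 1024 ** 3


def total_ram_bytes():
    """Total physical memory in bytes (assumes a Linux system with SC_PHYS_PAGES)."""
    return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")


def enough_ram(total_bytes, required_gib=11):
    """True if total_bytes covers the requirement (11 is the figure quoted in this thread)."""
    return total_bytes >= required_gib * GIB


if __name__ == "__main__":
    total = total_ram_bytes()
    print(f"Total RAM: {total / GIB:.1f} GiB")
    print("Likely enough for the pre-trained Silva classifier:", enough_ram(total))
```

With 3 GB assigned to the virtual machine, `enough_ram` returns `False`, which matches the Memory Error reported here.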

Vixer · February 12, 2018, 3:13pm

Thanks for answering. I updated to the latest QIIME 2 version and allocated more RAM (now 4 GB). Looking at the options: is --p-reads-per-batch the same as the --p-chunk-size option in the post you linked? (The --help output doesn't show any option named chunk-size.)

If so, I changed it from the default to 1000 and it just took a little longer to produce the error message. Also, can you give me a quick explanation of single-threaded mode and how to modify it?

I'm going to try the command on a computer with 16 GB for now; I'll post if something happens.

Thanks!

jairideout · February 12, 2018, 5:17pm

The short answer is that 4GB RAM doesn't appear to be enough memory for the command to complete, either in single-job or multi-job mode. My guess is that you're running the command without using multiple jobs (which is the default behavior), so setting a different chunk size / reads-per-batch won't have any effect. I think you'll have better luck with the 16GB RAM computer. Let us know how it goes!

See below for specific answers to your questions -- apologies that my previous post wasn't clear about single vs multiple jobs and chunk size (looks like I led you on a bit of a wild goose chase!).

Note: I was using the word "single-threaded" in my previous post, but classify-sklearn actually uses multi-processing instead of multi-threading. There are technical differences between the two modes of parallelism, but the idea is roughly the same.

Sorry about that, it looks like --p-chunk-size was renamed to --p-reads-per-batch in the QIIME 2 2017.9 release (changelog notes).

You're likely running the command in single-job mode unless you're using --p-n-jobs with a value other than 1. If you're not including --p-n-jobs in the command, it will run in single-job mode by default. If you have a limited amount of memory, you'll want to run the command in single-job mode (in other words, you can omit the --p-n-jobs option altogether).

Single-job mode means that the command will run on a single CPU instead of processing the data in smaller parallel jobs. You can speed up the runtime of the command by specifying -1 to use all CPUs, or a value greater than 1 to use the specified number of CPUs. However, that will use up more memory than running in single-job mode, so it won't help reduce memory requirements.
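
To make the single-job versus multi-job distinction concrete, here is a rough sketch that assembles the classify-sklearn command line from Python. The function name and the classifier and output file names are hypothetical placeholders; only the option names (`--i-classifier`, `--i-reads`, `--o-classification`, `--p-reads-per-batch`, `--p-n-jobs`) are the ones the command documents, and it is worth confirming them with `--help` for your release.

```python
import shlex


def classify_sklearn_command(classifier, reads, output, reads_per_batch=None, n_jobs=None):
    """Build an argument list for `qiime feature-classifier classify-sklearn`.

    Leaving n_jobs as None omits --p-n-jobs, so the command runs in its
    default single-job mode.
    """
    cmd = [
        "qiime", "feature-classifier", "classify-sklearn",
        "--i-classifier", classifier,
        "--i-reads", reads,
        "--o-classification", output,
    ]
    if reads_per_batch is not None:
        cmd += ["--p-reads-per-batch", str(reads_per_batch)]
    if n_jobs is not None:
        cmd += ["--p-n-jobs", str(n_jobs)]
    return cmd


# Hypothetical file names; single-job mode with a smaller batch size.
cmd = classify_sklearn_command(
    "silva-classifier.qza", "rep-seqs.qza", "taxonomy.qza", reads_per_batch=1000
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where QIIME 2 is installed and activated.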

Let me know if you have any other questions about this, and sorry again for the confusion!

Vixer · February 15, 2018, 9:50pm

Hello again, I managed to get a laptop with 16 GB of RAM and an i7 processor. I assigned 12 GB to the virtual machine and tried --p-reads-per-batch 1000 and 500, and I'm still getting the error.

Also, I forgot to add that I have 2,777 sequences in my rep-seqs.qza artifact.
Sadly, this is the best computer I'm able to get.

Nicholas_Bokulich · February 16, 2018, 3:06pm

Hi @Vixer,

Unfortunately, SILVA classifiers take a ton of memory and even 12 GB may be insufficient... though reducing reads-per-batch should really make this work (that's how I get SILVA classifiers working on a laptop with 8 GB RAM).

My advice is to try one of the pre-trained Greengenes classifiers instead. These are much less memory-intensive and probably won't even require setting reads-per-batch. If that's still failing, I'd recommend monitoring memory use to see how much is actually being consumed.
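
One possible way to monitor memory use is to launch the command from a small Python wrapper and read the peak resident memory afterwards. This is a sketch rather than an official QIIME 2 tool: it assumes Linux (where `ru_maxrss` is reported in kilobytes), the function name is invented, and the demo workload is a placeholder child process rather than a real classifier run.

```python
import resource
import subprocess
import sys


def run_and_report_peak(cmd):
    """Run cmd to completion; return (exit code, peak child memory in MiB).

    RUSAGE_CHILDREN covers every child that has been waited for so far, so the
    value is the largest peak among them. Assumes Linux, where ru_maxrss is
    given in kilobytes (macOS reports bytes instead).
    """
    completed = subprocess.run(cmd)
    peak_kib = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return completed.returncode, peak_kib / 1024


if __name__ == "__main__":
    # Placeholder workload: a child Python process that fills about 50 MiB.
    demo = [sys.executable, "-c", "x = b'x' * (50 * 1024 * 1024)"]
    code, peak_mib = run_and_report_peak(demo)
    print(f"exit code {code}, peak memory about {peak_mib:.0f} MiB")
```

Replacing `demo` with the actual `qiime` argument list shows how close a run gets to the memory assigned to the virtual machine, even when the run ends in a Memory Error.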

Oh, one more thing: DO NOT attempt to run classify-sklearn in parallel. This will load multiple copies of the classifier into memory, depleting resources. Parallel classification is really only feasible/useful on high-performance computing clusters.
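
As back-of-the-envelope arithmetic for why parallel runs are risky: if each job holds its own copy of the classifier, memory use scales roughly with the number of jobs. The sketch below assumes the 11 GB figure quoted earlier in this thread applies to every copy, which is a simplification; the function name is hypothetical.

```python
import os


def estimated_classifier_memory_gb(n_jobs, per_copy_gb=11.0):
    """Rough estimate, assuming every job loads a full copy of the classifier."""
    if n_jobs == -1:
        # -1 means "use all CPUs", as described earlier in the thread.
        n_jobs = os.cpu_count() or 1
    if n_jobs < 1:
        raise ValueError("n_jobs must be -1 or a positive integer")
    return n_jobs * per_copy_gb


for jobs in (1, 2, 4):
    print(f"{jobs} job(s): ~{estimated_classifier_memory_gb(jobs):.0f} GB")
```

Under that assumption, even two jobs would need about 22 GB, well beyond a 16 GB laptop.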

If all else fails, give classify-consensus-blast or classify-consensus-vsearch a try instead.
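
For reference, a classify-consensus-blast call could be assembled along these lines. Treat it as a sketch: the function name and all file names are hypothetical, the reference reads and taxonomy are assumed to be already imported as QIIME 2 artifacts, and the option names shown are those from releases around the time of this thread. Newer releases may expect additional outputs, so check `--help` first.

```python
import shlex


def classify_consensus_blast_command(query, reference_reads, reference_taxonomy, output):
    """Build an argument list for `qiime feature-classifier classify-consensus-blast`."""
    return [
        "qiime", "feature-classifier", "classify-consensus-blast",
        "--i-query", query,
        "--i-reference-reads", reference_reads,
        "--i-reference-taxonomy", reference_taxonomy,
        "--o-classification", output,
    ]


# Hypothetical artifact names for the query and the imported Silva reference files.
blast_cmd = classify_consensus_blast_command(
    "rep-seqs.qza", "silva-ref-seqs.qza", "silva-ref-taxonomy.qza", "taxonomy-blast.qza"
)
print(shlex.join(blast_cmd))
```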

Vixer · February 19, 2018, 3:13pm

Oh, I was running classify-sklearn while using the Silva database. I just used classify-consensus-blast with the rep-set99 from the Silva 119 release for QIIME and it worked great!

Thanks a lot!
