I'm testing my reads using the pre-trained classifier “Silva 119 99% OTUs from 515F/806R region of sequences”, since my primers seem to work with it (f: GTGCCAGCMGCCGCGGTAA & r: GGACTACVSGGGTATCTAAT), and a few minutes into the process I get an error saying Memory Error. When I use “Greengenes 13_8 99% OTUs from 515F/806R region of sequences” instead, the process runs just fine.
Is the error caused by the size of the Silva file, and is there a way to fix it? (I'm running qiime2 2017.11 in a virtual machine with 3 GB of RAM assigned to it; the computer has 8 GB of RAM.)
Hi @Vixer! Take a look at @BenKaehler’s post for details about pre-trained Silva classifier memory requirements, as well as some options you can try to reduce the memory requirements. It appears that the pre-trained Silva classifiers use a maximum of 11GB memory.
My hunch is that you’ll need to either allocate more memory to your virtual machine, or run the command on a computer with more memory – different chunk sizes and/or single-threaded mode will likely not work if you’re limited to 3GB RAM. Thanks!
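For anyone checking their own setup, here is a minimal Python sketch, assuming a Linux machine (or VM), that compares the memory the system sees against the roughly 11GB figure mentioned above. The threshold is taken from this thread rather than measured, and the function names are made up for illustration.

```python
import os

# Peak memory quoted in this thread for the pre-trained Silva classifiers.
# This is an assumption carried over from the discussion, not a measurement.
SILVA_PEAK_GB = 11


def total_ram_gb():
    """Return total physical memory in GB, as reported by sysconf (Linux)."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    n_pages = os.sysconf("SC_PHYS_PAGES")
    return page_size * n_pages / 1024 ** 3


def enough_ram(total_gb, required_gb=SILVA_PEAK_GB):
    """True if the machine has at least the required amount of memory."""
    return total_gb >= required_gb


ram = total_ram_gb()
print(f"Total RAM seen by this system: {ram:.1f} GB")
print("Likely enough for the Silva classifier:", enough_ram(ram))
```

Run inside the virtual machine, this reports the memory assigned to the VM, not the memory of the host computer.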
Thanks for answering. I updated to the latest qiime2 version and allocated more RAM (now 4GB). Looking at the options: is --p-reads-per-batch the same as the --p-chunk-size in the post you just linked? (With the --help command, no option named chunk-size appears.)
If so, I changed it from the default to 1000 and it just took a little longer to show the error message. Also, can you give me a quick explanation of single-threaded mode and how to modify it?
I'm gonna try the command on a computer with 16GB for now. I'll post if something happens.
The short answer is that 4GB RAM doesn’t appear to be enough memory for the command to complete, either in single-job or multi-job mode. My guess is that you’re running the command without using multiple jobs (which is the default behavior), so setting a different chunk size / reads-per-batch won’t have any effect. I think you’ll have better luck with the 16GB RAM computer. Let us know how it goes!
See below for specific answers to your questions – apologies that my previous post wasn’t clear about single vs multiple jobs and chunk size (looks like I led you on a bit of a wild goose chase!).
Note: I was using the word “single-threaded” in my previous post, but classify-sklearn actually uses multi-processing instead of multi-threading. There are technical differences between the two modes of parallelism, but the idea is roughly the same.
Sorry about that. Yes, they are the same option: it looks like --p-chunk-size was renamed to --p-reads-per-batch in the QIIME 2 2017.9 release (changelog notes).
You’re likely running the command in single-job mode unless you’re using --p-n-jobs with a value other than 1. If you’re not including --p-n-jobs in the command, it will run in single-job mode by default. If you have a limited amount of memory, you’ll want to run the command in single-job mode (in other words, you can omit the --p-n-jobs option altogether).
Single-job mode means that the command will run on a single CPU instead of processing the data in smaller parallel jobs. You can speed up the runtime of the command by specifying --p-n-jobs -1 to use all CPUs, or a value greater than 1 to use the specified number of CPUs. However, that will use up more memory than running in single-job mode, so it won’t help reduce memory requirements.
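To make the options concrete, here is a small Python sketch that assembles a single-job classify-sklearn command and prints it so it can be pasted into a terminal. The file names are placeholders and the helper name is hypothetical. The option names are the ones discussed in this thread, so it is worth checking `qiime feature-classifier classify-sklearn --help` on your release before relying on them.

```python
import shlex


def classify_sklearn_cmd(classifier, reads, output,
                         reads_per_batch=None, n_jobs=1):
    """Build the argument list for a classify-sklearn run (hypothetical helper)."""
    cmd = [
        "qiime", "feature-classifier", "classify-sklearn",
        "--i-classifier", classifier,
        "--i-reads", reads,
        "--o-classification", output,
        # 1 is the default (single-job mode); written out here for clarity.
        "--p-n-jobs", str(n_jobs),
    ]
    if reads_per_batch is not None:
        # Only added when a batch size is requested explicitly.
        cmd += ["--p-reads-per-batch", str(reads_per_batch)]
    return cmd


# Placeholder file names; substitute your own artifacts.
cmd = classify_sklearn_cmd(
    "silva-classifier.qza", "rep-seqs.qza", "taxonomy.qza",
    reads_per_batch=1000,
)
print(shlex.join(cmd))
```

The helper only builds and prints the command. Passing `n_jobs=-1` would request all CPUs, which, as noted above, increases memory use.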
Let me know if you have any other questions about this, and sorry again for the confusion!
Unfortunately, SILVA classifiers take a ton of memory and even 12 GB may be insufficient… though reducing reads-per-batch should really make this work (that’s how I get SILVA classifiers working on a laptop with 8 GB RAM).
My advice is to try one of the pre-trained Greengenes classifiers instead. These are much less memory-intensive and probably won’t even require setting reads-per-batch. If that’s still failing, I’d recommend monitoring memory use to see how much is actually being consumed.
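One way to monitor memory use, as a rough sketch: run the command from Python and read the peak resident memory of the child process afterwards. This uses only the standard library and assumes Linux or macOS. The helper name is hypothetical, and the demo runs a harmless Python one-liner rather than QIIME 2.

```python
import resource
import subprocess
import sys


def run_and_report_peak(cmd):
    """Run cmd and return (exit code, peak memory in MB).

    The peak is the largest resident set size among child processes
    that have finished so far, as reported by getrusage.
    """
    result = subprocess.run(cmd)
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return result.returncode, peak / divisor


# Demo: a child process that allocates about 50 MB and exits.
demo_cmd = [sys.executable, "-c", "x = b'x' * (50 * 1024 * 1024)"]
code, peak_mb = run_and_report_peak(demo_cmd)
print(f"exit code {code}, peak memory about {peak_mb:.0f} MB")
```

Replacing `demo_cmd` with the qiime command as a list of arguments should show how close the run gets to the machine's limit. If the run fails with a Memory Error, the reported peak is likely to sit near the memory available to the machine.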
Oh, one more thing: DO NOT attempt to run classify-sklearn in parallel. This will load multiple copies of the classifier into memory, depleting resources. Parallel classification is really only feasible/useful on high-performance computing clusters.
If all else fails, give classify-consensus-blast or classify-consensus-vsearch a try instead.
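A hedged sketch of what the vsearch alternative could look like, again as a Python helper that only prints the command. Unlike classify-sklearn, these methods take reference sequences and a reference taxonomy rather than a pre-trained classifier. The file names below are placeholders, the helper is hypothetical, and the required options may differ between releases, so check `qiime feature-classifier classify-consensus-vsearch --help` first.

```python
import shlex


def consensus_vsearch_cmd(query, ref_reads, ref_taxonomy, output):
    """Build the argument list for classify-consensus-vsearch (hypothetical helper)."""
    return [
        "qiime", "feature-classifier", "classify-consensus-vsearch",
        "--i-query", query,                        # your representative sequences
        "--i-reference-reads", ref_reads,          # reference sequences artifact
        "--i-reference-taxonomy", ref_taxonomy,    # matching reference taxonomy
        "--o-classification", output,
    ]


# Placeholder file names; substitute your own artifacts.
vsearch_cmd = consensus_vsearch_cmd(
    "rep-seqs.qza", "ref-seqs.qza", "ref-taxonomy.qza", "taxonomy.qza",
)
print(shlex.join(vsearch_cmd))
```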