Unfortunately, whether I go through the process of importing the taxonomy and reference sequence data from Silva 138 myself or use the two files already processed, with the 515/806 primer set already extracted (from "Data resources" in the QIIME 2 2021.4.0 documentation), my computer suffers a memory error and cannot continue after several hours, despite my dedicating 30 GB of RAM and 6 CPUs to the process. Those settings were sufficient for creating a classifier from the Silva 132 database. To be clear, I'm using this code:
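(The exact command wasn't included in the post; a typical training invocation for the Silva 138 V4 classifier would look something like the following. The artifact filenames match those on the QIIME 2 data resources page, but substitute your own if they differ.)

```shell
# Train a Naive Bayes taxonomy classifier on the Silva 138 reference reads
# already extracted for the 515F/806R (V4) primer region.
# Filenames are placeholders based on the QIIME 2 data resources page.
qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads silva-138-99-seqs-515-806.qza \
  --i-reference-taxonomy silva-138-99-tax-515-806.qza \
  --o-classifier silva-138-99-515-806-nb-classifier.qza
```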
How much RAM is your VirtualBox VM set up to use? Have you allocated sufficient RAM?
I've been able to construct and train the V4 (515-806) classifier on my laptop with 16 GB RAM... though that was indeed pushing it! But it also depends on what else your system is doing at the time.
Again, my first thought would be to make sure your VirtualBox VM has access to at least 16-24 GB RAM when it is running. The default might only be 2-8 GB.
Thank you for responding so quickly. I've allocated as much RAM to 2021.4 as our computer will allow (30.3 GB), and nothing aside from VirtualBox and QIIME 2 is open, which is why I thought this problem might be specific to me. But as I said, these settings were sufficient to create a classifier with version 132 of the Silva database, so I don't know why this one is failing.
Worst-case scenario, I can just use the pre-trained classifier available on the QIIME 2 website... it's just that I've been told it's always best to make your own.
I don't think it's specific to you. For reference, when we re-train the feature classifiers for new QIIME 2 releases, we have to use 64 GB of RAM on our HPC, with the default "chunk size" setting of 20,000. 30 GB seems insufficient to me to get the job done, especially if you're observing a memory error. One option might be to cut the `--p-classify--chunk-size` parameter in half, reducing the memory burden (but roughly doubling the runtime).
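To illustrate the suggestion above, here is what halving the chunk size would look like in the training command (filenames are placeholders based on the QIIME 2 data resources page):

```shell
# Halve the default classify chunk size (20,000 -> 10,000) to lower
# peak memory use during training, at the cost of roughly 2x runtime.
qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads silva-138-99-seqs-515-806.qza \
  --i-reference-taxonomy silva-138-99-tax-515-806.qza \
  --p-classify--chunk-size 10000 \
  --o-classifier silva-138-99-515-806-nb-classifier.qza
```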
Thank you for clarifying. 64 GB of RAM is more than a lot of people have access to (at least my advisor's lab doesn't). At the very least, the tutorial should state how much RAM this process requires, especially since it's the only way to get the latest Silva databases, and that the `--p-classify--chunk-size` parameter is necessary if your computer doesn't have that much RAM.
I'm not certain whether it would be better to use the pre-made classifier online or to reduce `--p-classify--chunk-size`, but I'll stick with the former option for now.
The RAM requirements depend entirely on the reference database you use to generate the classifier (and, to complicate matters, any trimming/extraction will affect that as well). Training on a Greengenes DB with just a few GB of RAM is common. Unfortunately, it's not a "one size fits all" situation.
If you're using the 515f-806r primers (and judging by your first post, it sounds like you are), then using a pre-trained classifier will be identical to training your own, assuming you weren't applying some kind of intermediate filtering or cleanup. We usually recommend folks train their own classifier to accommodate unique environments, custom databases, or their specific primers. It sounds like you're using a pretty common setup and can confidently use the pre-trained classifier.
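For completeness, applying the pre-trained classifier downloaded from the data resources page would look something like this (the representative-sequences filename is a placeholder for your own artifact):

```shell
# Assign taxonomy to your representative sequences using the
# pre-trained Silva 138 V4 classifier from the QIIME 2 website.
qiime feature-classifier classify-sklearn \
  --i-classifier silva-138-99-515-806-nb-classifier.qza \
  --i-reads rep-seqs.qza \
  --o-classification taxonomy.qza
```

Note that classification with a pre-trained classifier still needs substantial RAM, though typically less than training one.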