Data file size vs. RAM size in DADA2

dasqiime22 · November 4, 2019, 7:39pm

I have fastq.gz files for 12 samples (100 bp per read), approximately 33 GB in total. The dataset was generated using whole-genome sequencing (WGS).

I can successfully generate demux.qza and demux.qzv.

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux.qza \
  --p-trim-left-f 0 \
  --p-trim-left-r 0 \
  --p-trunc-len-f 99 \
  --p-trunc-len-r 89 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza

The command failed: after running for 12 hours, it showed a plugin error.

My computer has 32 GB of RAM and a 500 GB hard drive.

What RAM size is needed?

colinbrislawn · November 5, 2019, 5:59pm

Good morning Jayanta,

Estimating RAM needed is hard because it depends on both the size and complexity of your data set. But having about as much RAM as your input data set is a good place to start.

One of the settings of the dada2 denoise-paired plugin is --p-n-reads-learn, which is set to 1 million by default. You could lower that to 100,000 or 10,000 to speed up your processing and reduce RAM usage.

(And you could add --p-n-threads 4 to speed up this process too!)
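For reference, here is a rough sketch of how the command from the first post might look with a lower --p-n-reads-learn and multi-threading via --p-n-threads. The values 100000 and 4 are only starting points, and the thread count assumes the machine (or VM) actually has at least 4 CPU cores available. The sketch prints the command first and only runs it if qiime and demux.qza are present.

```shell
# Sketch only: the denoise-paired call from the first post with two extra settings.
N_READS_LEARN=100000   # default is 1000000; lower values use less RAM and time
N_THREADS=4            # assumption: at least 4 CPU cores are available

# Store the full command in the positional parameters so it can be printed and run.
set -- qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux.qza \
  --p-trim-left-f 0 \
  --p-trim-left-r 0 \
  --p-trunc-len-f 99 \
  --p-trunc-len-r 89 \
  --p-n-reads-learn "$N_READS_LEARN" \
  --p-n-threads "$N_THREADS" \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza

# Print the command so it can be checked before starting a long run.
echo "$*"

# Run it only when qiime is installed and the input artifact exists.
if command -v qiime >/dev/null 2>&1 && [ -f demux.qza ]; then
  "$@"
fi
```

If RAM is still the bottleneck, dropping N_READS_LEARN to 10000 is the next thing to try.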

Colin

dasqiime22 · November 5, 2019, 7:26pm

Thank you!
I am checking with the new setting.
One thing I forgot to mention is that I run QIIME 2 in VirtualBox, with 20 GB of RAM allocated to the VM. My target is the bacterial community, but my dataset comes from whole-genome sequencing. Is there any problem with lowering the number of reads used for learning?

colinbrislawn · November 5, 2019, 8:25pm

Nope. 100,000 reads should still be plenty to estimate the error profile.

20 GB of VM RAM is still probably OK. You can try closing your web browser and increasing the VM RAM even more if needed.
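To confirm what the VM actually has to work with, something like the following can be run inside the guest. This is a minimal sketch that assumes a Linux guest (it reads /proc/meminfo); the variable names are just for illustration.

```shell
# Total memory the guest OS can see, in kB (Linux only).
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
# Integer division rounds down, so 20 GB allocated may show as 19.
echo "RAM visible to the VM: $((mem_kb / 1024 / 1024)) GB"

# Number of CPU cores the guest can use (relevant for multi-threading).
cores=$(getconf _NPROCESSORS_ONLN)
echo "CPU cores visible to the VM: $cores"
```

If the reported RAM is well below what was set in VirtualBox, the allocation probably did not take effect and the VM settings are worth rechecking.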