Data file size vs. RAM size in dada2

I have fastq.gz files for 12 samples (100 bp per read), approximately 33 GB total. The dataset was generated using whole-genome sequencing (WGS).

I can successfully generate demux.qza and demux.qzv.

```
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux.qza \
  --p-trim-left-f 0 \
  --p-trim-left-r 0 \
  --p-trunc-len-f 99 \
  --p-trunc-len-r 89 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza
```

This command ran for about 12 hours and then failed with a plugin error.

My computer has 32 GB of RAM and a 500 GB hard drive.

What RAM size is needed?

Good morning Jayanta,

Estimating the RAM needed is hard because it depends on both the size and the complexity of your dataset, but having about as much RAM as your input dataset is a good starting point.

One of the parameters of dada2 denoise-paired is --p-n-reads-learn, which defaults to 1,000,000 reads for error-model training. You could lower that to 100,000 or even 10,000 to speed up processing and reduce RAM usage.

(And you could add --p-n-threads 4 to run the denoising on multiple CPU threads and speed things up too! :smile:)
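Putting both suggestions together, the amended command might look something like this (a sketch assuming the same input and output filenames you used above; adjust the values to taste):

```shell
# Lower the number of reads used for error-model training (default 1,000,000)
# and use 4 CPU threads; everything else stays as in your original command.
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux.qza \
  --p-trim-left-f 0 \
  --p-trim-left-r 0 \
  --p-trunc-len-f 99 \
  --p-trunc-len-r 89 \
  --p-n-reads-learn 100000 \
  --p-n-threads 4 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza
```

Note that more threads also means more memory in use at once, so if you are already RAM-limited you may want to try lowering --p-n-reads-learn first before adding threads.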


Thank you!
I am checking with the new setting.
One thing I forgot to mention is that I run QIIME 2 through VirtualBox, and I allocated 20 GB of RAM to the VM. My target is the bacterial community, but my dataset is from whole-genome sequencing. Is there any problem with lowering the number of reads used for learning?

Nope. 100,000 reads should still be plenty to estimate the error profile.

20 GB of VM RAM is probably still OK. You can try closing your web browser and increasing the VM's RAM allocation even further if needed.
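If you prefer the command line to the VirtualBox GUI, the RAM allocation can be raised with VBoxManage while the VM is powered off. A sketch, assuming your VM is named "qiime2-vm" (substitute your actual VM name, which `VBoxManage list vms` will show):

```shell
# List registered VMs to find the exact name
VBoxManage list vms

# Raise the VM's memory allocation to 24 GB (value is in MB);
# the VM must be fully shut down, not just saved, for this to apply.
VBoxManage modifyvm "qiime2-vm" --memory 24576
```

Leave a few GB for the host OS itself; allocating nearly all of the host's 32 GB to the guest can make the host swap and slow everything down.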