I’m running dada2 denoise and it seems to be taking a very long time (>24 hrs so far). I am aware that dada2 is known for taking a long time, but I wanted to check that what I am observing is reasonable. I have 15 million reads (2 × 300 bp, paired-end) across 16 samples. Does it make sense that it has taken >24 hrs so far? I’m running on an HPC with 5 threads and 5 GB of memory allocated.
I’m asking because we plan to run many more samples in the future, and I would like to understand what to expect in terms of timing (if we run, for example, 100 samples, is it going to take a month to run dada2 with this memory allocation?).
EDIT: What sequencing depth are you all using? Is there a way to subsample reads within QIIME 2 before running dada2 denoise?
I would appreciate any advice.
Yeah, that should be fine. You can speed it up with more threads (for example, we sometimes use 32 on a dedicated HPC node and those jobs finish within roughly two days), but I suspect it has already completed by the time of my response?
I think 12–16 GB is a “safe” amount of memory for basically any DADA2 job; it doesn’t need a lot, but at 5 GB you might be a little close to the threshold.
Goodness, no! The longest I have seen is a week (prior to some optimizations that were made), and that was with ~10 GB of compressed data on only 12 or so cores.
We don’t have a good way to subsample or partition data in QIIME 2, that’s something we could really use though.
Thank you very much, Evan! It completed after 5 days with 15 million PE reads. In the meantime I subsampled 10k and 100k reads from the FASTQ files with the tool seqtk to get an idea of the timing.
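For anyone else wanting to do this: seqtk keeps mates paired by seeding its sampler identically for both files (e.g. `seqtk sample -s100 R1.fastq.gz 10000` and the same `-s100` for R2). Here is a minimal stdlib Python sketch of that idea, not seqtk's actual implementation — the function name and the in-memory record layout are my own:

```python
import random

def subsample_pairs(r1_records, r2_records, n, seed=100):
    """Pick n records from R1 and R2 using the same seed so mates stay paired."""
    if len(r1_records) != len(r2_records):
        raise ValueError("R1 and R2 must have the same number of records")
    # Same seed -> same indices for both files, preserving read pairing.
    idx = sorted(random.Random(seed).sample(range(len(r1_records)), n))
    return [r1_records[i] for i in idx], [r2_records[i] for i in idx]

# Toy data: each FASTQ record as a (header, sequence, '+', quality) tuple.
r1 = [(f"@read{i}/1", "ACGT", "+", "IIII") for i in range(100)]
r2 = [(f"@read{i}/2", "TGCA", "+", "IIII") for i in range(100)]

sub1, sub2 = subsample_pairs(r1, r2, 10)
# Mates remain synchronized: read IDs match position-for-position.
assert all(a[0][:-2] == b[0][:-2] for a, b in zip(sub1, sub2))
```

The key point is simply that the random index set is a pure function of the seed, which is why passing the same `-s` value to seqtk for both mate files is enough to keep the pairs in sync.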