I'm running dada2 denoise and it seems to be taking a very long time (>24 hrs so far). I'm aware that dada2 is known for being slow, but I wanted to check that what I'm observing is reasonable. I have 15 million reads (2 x 300 bp paired-end) representing 16 samples. Does it make sense that it has taken >24 hrs so far? I'm running on an HPC with 5 threads and 5 GB of memory allocated.
I'm asking because we plan to run many more samples in the future, and I would like to understand what to expect in terms of timing (if we run, say, 100 samples, is it going to take a month to run dada2 with this memory allocation?)
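For reference, this is roughly the shape of the command I'm running (the input/output names and truncation lengths below are placeholders, not my exact values):

```
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux.qza \
  --p-trunc-len-f 280 \
  --p-trunc-len-r 220 \
  --p-n-threads 5 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --verbose
```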
EDIT: What sequencing depth are others using? And is there a way of subsampling reads with QIIME 2 before running dada2 denoise?
Yeah, that should be fine. You can speed it up with more threads (for example, we'll sometimes use 32 on a dedicated HPC node, and those runs finish within 2-ish days), but I suspect it has already completed by the time of my response?
I think 12-16 GB is a "safe" amount of memory for basically any job with DADA2. It doesn't need a lot, but you might be a little close to a threshold at 5 GB.
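If your cluster happens to use SLURM (just an assumption, since you didn't say which scheduler you're on), the thread and memory requests might look roughly like this:

```
#!/bin/bash
#SBATCH --cpus-per-task=16   # more threads = faster denoising
#SBATCH --mem=16G            # 12-16 GB is a comfortable ceiling for most DADA2 jobs

# thread count should match --cpus-per-task above;
# truncation lengths here are placeholders
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux.qza \
  --p-trunc-len-f 280 --p-trunc-len-r 220 \
  --p-n-threads 16 \
  --output-dir denoised/
```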
Goodness no! The longest I have seen is a week (prior to some optimizations that were made), and that was with ~10 GB of compressed data on only 12 or so cores.
We don't have a good way to subsample or partition data in QIIME 2; that's something we could really use, though.
Thank you very much, Evan! It completed after 5 days with 15 million PE reads. In the meantime, I used seqtk to subsample 10k and 100k reads from the fastq files to get an idea of the timing.
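In case it's useful to anyone else, the subsampling itself is a one-liner per fastq (file names here are placeholders); the one thing to watch is passing the same seed (-s) for both files of a pair so the mates stay in sync:

```
# take a random 10k-read subsample from each end of a pair;
# the identical seed keeps R1/R2 matched up
seqtk sample -s100 sample1_R1.fastq.gz 10000 > sub10k_R1.fastq
seqtk sample -s100 sample1_R2.fastq.gz 10000 > sub10k_R2.fastq
```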