The running time of qiime dada2 denoise-ccs is extremely long, taking approximately one week. Are there any methods to speed it up?
I have 45 samples, and each sample has approximately 250,000 reads.
The following are the parameters I used.
What machine are you running this command on, and how much memory is allocated?
For additional context, DADA2 can often take a couple of days to run, but a week is on the longer end. The more memory and threads you can allocate, the better.
Hi @lizgehret,
I'm running on an HPC and have been allocated 40 GB of memory.
When I ran the program with 39 samples, each having 50,000 reads, it took only 6 hours. However, with 45 samples at about 250,000 reads each, it took approximately one week. Therefore, I wonder if there is any way to speed it up at high sequencing depth.
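To put the two runs side by side, here is a rough back-of-envelope calculation using only the numbers quoted above. It assumes "one week" means 7 x 24 = 168 hours and that both runs used the same parameters and hardware, neither of which is confirmed in this thread, so treat the result as a rough indication only.

```python
# Back-of-envelope comparison of the two runs described in this thread.
shallow_reads = 39 * 50_000   # 39 samples x 50,000 reads, finished in about 6 hours
deep_reads = 45 * 250_000     # 45 samples x 250,000 reads, took about one week

shallow_hours = 6
deep_hours = 7 * 24           # assumption: "one week" taken as 168 hours

# How many times more reads the deep run contains.
read_ratio = deep_reads / shallow_reads

# Runtime the deep run would need if runtime grew linearly with read count.
linear_estimate_hours = shallow_hours * read_ratio

# How much slower the observed run was than that linear estimate.
slowdown_vs_linear = deep_hours / linear_estimate_hours

print(f"read ratio: {read_ratio:.2f}x")
print(f"linear estimate: {linear_estimate_hours:.1f} h")
print(f"observed vs linear: {slowdown_vs_linear:.2f}x")
```

The deep run has a little under six times as many reads, so linear scaling would predict roughly a day and a half, yet the observed runtime was several times longer than that. In other words, the runtime appears to grow faster than linearly with depth.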
Thanks for following up with those details. The high depth is definitely affecting your run time (compared with your shallower group of 39 samples). You could try requesting a node with additional RAM (if this is possible/available on your HPC), which may help the job run faster. Here are a few options that could improve your runtime:
Change your pooling method from pseudo to independent. In the pseudo-pooling method, samples are denoised independently once, ASVs detected in at least 2 samples are recorded, and samples are then denoised independently a second time (this time with prior knowledge of the recorded ASVs and thus higher sensitivity to those ASVs). Because of this second pass, pseudo-pooling takes more time than denoising each sample independently just once. The tradeoff of switching to independent is a reduction in sensitivity.
Additional read filtering prior to denoising (if applicable). Depending on what your demux summary reveals, you could add trim/trunc parameters if read quality drops off anywhere along the reads.
Split your samples into a few smaller groups to denoise separately - after which you can merge the resulting feature tables. If you have access to multiple nodes on your HPC you could submit the jobs to denoise these smaller sample groups at the same time, which could significantly improve the total run time.
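As a minimal sketch of the batching idea above, the Python snippet below splits a list of sample IDs into roughly equal groups and builds the merge commands to run once every group has been denoised. The sample IDs, the number of batches, and all file names are hypothetical placeholders. The `qiime feature-table merge` and `qiime feature-table merge-seqs` commands are the standard QIIME 2 actions for combining feature tables and representative sequences. How you subset the demultiplexed artifact for each batch is left out here.

```python
def split_into_batches(sample_ids, n_batches):
    """Split sample IDs into n_batches groups whose sizes differ by at most one."""
    base, extra = divmod(len(sample_ids), n_batches)
    batches, start = [], 0
    for i in range(n_batches):
        # The first `extra` batches each take one additional sample.
        size = base + (1 if i < extra else 0)
        batches.append(sample_ids[start:start + size])
        start += size
    return batches


def merge_tables_command(table_paths, output_path):
    """Build a `qiime feature-table merge` command string for the given tables."""
    inputs = " ".join(f"--i-tables {p}" for p in table_paths)
    return f"qiime feature-table merge {inputs} --o-merged-table {output_path}"


def merge_seqs_command(seq_paths, output_path):
    """Build a `qiime feature-table merge-seqs` command string for the given rep-seqs."""
    inputs = " ".join(f"--i-data {p}" for p in seq_paths)
    return f"qiime feature-table merge-seqs {inputs} --o-merged-data {output_path}"


# Hypothetical sample IDs standing in for the 45 samples in this thread.
sample_ids = [f"sample-{i:02d}" for i in range(1, 46)]
batches = split_into_batches(sample_ids, 4)  # 4 batches is an arbitrary choice
print([len(b) for b in batches])

# Hypothetical per-batch output names produced by the separate denoising jobs.
tables = [f"table-batch{i}.qza" for i in range(1, len(batches) + 1)]
rep_seqs = [f"rep-seqs-batch{i}.qza" for i in range(1, len(batches) + 1)]
print(merge_tables_command(tables, "table-merged.qza"))
print(merge_seqs_command(rep_seqs, "rep-seqs-merged.qza"))
```

Each batch can then be submitted as its own job, and the two printed commands are run once all jobs have finished. If you merge the tables, remember to merge the representative sequences as well so the two artifacts stay in sync.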