DADA2 running forever


I am processing some amplicon libraries sequenced on a NovaSeq 6000 (SP flow cell, 2x250 bp). I first tried one sample, and DADA2 finished within two hours. I then tried two samples, and DADA2 has now been running for several days and looks like it will never finish. This is running on our university's cluster, which should have plenty of memory. Below are the command I used and the output DADA2 has printed so far. Can you please help? Thanks

qiime dada2 denoise-paired --i-demultiplexed-seqs trimmed-seqs.qza --o-table table.qza --o-representative-sequences rep-seqs-dada2.qza --o-denoising-stats dada2-stats.qza --p-trim-left-f 0 --p-trim-left-r 0 --p-trunc-len-f 240 --p-trunc-len-r 230 --verbose

Loading required package: Rcpp
DADA2: 1.18.0 / Rcpp: 1.0.6 / RcppParallel: 5.1.2

  1. Filtering ..
  2. Learning Error Rates
    834862080 total bases in 3478592 reads from 1 samples will be used for learning the error rates.

Hi @feixiang1209,

A couple of things come to mind here. NovaSeq runs are typically massive, and it looks as though you are not using DADA2's ability to run in parallel, so I'm not surprised this is taking a very long time. You can assign multiple cores to this task with the --p-n-threads flag. That being said, you probably don't want to be running data from newer Illumina machines such as the NovaSeq through the q2-dada2 plugin. There's a bit of discussion of this topic here and here. The DADA2 GitHub page mentions that the next release will be able to handle the new binned quality scores, but I'm not sure when that is scheduled. See the other threads about using Deblur instead for this task.
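For illustration, here is a minimal sketch of your original command with threading enabled. The thread count of 8 is only an assumption, so match it to the number of cores your cluster job actually requests (a value of 0 tells the plugin to use all available cores). The `N_THREADS` variable and the `run_dada2` wrapper are just illustrative names, and the guard at the end only avoids a confusing error when the QIIME 2 environment has not been activated.

```shell
# Number of threads for DADA2; 8 is an assumed value, adjust to your allocation.
# Setting this to 0 asks the plugin to use all available cores.
N_THREADS="${N_THREADS:-8}"

# Same denoising command as in the question, plus --p-n-threads.
run_dada2() {
  qiime dada2 denoise-paired \
    --i-demultiplexed-seqs trimmed-seqs.qza \
    --o-table table.qza \
    --o-representative-sequences rep-seqs-dada2.qza \
    --o-denoising-stats dada2-stats.qza \
    --p-trim-left-f 0 \
    --p-trim-left-r 0 \
    --p-trunc-len-f 240 \
    --p-trunc-len-r 230 \
    --p-n-threads "$N_THREADS" \
    --verbose
}

# Only launch the job if the qiime executable is on the PATH.
if command -v qiime >/dev/null 2>&1; then
  run_dada2
else
  echo "qiime not found; activate your QIIME 2 environment first" >&2
fi
```

Threading should shorten the run, but it does not address the binned quality score issue mentioned above.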
