Estimated time for DADA2 process

Hi,
I’m currently running DADA2 in QIIME 2 on V3-V4 paired-end data: over 1500 samples in total, with around 50 GB of sequence files (forward + reverse).
I started almost two weeks ago, but it’s still running and I don’t know when it will finish.
So here are my questions:

  1. At the moment I’m using the options maxee 2 and trunc-q 0 with 36 threads. Do these options have a big effect on the run time? I’m also running maxee 1 and trunc-q 2 on another computer, and both jobs are still running.
  2. For almost a week now, the jobs have been removing chimeras, and that step doesn’t report its progress. Is there any way to see how far the chimera check has gotten? It’s frustrating to watch a single step with no feedback.
  3. Do you have any idea when my processes will finish?

Anyway, so far I’m really satisfied with the concept of DADA2 for 16S data, especially with QIIME 2. Thanks for keeping this open forum!

Best,
Yeojun

Hi @yeojuny,
Sorry to hear that this is taking so long for you. Something doesn’t seem right.

Can you tell if the job is using CPU, or if it’s just hanging? You should be able to run the command top to determine what the current CPU utilization is on the system where this is running.
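If watching top by hand gets tedious, the same check can be scripted. Below is a minimal Python sketch, assuming a Linux system where /proc/<pid>/stat is readable. It samples a process's accumulated CPU time twice and reports whether it grew. The function names cpu_ticks_from_stat and is_using_cpu are hypothetical helpers for illustration, not part of QIIME 2 or DADA2.

```python
import time


def cpu_ticks_from_stat(stat_line):
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # The command name (field 2) is parenthesised and may contain spaces,
    # so split on the last ')' before tokenising the remaining fields.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(rest[11]) + int(rest[12])


def is_using_cpu(pid, interval=5.0):
    """Sample a process twice and report whether its CPU time increased."""
    def read_ticks():
        with open(f"/proc/{pid}/stat") as handle:
            return cpu_ticks_from_stat(handle.read())

    before = read_ticks()
    time.sleep(interval)
    return read_ticks() > before
```

Calling is_using_cpu with the PID of one of the job's processes (taken from top, for example) returns True if that process accumulated CPU time during the interval, which suggests it is working rather than hanging.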

Could you also clarify which version of QIIME 2 you are using? Attaching the standard output or log file might also be helpful, as would a description of your compute environment.

I’m a bit confused as the chimera step should be much faster than the sample inference step. Two guesses are that you are using an older version of Q2 before we added multithreading to chimera removal, or that it is hanging/stalling on that step for memory reasons (chimera removal loads the sequence-variant table into memory, and w/ 50GB of input data the table will be pretty big).
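To put a rough number on the memory concern, here is a back-of-envelope Python sketch. It assumes the sequence-variant table is held as a dense samples-by-variants matrix of 4-byte integers. The variant count of 100,000 is a made-up placeholder, not a figure from this dataset, and actual memory use could be several times higher if intermediate copies are made.

```python
def dense_table_bytes(n_samples, n_variants, bytes_per_cell=4):
    """Approximate size in bytes of a dense samples x variants count matrix."""
    return n_samples * n_variants * bytes_per_cell


# 1500 samples comes from the original post; 100_000 variants is hypothetical.
size = dense_table_bytes(1500, 100_000)
print(f"{size / 1e9:.1f} GB")
```

The point of the estimate is only to compare its order of magnitude against the RAM available on the machine.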

PS: The lack of progress reporting is a current limitation of the underlying dada2 package, and you can follow future progress on that here: https://github.com/benjjneb/dada2/issues/215

Yes, I have been checking ‘top’ all along, and the job is definitely running, with many R processes active. So in my opinion the chimera step is running with multiple threads. My QIIME 2 version is 2017.4.
I’m in the middle of moving at the moment, so I will attach the log file as soon as I can get it.

Sorry for the late update.
One process, with the maxee 2 and trunc-q 2 options, finished yesterday after 16 days in total; the other, with maxee 2 and trunc-q 0, is still running on day 18. I wanted to check the output files today. The two output files look OK: table.qza is 5.8M and rep-seqs.qza is 17.4M. However, the log file shows an error message in the final step (log file attached: B2840_mx2_tq2.txt (8.8 KB)).
Now I'm trying to convert them to visualization files. Converting table.qza to table.qzv failed at first with a "core dumped" error, but I found the solution here (Error in qiime feature-table summarize - #2 by thermokarst), so it's running well now. It seems to be taking a long time again, though (40 minutes so far).
My computing environment is shown below.

$ vi /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 79
model name : Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
stepping : 1
microcode : 0xb00001e
cpu MHz : 1200.289
cache size : 30720 KB
physical id : 0
siblings : 24
core id : 0
cpu cores : 12
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 20
wp : yes

$ vi /proc/meminfo
MemTotal: 230828360 kB
MemFree: 223588644 kB
MemAvailable: 226649432 kB
Buffers: 20 kB
Cached: 673396 kB
SwapCached: 9212 kB
Active: 2051348 kB
Inactive: 467448 kB
Active(anon): 1873836 kB
Inactive(anon): 41176 kB
Active(file): 177512 kB
Inactive(file): 426272 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 16777212 kB
SwapFree: 16691368 kB
Dirty: 4 kB
Writeback: 0 kB
AnonPages: 1838964 kB
Mapped: 61120 kB
Shmem: 69492 kB
Slab: 2986660 kB
SReclaimable: 2794888 kB

Thanks Yeojun. That would seem to be enough memory to avoid hanging in the chimera stage.
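For reading these figures without scanning the whole listing, here is a minimal Python sketch that parses /proc/meminfo-style text and converts selected fields to GiB. The helper name meminfo_gib is hypothetical, and it assumes the selected fields are reported in kB, as they are in the listing above.

```python
def meminfo_gib(text, keys=("MemTotal", "MemAvailable")):
    """Return {field: size in GiB} for selected /proc/meminfo-style fields."""
    result = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep or name.strip() not in keys:
            continue
        kib = int(value.split()[0])  # first token is the size in kB
        result[name.strip()] = kib / 1024 ** 2
    return result
```

Applied to the listing above, this gives roughly 220 GiB of total memory and 216 GiB available.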

I suspect the length of your run-time is related to a recent issue we discovered at STAMPS, that conda-installed dada2 is much slower than it should be. Progress on that bug should appear here: https://github.com/qiime2/q2-dada2/issues/74

So do you recommend running DADA2 outside of QIIME 2 for large datasets?

That’s an option for now. We are actively exploring the conda installation issue, though, so hopefully it will be fixed soon.

QIIME 2 2017.12 has been released and uses DADA2 1.6 which has explicit SSE vectorization for much better performance!
