Qiime dada2 denoise pairedEnd - long run time

I have 96 paired-end patient samples (192 files, 66 GB in total). DADA2 has been running for 4 days and still hasn't finished. My virtual machine has 32 threads and 128 GB of memory.

Good morning Achraf,

What command did you run? How big is the demux.qza file?

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux_trimmed.qza \
  --p-trunc-len-r 0 --p-trunc-len-f 0 \
  --o-representative-sequences rep-seqs.qza \
  --o-table table.qza \
  --o-denoising-stats denoising-stats.qza \
  --p-n-threads 32

Size: 32 GB

Thank you for telling me more.

DADA2 usually runs faster than that, say in a few hours, so this is a little surprising.

If your computer has 128 GB memory, how much has been allocated to your VM?

(You can run the Linux command top and post the results here, if you would like.)
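
If a screenshot of top is awkward to share, the same information can be printed as two numbers. This is only a minimal sketch, assuming a Linux guest where /proc/meminfo is readable; the function names vm_threads and vm_mem_gb are made up for illustration:

```shell
# Report the CPU threads visible inside the VM.
vm_threads() {
  nproc
}

# Report the total memory visible inside the VM, in whole GiB.
# Assumes a Linux guest: MemTotal in /proc/meminfo is given in kB.
vm_mem_gb() {
  awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' /proc/meminfo
}

# Hypothetical usage:
#   echo "threads: $(vm_threads), memory: $(vm_mem_gb) GiB"
```

If the memory figure is far below what the host has, the VM was given only part of it.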

It's an Amazon EC2 instance (m5a.8xlarge).

That's running right now and using multiple threads!

  • %CPU is >100 showing multithreaded support
  • %MEM is <100 showing RAM is available

That's all good. Let it cook! :ramen:

DADA2 can take a while.

It's been two weeks and it's still not finished.

Thank you for the update.
Yeah, something is wrong, and I may have found it!

The most recent screenshot you shared shows 'from 1 samples', but at the start you said you have 96 paired-end patient samples.

Where have the other 95 samples gone? Did they all get combined during demultiplexing, because that's no good!

Additionally, 66 GB is pretty big for a single Illumina MiSeq run. Were all 96 samples on the same run, or are there multiple sequencing runs here?

We are so close to figuring out the problem!!

All the samples are in the same run

I use just these three commands:

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path metadata.tsv \
  --output-path demux.qza \
  --input-format PairedEndFastqManifestPhred33V2

qiime cutadapt trim-paired \
  --i-demultiplexed-sequences demux.qza \
  --p-cores 32 \
  --p-quality-cutoff-3end 30 \
  --p-quality-cutoff-5end 30 \
  --o-trimmed-sequences demux_trimmed.qza

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux_trimmed.qza \
  --p-trunc-len-r 0 --p-trunc-len-f 0 \
  --o-representative-sequences rep-seqs.qza \
  --o-table table.qza \
  --o-denoising-stats denoising-stats.qza \
  --p-n-threads 32 \
  --verbose

And yet, as you can see, DADA2 only sees 1 sample.

Something has gone wrong!

Can you post the quality score plots from both demux.qza and demux_trimmed.qza?
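
It may also be worth confirming how many distinct sample IDs are in the manifest you passed to qiime tools import (metadata.tsv in your commands). A rough sketch, assuming the usual tab-separated manifest with a header line and the sample ID in the first column; count_manifest_samples is a hypothetical helper name, not part of QIIME 2:

```shell
# Count distinct sample IDs in a tab-separated manifest.
# Assumes line 1 is the header and column 1 holds the sample ID;
# blank lines and lines starting with '#' are ignored.
count_manifest_samples() {
  awk -F'\t' '
    NR > 1 && $1 != "" && $1 !~ /^#/ { seen[$1] = 1 }
    END { n = 0; for (s in seen) n++; print n }
  ' "$1"
}

# Hypothetical usage:
#   count_manifest_samples metadata.tsv
```

If this prints 96, the manifest itself lists every sample and the problem is more likely downstream of the import.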

As you can see, there are 96 samples.

This is the graph for demux.qza:

This is the graph for demux_trimmed.qza:

Thank you for sharing that! I can see the separate samples, and quality looks okay with binned quality scores.

What amplicon did you sequence and/or what primers did you use?

Can you tell us more about the biological context of these samples? :petri_dish:

Sequencing of 16S rRNA amplicons (using V3-V4 primers)

Okay, that sounds good.

Why does DADA2 see only 1 sample?
:thinking:

I'm totally out of ideas!

Let's see if other folks have better ideas

Hello @Achraf_Zbaida,

Can you attach your demux_trimmed.qzv?

demux-trimmed-summary.qzv (319.3 KB)

Hello @Achraf_Zbaida,

I don't see anything unusual about your data. This is difficult for me to troubleshoot further without having the data to try to reproduce the issue. Perhaps you could host a few samples' worth of it somewhere and share the link? Alternatively, you could try opening an issue on the DADA2 software's GitHub.
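
One way to carve out a few samples is to cut the manifest down and re-import from that. A minimal sketch, assuming the manifest has one header line followed by one row per sample; subset_manifest and the output filename are hypothetical:

```shell
# Write the header plus the first N sample rows of a manifest to a new file.
#   $1 = input manifest, $2 = number of samples to keep, $3 = output manifest
subset_manifest() {
  head -n "$(( $2 + 1 ))" "$1" > "$3"
}

# Hypothetical usage:
#   subset_manifest metadata.tsv 4 metadata_subset.tsv
# Then re-run the same import, cutadapt and dada2 commands with
# metadata_subset.tsv as the --input-path of the import step.
```

Running the same three commands on the subset would also show whether DADA2 finishes, and how many samples it reports, on a small input.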