Qiime dada2 denoise pairedEnd - long run time

Achraf_Zbaida · September 18, 2024, 1:27pm

I have 96 paired-end patient samples (192 files, 66 GB in total). DADA2 has been running for 4 days and still hasn't finished. My virtual machine has 32 threads and 128 GB of memory.

colinbrislawn · September 18, 2024, 2:05pm

Good morning Achraf,

What command did you run? How big is the demux.qza file?

Achraf_Zbaida · September 18, 2024, 2:07pm

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux_trimmed.qza \
  --p-trunc-len-r 0 \
  --p-trunc-len-f 0 \
  --o-representative-sequences rep-seqs.qza \
  --o-table table.qza \
  --o-denoising-stats denoising-stats.qza \
  --p-n-threads 32

size : 32G

colinbrislawn · September 18, 2024, 2:51pm

Thank you for telling me more.

DADA2 usually runs faster than that, say in a few hours, so this is a little surprising.

If your computer has 128 GB memory, how much has been allocated to your VM?

(You can run the Linux command `top` and post the results here, if you would like.)
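
For anyone who prefers text they can paste over a screenshot, here is a minimal sketch of how the same information could be collected. It assumes a Linux VM with `getconf`, `awk`, and (optionally) `top` available; the variable names `threads` and `mem_gib` are only for illustration.

```shell
#!/usr/bin/env bash
# Number of CPU threads currently visible to the VM.
threads=$(getconf _NPROCESSORS_ONLN)
echo "threads: ${threads}"

# Total RAM in GiB, read from /proc/meminfo (Linux only, so guarded).
if [ -r /proc/meminfo ]; then
  mem_gib=$(awk '/^MemTotal:/ {printf "%.1f", $2 / 1024 / 1024}' /proc/meminfo)
  echo "memory: ${mem_gib} GiB"
fi

# One non-interactive snapshot of top (batch mode), trimmed to the first lines.
# Skipped quietly if top is missing or does not support these options.
if command -v top >/dev/null 2>&1; then
  top -b -n 1 2>/dev/null | head -n 15 || true
fi
```

If the reported thread count or memory is much lower than what the machine has, the VM was probably given only part of the host's resources.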

Achraf_Zbaida · September 19, 2024, 8:59am

It's an Amazon EC2 instance (m5a.8xlarge).

colinbrislawn · September 19, 2024, 1:55pm

That's running right now and using multiple threads!

%CPU is >100 showing multithreaded support
%MEM is <100 showing RAM is available

That's all good. Let it cook!

DADA2 can take a while.

Achraf_Zbaida · September 27, 2024, 2:49pm

It's been two weeks and it's still not finished.

colinbrislawn · September 27, 2024, 4:52pm

Thank you for the update.
Yeah, something is wrong, and I may have found it!

The most recent screenshot you shared shows 'from 1 samples', but at the start you said you have 96 paired-end samples.

Where have the other 95 samples gone? Did they all get combined during demultiplexing, because that's no good!

Additionally, 66 GB is pretty big for a single Illumina MiSeq run. Were all 96 samples on the same run, or are there multiple sequencing runs here?

We are so close to figuring out the problem!!
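
One quick check, sketched here under assumptions: count the unique sample IDs in the manifest that was given to `qiime tools import`. The demo manifest below is made up so the snippet runs on its own, and the `/data/...` paths and `S01`-style IDs are hypothetical. With real data, `MANIFEST` would point at your own file (metadata.tsv in this thread).

```shell
#!/usr/bin/env bash
# Hypothetical demo manifest in the PairedEndFastqManifestPhred33V2 layout
# (tab-separated, one header line). Replace with the real manifest path.
workdir=$(mktemp -d)
MANIFEST="${workdir}/manifest.tsv"
printf 'sample-id\tforward-absolute-filepath\treverse-absolute-filepath\n' > "$MANIFEST"
for s in S01 S02 S03; do
  printf '%s\t/data/%s_R1.fastq.gz\t/data/%s_R2.fastq.gz\n' "$s" "$s" "$s" >> "$MANIFEST"
done

# Count unique sample IDs: skip the header, blank lines, and comment lines.
n_samples=$(awk -F'\t' 'NR > 1 && $1 != "" && $1 !~ /^#/ {print $1}' "$MANIFEST" \
  | sort -u | wc -l | tr -d ' ')
echo "samples in manifest: ${n_samples}"
```

For a 96-sample run, this number should be 96. Anything lower would suggest that samples were merged or dropped before import.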

Achraf_Zbaida · September 30, 2024, 11:11am

All the samples are in the same run

I use just these three commands:

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path metadata.tsv \
  --output-path demux.qza \
  --input-format PairedEndFastqManifestPhred33V2

qiime cutadapt trim-paired \
  --i-demultiplexed-sequences demux.qza \
  --p-cores 32 \
  --p-quality-cutoff-3end 30 \
  --p-quality-cutoff-5end 30 \
  --o-trimmed-sequences demux_trimmed.qza

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux_trimmed.qza \
  --p-trunc-len-r 0 \
  --p-trunc-len-f 0 \
  --o-representative-sequences rep-seqs.qza \
  --o-table table.qza \
  --o-denoising-stats denoising-stats.qza \
  --p-n-threads 32 \
  --verbose

colinbrislawn · September 30, 2024, 3:25pm

And yet, as you can see, DADA2 only sees 1 sample.

Something has gone wrong!

Can you post the quality score plots from both demux.qza and demux_trimmed.qza?

Achraf_Zbaida · September 30, 2024, 11:17pm

As you can see, there are 96 samples.

This is the graph for demux.qza:

This is the graph for demux_trimmed.qza:

colinbrislawn · October 1, 2024, 2:05pm

Thank you for sharing that! I can see the separate samples, and quality looks okay with binned quality scores.

What amplicon did you sequence and/or what primers did you use?

Can you tell us more about the biological context of these samples?

Achraf_Zbaida · October 1, 2024, 2:46pm

Sequencing of 16S rRNA amplicons (using V3-V4 primers)

colinbrislawn · October 1, 2024, 3:03pm

Okay, that sounds good.

~~Why does DADA2 see only 1 sample?~~

EDIT: The q2-dada2 plugin often uses reads from one sample to build the error model, so this is fine. All the dots (.) printed after filtering show that many samples have been processed.

Let's see if other folks have better ideas.

colinvwood · October 3, 2024, 4:31pm

Hello @Achraf_Zbaida,

Can you attach your demux_trimmed.qzv?

Achraf_Zbaida · October 4, 2024, 11:44am

demux-trimmed-summary.qzv (319.3 KB)

colinvwood · October 4, 2024, 6:00pm

Hello @Achraf_Zbaida,

I don't see anything unusual about your data. This is difficult for me to troubleshoot further without actually having the data to try to reproduce the issue. Perhaps you could host a few samples' worth of it somewhere and share the link? Alternatively, you could try opening an issue on the DADA2 software's GitHub.
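
As a rough sketch of how a small subset could be prepared, assuming the data were imported from a tab-separated manifest with a single header line: keep the header plus the first few samples in a new manifest, then import and denoise only that. The demo manifest, the `/data/...` paths, and the file name `manifest_subset.tsv` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical demo manifest so the example runs on its own;
# with real data, use the manifest that was passed to `qiime tools import`.
workdir=$(mktemp -d)
manifest="${workdir}/manifest.tsv"
subset="${workdir}/manifest_subset.tsv"
printf 'sample-id\tforward-absolute-filepath\treverse-absolute-filepath\n' > "$manifest"
for i in 1 2 3 4 5 6; do
  printf 'S%s\t/data/S%s_R1.fastq.gz\t/data/S%s_R2.fastq.gz\n' "$i" "$i" "$i" >> "$manifest"
done

# Keep the header line plus the first n_keep samples.
n_keep=3
head -n $((n_keep + 1)) "$manifest" > "$subset"
subset_lines=$(wc -l < "$subset" | tr -d ' ')
echo "lines in subset manifest (header + samples): ${subset_lines}"

# The subset could then be imported the same way as the full data set, e.g.:
#   qiime tools import \
#     --type 'SampleData[PairedEndSequencesWithQuality]' \
#     --input-path manifest_subset.tsv \
#     --output-path demux_subset.qza \
#     --input-format PairedEndFastqManifestPhred33V2
```

The FASTQ files listed in the subset manifest are the ones to host and share. Running the same trimming and denoising commands on this smaller import would also show whether the slowdown still appears with only a few samples.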

system · November 5, 2024, 12:00am

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.