I have 96 paired-end patient samples (192 files, 66 GB in total). DADA2 has been running for 4 days and still hasn't finished. My virtual machine has 32 threads and 128 GB of memory.
Good morning Achraf,
What command did you run? How big is the demux.qza file?
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux_trimmed.qza \
  --p-trunc-len-r 0 --p-trunc-len-f 0 \
  --o-representative-sequences rep-seqs.qza \
  --o-table table.qza \
  --o-denoising-stats denoising-stats.qza \
  --p-n-threads 32
Size: 32G
Thank you for telling me more.
DADA2 usually runs faster than that, say in a few hours, so this is a little surprising.
If your computer has 128 GB of memory, how much has been allocated to your VM?
(You can run the Linux command top and post the results here, if you would like.)
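If a screenshot of top is awkward to share, here is a minimal sketch of a text-only check of what the VM can see. It assumes a Linux guest that exposes /proc/meminfo, and the function name mem_total_gib is made up for this example.

```shell
#!/usr/bin/env bash
# Print the total memory visible to this machine or VM, in GiB.
# Reads meminfo-formatted text on stdin (MemTotal is reported in kB).
mem_total_gib() {
  awk '/^MemTotal:/ { printf "%.1f\n", $2 / 1048576 }'
}

# Usage on the VM (Linux only):
#   mem_total_gib < /proc/meminfo   # memory allocated to the VM
#   nproc                           # number of threads available
```

If the number printed is much lower than 128, the VM has only been given part of the host's memory.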
That's running right now and using multiple threads!
- %CPU is >100 showing multithreaded support
- %MEM is <100 showing RAM is available
That's all good. Let it cook!
DADA2 can take a while.
Thank you for the update.
Yeah, something is wrong, and I may have found it!
The most recent screenshot you shared shows 'from 1 samples', but at the start you said you have 96 paired-end samples.
Where have the other 95 samples gone? Did they all get combined during demultiplexing, because that's no good!
Additionally, 66 GB is pretty big for a single Illumina MiSeq run. Were all 96 samples on the same run, or are there multiple sequencing runs here?
We are so close to figuring out the problem!!
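One rough way to judge whether 66 GB is plausible for a single run is to count the reads in each sample's FASTQ file. This is only a sketch: it assumes gzip-compressed FASTQ with exactly four lines per read, and both count_fastq_reads and the file name in the usage line are hypothetical.

```shell
#!/usr/bin/env bash
# Count the reads in one gzip-compressed FASTQ file.
# Assumes the standard layout of exactly 4 lines per read.
count_fastq_reads() {
  gzip -dc "$1" | awk 'END { printf "%d\n", NR / 4 }'
}

# Usage (hypothetical file name):
#   count_fastq_reads sample1_R1.fastq.gz
```

Running it on the forward and reverse file of one sample should give the same number for both.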
All the samples are in the same run
I used just these three commands:
qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path metadata.tsv \
  --output-path demux.qza \
  --input-format PairedEndFastqManifestPhred33V2
qiime cutadapt trim-paired \
  --i-demultiplexed-sequences demux.qza \
  --p-cores 32 \
  --p-quality-cutoff-3end 30 \
  --p-quality-cutoff-5end 30 \
  --o-trimmed-sequences demux_trimmed.qza
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux_trimmed.qza \
  --p-trunc-len-r 0 --p-trunc-len-f 0 \
  --o-representative-sequences rep-seqs.qza \
  --o-table table.qza \
  --o-denoising-stats denoising-stats.qza \
  --p-n-threads 32 \
  --verbose
And yet, as you can see, DADA2 only sees 1 sample.
Something has gone wrong!
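Before looking at the artifacts, it may be worth confirming that the manifest passed to qiime tools import really lists 96 distinct sample IDs. The sketch below assumes a tab-separated PairedEndFastqManifestPhred33V2 manifest whose first line is the header and whose first column is sample-id; the function name count_manifest_samples is invented for illustration.

```shell
#!/usr/bin/env bash
# Count the distinct sample IDs in a tab-separated manifest file.
# Assumes line 1 is the header and column 1 holds the sample ID.
count_manifest_samples() {
  awk -F '\t' '
    NR == 1 { next }                      # skip the header line
    $1 == "" || $1 ~ /^#/ { next }        # skip blank and comment lines
    !($1 in seen) { seen[$1] = 1; n++ }   # count each ID only once
    END { print n + 0 }
  ' "$1"
}

# Usage, with the manifest from the import command:
#   count_manifest_samples metadata.tsv
```

Anything other than 96 here would point at the manifest rather than at DADA2.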
Can you post the quality score plots from both demux.qza and demux_trimmed.qza?
As you can see, there are 96 samples.
This is the plot for demux.qza.
This is the plot for demux_trimmed.qza.
Thank you for sharing that! I can see the separate samples, and quality looks okay with binned quality scores.
What amplicon did you sequence and/or what primers did you use?
Can you tell us more about the biological context of these samples?
Sequencing of 16S rRNA amplicons (using V3-V4 primers)
Okay, that sounds good.
Why does DADA2 see only 1 sample?
I'm totally out of ideas!
Let's see if other folks have better ideas
Hello @Achraf_Zbaida,
Can you attach your demux_trimmed.qzv?
demux-trimmed-summary.qzv (319.3 KB)
Hello @Achraf_Zbaida,
I don't see anything unusual about your data. This is difficult for me to troubleshoot further without having the data to try to reproduce the issue. Perhaps you could host a few samples' worth of it somewhere and share the link? Alternatively, you could try opening an issue on the DADA2 software's GitHub.
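If it helps with sharing, one possible way to cut the dataset down to a few samples is to keep only the first rows of the manifest and re-import from that. This is a sketch under the assumption that the manifest has one header line followed by one row per sample; subset_manifest and subset.tsv are hypothetical names.

```shell
#!/usr/bin/env bash
# Print the header plus the first N sample rows of a manifest.
# $1 = manifest path, $2 = number of samples to keep.
subset_manifest() {
  head -n "$(( $2 + 1 ))" "$1"
}

# Usage: keep 3 samples, then pass subset.tsv to --input-path
# in the same qiime tools import command as before.
#   subset_manifest metadata.tsv 3 > subset.tsv
```

Only the FASTQ files named in subset.tsv would then need to be uploaded, and the same three commands can be rerun on the small import to see whether DADA2 still reports 1 sample.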