I am running DADA2 denoising on 84 paired-end samples in total, and each sample contains on average 1 million reads. How long would it take to finish?
The command I have used is:
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs sample.qza \
  --p-trim-left-f 0 \
  --p-trim-left-r 0 \
  --p-trunc-len-f 0 \
  --p-trunc-len-r 0 \
  --o-representative-sequences rep-seqs.qza \
  --o-table table.qza \
  --p-n-threads 20 \
  --o-denoising-stats dada2-stats.qza
I don't know how long DADA2 will take to finish. 1 million reads per sample is a lot, and I would subsample the samples before running DADA2. Something around 10% should be enough...
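For scale, here is a back-of-the-envelope sketch of how much data 10% subsampling would leave. It assumes every sample has exactly the 1 million-read average from the question (real samples will vary) and says nothing about run time, only about input volume.

```python
# Rough depth arithmetic for the ~10% subsampling suggestion.
# Assumption: all 84 samples have exactly the stated 1 million-read average.
n_samples = 84
mean_reads_per_sample = 1_000_000
fraction = 0.10  # the ~10% suggested above

total_reads = n_samples * mean_reads_per_sample
reads_per_sample_after = round(mean_reads_per_sample * fraction)
total_reads_after = round(total_reads * fraction)

print(f"before: {total_reads:,} reads in total")
print(f"after:  ~{reads_per_sample_after:,} reads per sample, ~{total_reads_after:,} in total")
```

Even after subsampling, each sample would still have on the order of 100,000 reads.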
Hi,
This is the verbose output. Does everything look fine?
R version 4.3.3 (2024-02-29)
Loading required package: Rcpp
DADA2: 1.30.0 / Rcpp: 1.0.12 / RcppParallel: 5.1.6
2) Filtering ..........................................
3) Learning Error Rates
318498129 total bases in 1073064 reads from 1 samples will be used for learning the error rates.
303974019 total bases in 1073064 reads from 1 samples will be used for learning the error rates.
3) Denoise samples .........................................
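One quick sanity check on this log: the two "Learning Error Rates" lines give total bases and read counts, so dividing them gives the mean length of the forward and reverse reads used for error learning (that is, after the filtering step). The log also states that all 1,073,064 reads came from a single sample. This is only arithmetic on the printed numbers, not a DADA2 diagnostic.

```python
# Figures copied from the "Learning Error Rates" lines of the log above.
bases_f, reads_f = 318_498_129, 1_073_064  # forward reads
bases_r, reads_r = 303_974_019, 1_073_064  # reverse reads

# Mean length (nt) of the filtered reads used for error learning.
mean_len_f = bases_f / reads_f
mean_len_r = bases_r / reads_r
print(f"forward: {mean_len_f:.1f} nt, reverse: {mean_len_r:.1f} nt")
```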
So I would either wait, or abort the run, subsample the samples to a fraction of 0.1 (or another fraction, depending on the samples overview), and rerun DADA2. I prefer subsampling, since 1 million reads per sample is a lot and will also slow down downstream analyses. Moreover, you may not be happy with DADA2's output for the parameters you used and decide to rerun it with other settings. Then you would be waiting again...
So it finally finished.
I want to check my understanding: subsampling samples to a fraction of 0.1 in the context of DADA2 means that you're instructing the tool to randomly select 10% of the total reads from each sample for processing, rather than using the full dataset. Am I right?
That is correct, but the subsampling should be done before DADA2. DADA2 itself is not instructed to subsample; it simply works with the already subsampled dataset.
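To make "subsample to a fraction of 0.1" concrete, here is a minimal Python sketch of the idea: each read pair is kept independently with probability 0.1, and one random draw decides both mates so forward and reverse files stay paired. Because each pair is an independent draw, a sample ends up with roughly, not exactly, 10% of its reads. In QIIME 2 this step is usually done on the demultiplexed artifact with `qiime demux subsample-paired` (check `--help` for its exact options); the function `subsample_pairs`, its fixed seed and the in-memory record format below are hypothetical illustrations, not that plugin's implementation.

```python
import random
from itertools import zip_longest
from typing import Iterable, Iterator, Tuple

# One FASTQ record: (header, sequence, "+" line, quality string).
Record = Tuple[str, str, str, str]


def subsample_pairs(
    forward: Iterable[Record],
    reverse: Iterable[Record],
    fraction: float = 0.1,
    seed: int = 42,
) -> Iterator[Tuple[Record, Record]]:
    """Hypothetical helper: keep each read pair with probability `fraction`.

    A single random draw per pair decides the fate of both mates, so the
    kept forward and reverse reads remain in matching order.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed -> reproducible subsample
    missing = object()
    for fwd, rev in zip_longest(forward, reverse, fillvalue=missing):
        if fwd is missing or rev is missing:
            raise ValueError("forward and reverse inputs differ in length")
        if rng.random() < fraction:
            yield fwd, rev


# Toy demo: 10,000 synthetic pairs, keep ~10%.
fwd = [(f"@read{i}/1", "ACGT", "+", "IIII") for i in range(10_000)]
rev = [(f"@read{i}/2", "TGCA", "+", "IIII") for i in range(10_000)]
kept = list(subsample_pairs(fwd, rev, fraction=0.1, seed=1))
print(len(kept))  # roughly 1,000; the exact count depends on the seed
```

Subsampling once, up front, also means every DADA2 rerun with different trimming or truncation settings works on the same smaller input.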