taking so long to do dada2 (9 days)

shanif3 · December 31, 2023, 2:11pm

Why does it take so long?
Before this run, it ran for a month, so I gave up, killed the process, and tried again (see the picture).

thanks!

jwdebelius · December 31, 2023, 8:54pm

Hi @shanif3,

I don't know your data, but my guess is that you sequenced your sample(s) very deeply, and have few of them. IIRC, the way DADA2 trains its error model depends on the number of sequences per sample as well as the number of samples, and deeper sequencing takes more time.

From my perspective, there are four things to consider:

1. Deblur will probably be faster for your small number of samples.
2. There's very little you can actually learn from 1 sample, even a pilot sample, since you don't have enough samples to construct a distribution.
3. DADA2 does well running on a full sequencing run, so even if you share a 16S run, you may find denoising more efficient.
4. For most statistical analysis, more samples are typically better than more sequences. I think 384/run (4 x 96 wells total; this includes my controls) tends to perform best for my 16S work and gives a nice balance between sample depth and sample size. It also lets me control for batch and plate effects in my data, as long as things are randomized well.
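As a rough illustration of the randomization in the last point, here is a minimal Python sketch that shuffles sample IDs (controls included) across 4 x 96 wells. The function name `randomize_plate_layout`, the well naming (rows A-H, columns 1-12), and the seed handling are assumptions made for this example, not part of any QIIME 2 or sequencing-core workflow.

```python
import random
import string


def randomize_plate_layout(sample_ids, n_plates=4, seed=None):
    """Randomly assign samples (including controls) to wells on 96-well plates.

    Hypothetical helper: returns {sample_id: (plate_number, well_name)}.
    """
    rows, cols = string.ascii_uppercase[:8], range(1, 13)  # rows A-H, columns 1-12
    wells = [
        (plate, f"{row}{col}")
        for plate in range(1, n_plates + 1)
        for row in rows
        for col in cols
    ]
    if len(sample_ids) > len(wells):
        raise ValueError(
            f"{len(sample_ids)} samples do not fit in {len(wells)} wells"
        )
    shuffled = list(sample_ids)
    # A fixed seed makes the layout reproducible; seed=None gives a fresh shuffle.
    random.Random(seed).shuffle(shuffled)
    # Wells are filled in plate order, so with fewer samples than wells
    # the last plate(s) are only partly used; zip stops at the shorter input.
    return dict(zip(shuffled, wells))
```

For example, `randomize_plate_layout(ids, seed=42)` with 384 IDs gives every sample a distinct `(plate, well)` pair, and the same seed reproduces the same layout.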

Best,
Justine

shanif3 · January 11, 2024, 9:41am

Hi @jwdebelius ,
Thanks for answering!
Unfortunately, these aren't my samples or my data. What do you suggest I do in this case?

jwdebelius · January 11, 2024, 2:36pm

Hi @shanif3,

I would recommend you either wait or build a feature table a different way.

If you didn't expect to have a single sample with 1.3M reads, then you may need to step back and check your demultiplexing. If you demultiplexed in QIIME 2, make sure that you have the correct number of samples when you summarize the sequences. If you demultiplexed outside of QIIME, consult the people who did the demultiplexing and use a manifest import.
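If the demultiplexed FASTQ files are sitting in a directory outside of QIIME 2, a quick sanity check is to count the reads in each file and see whether the number of files and their depths match what you expected. Below is a minimal Python sketch; the function names and the `*.fastq*` glob pattern are assumptions for illustration, and it assumes standard 4-line FASTQ records. Inside QIIME 2, `qiime demux summarize` reports the same kind of per-sample counts.

```python
import gzip
from pathlib import Path


def count_fastq_reads(path):
    """Count records in a FASTQ file (plain or gzipped), assuming 4 lines per record."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def reads_per_file(directory, pattern="*.fastq*"):
    """Map each FASTQ file name in `directory` to its read count."""
    return {
        p.name: count_fastq_reads(p)
        for p in sorted(Path(directory).glob(pattern))
    }
```

Calling `reads_per_file("path/to/demultiplexed")` (a placeholder path) returns a dictionary you can sort by count. One file holding nearly all of the reads, or far fewer files than samples, would point to a demultiplexing problem.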

The only other possible scenario I can imagine where this might be the case is if you did metagenomic sequencing instead of marker gene sequencing (16S, ITS, 18S). The two are not interchangeable. If this is not actually a 16S amplicon, then DADA2 might run forever.
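If nobody can tell you which kind of library it is, one rough heuristic (an assumption of this sketch, not a QIIME 2 feature) is to look at how redundant the reads are: amplicon reads come from the same region of the same gene, so many of them share an identical start, while shotgun metagenomic reads are mostly distinct. The Python sketch below samples the first reads of a FASTQ file and reports the fraction of distinct read prefixes. The function names, the 100,000-read sample, and the 100-base prefix are arbitrary choices for illustration, and there is no fixed cutoff; treat the result as a hint only.

```python
import gzip
from collections import Counter
from itertools import islice


def read_sequences(path, max_reads=100_000):
    """Yield up to `max_reads` sequences from a FASTQ file (plain or gzipped)."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        # The sequence is the second line of every 4-line FASTQ record.
        seq_lines = islice(handle, 1, None, 4)
        for line in islice(seq_lines, max_reads):
            yield line.strip()


def distinct_fraction(path, max_reads=100_000, prefix_len=100):
    """Fraction of sampled reads whose first `prefix_len` bases are distinct."""
    counts = Counter(seq[:prefix_len] for seq in read_sequences(path, max_reads))
    total = sum(counts.values())
    if total == 0:
        raise ValueError(f"no reads found in {path}")
    return len(counts) / total
```

A value close to 1 (almost every sampled read is different) would be more typical of shotgun data, while a much lower value is what you would expect from an amplicon library.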

So, my recommendations would be:

1. Verify that you should have a few very deeply sequenced samples.
   - If not, check your demux
   - If yes, continue to step 2
2. Verify the samples are 16S rRNA or another amplicon
   - If no, look at metagenomic annotation techniques like SHOGUN, MetaPhlAn, etc. There's a new version of QIIME 2 for metagenomic annotation
   - If yes, use Deblur or OTU clustering, which will be more computationally efficient

Best,
Justine
