DADA2-Really low output for 1 library out of 4


I have a quick question. I processed 4 libraries using the exact same code and DADA2 parameters, as suggested in a previous post. For the most part, I got great merging values for 3 of the libraries, but for one of them I just cannot manage to get anything good. The table.qza file used to select the DADA2 parameters looks great, or so I think, as it is comparable to the other 3 libraries, and yet DADA2 is not giving me great results.

I have also noticed that even though all libraries have a comparable number of samples (roughly 96), the other three each took about 1.35 min to process through DADA2, whereas the 3rd library, with the exact same parameters, takes about 5 hrs and 45 minutes. I feel like something is off. I have compared the imported files and demuxed files and they seem fine, so I am not sure why I am having such a hard time with the DADA2 step for this library.

I am not sure how to proceed. I have tested various parameters on this library, including the ones I used for the other libraries, and everything just takes forever to process and results in low values. Can you please help me understand what I could be doing wrong or what I am missing?

I have attached both the table and stats outputs from DADA2 and the trimmed demux file used to select DADA2 parameters.

Fungal-demux-trimmed-3.qzv (304.3 KB)
Fungal-2-Stats-dada2-209-201.qzv (1.2 MB)
Fungal-2-Table-dada2-209-201.qzv (731.7 KB)

Hi @fabipc,
It’s hard to know what your definition of good is or what you are expecting from these runs. Perhaps the other runs simply have far fewer reads, which would explain the much shorter run time? This run has 15,462,364 reads, which can take a while to process, but it completes without any errors, so there is nothing to worry about. You are also ending up with ~70% of your initial input, which is actually great and totally normal. Overall I’d say this is a rather successful run, and you have lots of reads per sample to carry on to the next step.
Not sure what the other runs are like so I can’t comment on those, but I don’t see any real issues here.
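
If you want to check the retention figure per sample yourself, something like the sketch below could work. It assumes the DADA2 stats have been exported to a tab-separated file with `sample-id`, `input`, and `non-chimeric` columns; those column names are my assumption about the export, so adjust them if yours differ. The function name and the example numbers are made up for illustration and are not taken from the attached files.

```python
import csv
import io


def retention_by_sample(stats_tsv_text, input_col="input", final_col="non-chimeric"):
    """Return {sample_id: percent of input reads retained} from stats TSV text."""
    rows = csv.DictReader(io.StringIO(stats_tsv_text), delimiter="\t")
    result = {}
    for row in rows:
        sample = row["sample-id"]
        # Exported metadata may carry a second header line such as "#q2:types"; skip it.
        if sample.startswith("#"):
            continue
        n_in = int(row[input_col])
        n_out = int(row[final_col])
        # Guard against samples with zero input reads.
        result[sample] = 100.0 * n_out / n_in if n_in else 0.0
    return result


# Hypothetical numbers, for illustration only.
example = (
    "sample-id\tinput\tfiltered\tdenoised\tmerged\tnon-chimeric\n"
    "#q2:types\tnumeric\tnumeric\tnumeric\tnumeric\tnumeric\n"
    "S1\t1000\t900\t880\t800\t700\n"
    "S2\t2000\t1500\t1400\t1200\t1000\n"
)
print(retention_by_sample(example))
```

Running the same check on the stats from the other three libraries would show whether this library really retains a smaller share of its input or just starts from a different number of reads.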

The # of samples doesn’t matter for run time so much as the # of reads, their length, and their composition.
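
To compare the libraries on read count rather than sample count, a rough sketch like the one below could be used on the raw FASTQ files. It assumes standard four-line FASTQ records (no wrapped sequence lines), and the function name is hypothetical; pass your own file paths on the command line.

```python
import gzip
import sys


def count_fastq_reads(path):
    """Count reads in a FASTQ file, gzipped or plain, assuming 4 lines per record."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


if __name__ == "__main__":
    # Usage: python count_reads.py library1_R1.fastq.gz library2_R1.fastq.gz ...
    for fastq_path in sys.argv[1:]:
        print(fastq_path, count_fastq_reads(fastq_path))
```

If one library turns out to have many times more reads than the others, a much longer DADA2 run for that library is expected rather than a sign that something is broken.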

