DADA2 results and number of reads per sample

As I understand it, the number of reads in each sample (more reads in the actual samples, fewer in the controls) affects quality filtering/OTU picking, because the reads are pooled. Since DADA2 filters sequences based on read quality, can I run DADA2 on all samples together (both high-read and low-read samples), or do I need to run the high-read and low-read samples separately?


Hi @chuang!

DADA2 doesn’t filter based on quality; rather, it corrects errors using an error model derived from the quality scores. That means what matters is that reads sequenced together (in the same run and lane) are denoised together, because they all share the same error profile regardless of how abundant they are in any individual sample.

Conversely, reads that were not sequenced together should not be denoised together, as doing so would confuse the error model.
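As a rough illustration of the rule above (group samples by sequencing run, not by read count), here is a minimal Python sketch. The sample IDs, run IDs, read counts, and the group_by_run helper are all hypothetical; in practice the run of origin would come from your own sample metadata.

```python
from collections import defaultdict

# Hypothetical per-sample records: (sample_id, run_id, n_reads).
# Read counts are included only to show that they play no role in the grouping.
samples = [
    ("sampleA", "run1", 52000),
    ("sampleB", "run1", 48000),
    ("negctrl1", "run1", 310),
    ("sampleC", "run2", 61000),
    ("negctrl2", "run2", 150),
]


def group_by_run(records):
    """Return {run_id: [sample_id, ...]}, preserving input order.

    High-read samples and low-read controls from the same run end up in
    the same group, since they share that run's error profile.
    """
    groups = defaultdict(list)
    for sample_id, run_id, _n_reads in records:
        groups[run_id].append(sample_id)
    return dict(groups)


if __name__ == "__main__":
    for run_id, ids in group_by_run(samples).items():
        print(run_id, ids)
```

Each group would then get its own DADA2 run, and the per-run outputs could be combined afterwards (for example with qiime feature-table merge for the tables and qiime feature-table merge-seqs for the representative sequences).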

You can control how many reads are used to train the error model with --p-n-reads-learn, although you typically don’t need to worry much about it.
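A minimal sketch of building a per-run DADA2 call that sets --p-n-reads-learn, written in Python. The dada2_single_cmd helper, the file names, the single-end choice, and the truncation length of 150 are hypothetical; only the qiime dada2 denoise-single command and its flags come from q2-dada2, so check them against qiime dada2 denoise-single --help for your installed version.

```python
import shlex


def dada2_single_cmd(run_id, trunc_len, n_reads_learn=None):
    """Build an argv list for denoising ONE sequencing run with q2-dada2.

    File names follow a hypothetical '<run_id>-...' naming convention.
    n_reads_learn=None omits --p-n-reads-learn, keeping the plugin default.
    """
    cmd = [
        "qiime", "dada2", "denoise-single",
        "--i-demultiplexed-seqs", f"{run_id}-demux.qza",
        "--p-trunc-len", str(trunc_len),
        "--o-table", f"{run_id}-table.qza",
        "--o-representative-sequences", f"{run_id}-rep-seqs.qza",
        "--o-denoising-stats", f"{run_id}-stats.qza",
    ]
    if n_reads_learn is not None:
        # Number of reads used to train this run's error model.
        cmd += ["--p-n-reads-learn", str(n_reads_learn)]
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works without QIIME 2.
    print(shlex.join(dada2_single_cmd("run1", 150, n_reads_learn=500000)))
```

Inside an activated QIIME 2 environment the list could be passed to subprocess.run(cmd, check=True); leaving n_reads_learn as None is the usual choice.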

Let me know if that answers your question.

