Number of ASVs changes when adding new samples in a different analysis?

The first time I ran DADA2, with 10 samples, I obtained a certain number of ASVs for each sample. Surprisingly, when I ran it a second time with 10 additional samples (20 samples in total), the number of ASVs for the original 10 samples changed slightly (by 1-100 ASVs). I don't understand why. Does this make analyses unstable and incomparable between different runs?

Could you please provide some additional information, such as which sequencing run(s) your datasets come from?
DADA2 processes the data sample by sample, so in my understanding the number of samples should not affect the output for each sample.
But here it is indicated that different runs/lanes should be processed separately, since combining samples from different sequencing runs may affect the quality filtering step in DADA2.

UPD from @jwdebelius
Adding more samples, even from the same run, may also slightly affect the output for all samples, since

Dada2 analyzes the per run error and then fits a model based on the available data
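
To make the point above concrete, here is a toy sketch in Python. It is not DADA2's actual algorithm: the real error model is quality-aware, is fitted per nucleotide transition, and uses a much stricter significance threshold. All function names, counts, and the threshold are made up for illustration. The sketch only shows the mechanism: when the error rate is estimated from the pooled data, adding samples shifts the estimate, and a borderline variant in an unchanged sample can flip from "real ASV" to "sequencing error".

```python
import math

def pooled_error_rate(samples):
    """Toy estimate: total mismatches divided by total bases over all samples."""
    mismatches = sum(m for m, _ in samples)
    bases = sum(b for _, b in samples)
    return mismatches / bases

def poisson_tail(k, lam, terms=200):
    """Approximate P(X >= k) for X ~ Poisson(lam) by summing the upper tail."""
    return sum(
        math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))
        for i in range(k, k + terms)
    )

def is_called_asv(variant_reads, parent_reads, error_rate, threshold=1e-4):
    """Call the variant real if its abundance is unlikely to arise from errors alone.

    The threshold is a hypothetical value chosen for this toy example.
    """
    expected_errors = parent_reads * error_rate
    return poisson_tail(variant_reads, expected_errors) < threshold

# Hypothetical (mismatches, bases) counts per sample.
first_batch = [(100, 100_000)] * 10   # the original 10 samples
second_batch = [(200, 100_000)] * 10  # 10 added samples, slightly noisier

rate_10 = pooled_error_rate(first_batch)                 # fitted on 10 samples
rate_20 = pooled_error_rate(first_batch + second_batch)  # fitted on all 20

# The same variant in the same original sample: 8 reads next to a 1000-read parent.
called_with_10 = is_called_asv(8, 1000, rate_10)
called_with_20 = is_called_asv(8, 1000, rate_20)

print(rate_10, called_with_10)
print(rate_20, called_with_20)
```

With these made-up numbers the variant passes the threshold when the model is fitted on 10 samples and fails it when fitted on 20, even though the reads of that sample did not change.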


Thanks for your quick response. I'll run the analyses separately and then merge all the results into a single result. I think that's a good solution. Can you give me a link to @jwdebelius's post?
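
As a rough illustration of what merging separately denoised results means, here is a minimal Python sketch. It assumes each run's table is a plain dict of sample -> {ASV sequence: count}; the function names and example data are hypothetical. ASVs from different runs only line up if they are keyed by the exact sequence, which in turn assumes that every run was trimmed and truncated with the same parameters. In QIIME 2 itself the merge would be done with qiime feature-table merge and qiime feature-table merge-seqs rather than with code like this.

```python
def merge_asv_tables(*tables):
    """Combine per-run tables (sample -> {sequence: count}) into one table.

    Sample IDs must be unique across runs; a duplicate is treated as an error.
    """
    merged = {}
    for table in tables:
        for sample, counts in table.items():
            if sample in merged:
                raise ValueError(f"sample {sample!r} appears in more than one table")
            merged[sample] = dict(counts)
    return merged

def to_matrix(merged):
    """Return (samples, sequences, rows), filling missing ASVs with zero counts."""
    sequences = sorted({seq for counts in merged.values() for seq in counts})
    samples = sorted(merged)
    rows = [[merged[s].get(seq, 0) for seq in sequences] for s in samples]
    return samples, sequences, rows

# Hypothetical results from two separately denoised runs (toy sequences).
run1 = {"S1": {"ACGT": 10, "ACGA": 3}}
run2 = {"S11": {"ACGT": 7, "TTGA": 2}}

samples, sequences, rows = to_matrix(merge_asv_tables(run1, run2))
print(samples)    # ['S1', 'S11']
print(sequences)  # ['ACGA', 'ACGT', 'TTGA']
print(rows)       # [[3, 10, 0], [0, 7, 2]]
```

The shared sequence ACGT ends up in a single column for both runs, while ASVs seen in only one run get a zero in the other.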


If you would like to learn how dada2 processes samples across runs, you could take a look at the full dada2 paper.

Keep in mind that other settings of the q2-dada2 plugin could also change results slightly, such as the data used for training the error model, how samples are pooled (--p-pooling-method), and how chimeras are removed (--p-chimera-method).

The error profile and chimeras should be specific to each sequencing run, which is why DADA2 recommends processing each run separately.

With this in mind, should we only run Dada2 on a full run?

For example, if two projects shared a sequencing run but the data produced were unrelated, should the denoising step include both projects? Or should each project exclude the others’ samples before denoising?

Hi @smayne11 !
No, I think it is better to split unrelated projects and process them separately, since that gives you more options to improve the output by applying project-specific parameters at different steps of the workflow.

