The first time I ran DADA2, on 10 samples, I obtained a certain number of ASVs for each sample. Surprisingly, when I ran it a second time with 10 more samples added (20 samples in total), the number of ASVs for the original 10 samples changed slightly (by 1-100 ASVs). I don’t understand why. Does this make analyses unstable and incomparable between runs?
Could you please also provide some additional information, such as which sequencing run(s) your samples came from?
DADA2 processes the data sample by sample, so in my understanding the number of samples should not affect the output for each sample.
But here it is indicated that one should process different runs/lanes separately, since combining samples from different sequencing runs may affect the quality filtering step in DADA2.
UPD from @jwdebelius
Adding more samples, even from the same run, may also slightly affect the output for all samples, since DADA2 analyzes the per-run error and then fits a model based on the available data.
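To make that point concrete, here is a toy sketch in Python. It is not DADA2's actual error model. The counts, the pooled_error_rate helper, and the "keep if observed more often than expected from error" rule are all made up for illustration. It only shows how an error rate estimated from all available samples shifts when more samples are added, and how a borderline call for one of the original samples can flip as a result.

```python
# Toy illustration only: this is NOT DADA2's error model.
# Hypothetical (mismatches, bases) counts per sample, all from one run.
first_batch = [(120, 100_000), (95, 90_000), (130, 110_000)]
second_batch = [(160, 100_000), (150, 95_000), (170, 105_000)]


def pooled_error_rate(samples):
    """Estimate one error rate from every sample that is available."""
    mismatches = sum(m for m, _ in samples)
    bases = sum(b for _, b in samples)
    return mismatches / bases


def keep_variant(observed, parent_abundance, error_rate):
    """Hypothetical rule: keep a variant only if it is seen more often
    than the number of copies expected from sequencing error alone."""
    expected_from_error = parent_abundance * error_rate
    return observed > expected_from_error


rate_first = pooled_error_rate(first_batch)                # 10-sample-style run
rate_all = pooled_error_rate(first_batch + second_batch)   # after adding samples

# The same borderline variant, in the same original sample, under both models.
keep_first = keep_variant(observed=13, parent_abundance=10_000, error_rate=rate_first)
keep_all = keep_variant(observed=13, parent_abundance=10_000, error_rate=rate_all)

print(rate_first, rate_all)
print(keep_first, keep_all)
```

The reads in the original samples did not change between the two calls. Only the estimated error rate did, which is enough to move a handful of borderline ASVs in or out.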
Thanks for your quick response. I’ll run the analyses separately and then merge all the results into a single result. I think that’s a good solution. Can you give me a link to @jwdebelius’s post?
If you would like to learn how dada2 processes samples across runs, you could take a look at the full dada2 paper.
Keep in mind that other settings of the q2-dada2 plugin could also change the results slightly, such as the data used for training the error model, how samples are pooled (--p-pooling-method), and chimera removal.
The error profile and chimeras should be specific to each sequencing run, which is why DADA2 recommends processing each run separately.
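For the "denoise each run separately, then merge" approach, the merge itself is conceptually simple. Within QIIME 2 it is normally done with qiime feature-table merge (and qiime feature-table merge-seqs for the representative sequences). The Python sketch below only illustrates the idea on plain dictionaries. The table layout, the sample IDs, and the merge_tables helper are hypothetical. It also assumes every run was denoised with the same trimming and truncation settings, so that identical sequences really are the same ASV.

```python
# Hypothetical per-run ASV tables: {sample_id: {asv_sequence: count}}
run1 = {"S1": {"ACGT": 10, "TTGA": 3}}
run2 = {"S2": {"ACGT": 7, "GGCC": 5}}


def merge_tables(*tables):
    """Merge per-run tables into one table over the union of ASVs.

    Sample IDs must be unique across runs. ASVs missing from a sample
    are filled in with a count of 0.
    """
    merged = {}
    for table in tables:
        for sample_id, counts in table.items():
            if sample_id in merged:
                raise ValueError(f"duplicate sample ID: {sample_id}")
            merged[sample_id] = dict(counts)

    # ASVs are keyed by their sequence, so the union is a set of sequences.
    features = sorted({asv for counts in merged.values() for asv in counts})
    return {
        sample_id: {asv: counts.get(asv, 0) for asv in features}
        for sample_id, counts in merged.items()
    }


merged = merge_tables(run1, run2)
for sample_id, counts in merged.items():
    print(sample_id, counts)
```

Because each run keeps its own error model, adding a new run later does not change the ASVs already called for the earlier runs.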