DADA2: one sample VS multiple samples

Jay · May 30, 2022, 11:56am

Hello

I thought that DADA2 (in QIIME2) analyzes each sample separately (with default hyperparameters). However, when I analyzed "sample A" alone, the result was different from the result when I analyzed "sample A" together with other samples.

I checked the DADA2 stats and found that the values in the 'merged' and 'non-chimeric' columns differed between the two results (the merged values differed by 1, and the non-chimeric values differed by ~2000).

So I checked the DADA2 documentation and found that with --p-chimera-method "consensus" (the default), "chimeras are detected in samples individually, and sequences found chimeric in a sufficient fraction of samples are removed". Is that the reason why I got a different result when I analyzed sample A with other samples?

Also, analyzing sample A with other samples gave me fewer non-chimeric sequences. However, I got more sequences assigned to Prevotella (vsearch, top-hit-only) than when analyzing sample A alone. How could that happen? This was the opposite of what I expected, because I thought that all sequences from analyzing sample A with other samples would be included in the sequences from analyzing sample A alone (with just more chimeric sequences removed)...

Thanks for reading!

colinbrislawn · May 30, 2022, 1:09pm

Yes!
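
To make the quoted rule concrete, here is a toy Python sketch of a consensus vote. It is not DADA2's actual code: the per-sample chimera flags are made up, the function name `consensus_removed` is hypothetical, and the 0.9 threshold is an assumption standing in for "a sufficient fraction". The point is only that the same flags for sample A can lead to a different outcome once other samples get a vote too.

```python
from collections import defaultdict


def consensus_removed(flags_by_sample, min_fraction=0.9):
    """Return the sequences a simple consensus vote would remove.

    flags_by_sample maps sample name -> {sequence: flagged_as_chimera}
    and lists only the sequences present in that sample.
    min_fraction is an assumed stand-in for "a sufficient fraction".
    """
    present = defaultdict(int)  # number of samples containing the sequence
    flagged = defaultdict(int)  # number of samples flagging it as chimeric
    for flags in flags_by_sample.values():
        for seq, is_chimera in flags.items():
            present[seq] += 1
            flagged[seq] += int(is_chimera)
    return {s for s in present if flagged[s] / present[s] >= min_fraction}


# Made-up flags for sample A: only seq2 looks chimeric within A.
sample_a = {"seq1": False, "seq2": True, "seq3": False}

# Run 1: sample A alone, so A's own flags decide everything.
alone = consensus_removed({"A": sample_a})

# Run 2: sample A plus ten hypothetical samples in which seq2 looks
# fine and seq3 looks chimeric.
others = {f"S{i}": {"seq2": False, "seq3": True} for i in range(10)}
together = consensus_removed({"A": sample_a, **others})

print(sorted(alone))     # ['seq2']
print(sorted(together))  # ['seq3']
```

In this toy model, adding other samples both rescues a sequence in sample A (seq2) and removes one that sample A alone would have kept (seq3), so the joint result does not have to be a subset of the sample-A-only result.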

Other settings could explain this too:

There's --p-pooling-method, which is "independent" by default but could have been changed in your run.

There's also the subsample of reads used to train the error model. It changes with the input data, so results can vary (hopefully only slightly, but still somewhat).

Try rerunning without dada2 chimera checking, and see how SampleA changes with and without other samples. You are not the first person to note this and we can investigate more!
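
One way to look at that comparison is to diff sample A's features between the two runs. This is a minimal sketch, assuming you have already exported sample A's counts from each run into plain dicts keyed by feature ID; the helper name `compare_runs` and the counts below are made up for illustration. As far as I know, with the default hashed feature IDs the same sequence gets the same ID in both runs, so the keys should be comparable.

```python
def compare_runs(alone, together):
    """Compare {feature_id: count} dicts for one sample from two runs."""
    shared = alone.keys() & together.keys()
    return {
        "only_alone": sorted(alone.keys() - together.keys()),
        "only_together": sorted(together.keys() - alone.keys()),
        "count_changed": sorted(f for f in shared if alone[f] != together[f]),
    }


# Hypothetical counts for sample A (feature IDs shortened for readability).
a_alone = {"asv1": 500, "asv2": 120, "asv3": 40}
a_together = {"asv1": 510, "asv3": 40, "asv4": 15}

report = compare_runs(a_alone, a_together)
for key, features in report.items():
    print(key, features)
```

If `only_together` is not empty for your real data, the joint run is not just the solo run with more chimeras removed, which would be consistent with seeing more Prevotella hits there.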
