DADA2 pooling option

I am doing a meta-analysis of 16S sequences from different studies. Currently I am taking the q2-dada2 tables generated from the different studies and merging them together. I read that DADA2 now has a pooling option in which 16S sequences from different studies are pooled together to improve sensitivity to rare ASVs. Will this option be available soon in the QIIME 2 implementation of DADA2? Is my only option right now to use standalone DADA2 in R?

Hi @ange,
I think the ‘pooling’ option you are referring to has a different function, see here for its proper use. Basically, the existing default in q2-dada2 does not use this option, meaning samples are denoised separately and singletons are removed within each sample. With the pooled or pseudo option, information is shared across samples (in the same run), which increases the sensitivity to those rare features. This pooling option is not implemented in q2’s version of dada2 yet, but fingers crossed it should be available in the next release.
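To make the difference concrete, here is a toy sketch in Python. It is not DADA2's algorithm (there is no error model here), and the sample names and sequences are made up. It only shows why a variant that appears once in each sample disappears when singletons are dropped per sample, but keeps its support when reads from the same run are pooled.

```python
from collections import Counter

# Toy data (hypothetical): one abundant sequence plus a rare variant
# that appears exactly once in every sample of the same run.
samples = {
    "sample_A": ["ACGT"] * 50 + ["ACGA"],
    "sample_B": ["ACGT"] * 40 + ["ACGA"],
    "sample_C": ["ACGT"] * 60 + ["ACGA"],
}


def non_singletons(counts):
    """Return the sequences observed more than once."""
    return {seq for seq, n in counts.items() if n > 1}


# Independent processing: count and filter within each sample.
independent = {
    name: non_singletons(Counter(reads)) for name, reads in samples.items()
}

# Pooled processing: count across all samples, then filter.
pooled_counts = Counter()
for reads in samples.values():
    pooled_counts.update(reads)
pooled = non_singletons(pooled_counts)

print("independent:", independent)
print("pooled:", pooled)
```

In this toy case the rare variant is a singleton in every sample, so per-sample filtering never keeps it, while the pooled count of 3 does.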

If you are using different studies/runs, you should still run each of them through dada2 separately (with identical trim/trunc parameters) and then merge the results afterwards.
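As a rough sketch of that workflow, the Python snippet below just builds and prints the QIIME 2 commands for each run, without executing anything. The run names, artifact file names, and trim/trunc values are placeholders I made up, and it assumes paired-end data (`denoise-paired`); check the flags against `qiime dada2 denoise-paired --help` for your QIIME 2 version before using them.

```python
import shlex

# Hypothetical run names; the .qza file names below are assumed, not prescribed.
RUNS = ["run1", "run2"]

# Placeholder values: the point is only that every run gets the SAME settings.
SHARED_PARAMS = {
    "--p-trim-left-f": 0,
    "--p-trim-left-r": 0,
    "--p-trunc-len-f": 240,
    "--p-trunc-len-r": 200,
}


def denoise_command(run):
    """Build the denoising command for one sequencing run."""
    cmd = ["qiime", "dada2", "denoise-paired",
           "--i-demultiplexed-seqs", f"{run}-demux.qza"]
    for flag, value in SHARED_PARAMS.items():
        cmd += [flag, str(value)]
    cmd += ["--o-table", f"{run}-table.qza",
            "--o-representative-sequences", f"{run}-rep-seqs.qza",
            "--o-denoising-stats", f"{run}-stats.qza"]
    return cmd


def merge_commands(runs):
    """Build the commands that merge per-run tables and sequences."""
    tables = ["qiime", "feature-table", "merge"]
    seqs = ["qiime", "feature-table", "merge-seqs"]
    for run in runs:
        tables += ["--i-tables", f"{run}-table.qza"]
        seqs += ["--i-data", f"{run}-rep-seqs.qza"]
    tables += ["--o-merged-table", "merged-table.qza"]
    seqs += ["--o-merged-data", "merged-rep-seqs.qza"]
    return [tables, seqs]


# Denoise each run on its own first, then merge.
commands = [denoise_command(run) for run in RUNS] + merge_commands(RUNS)
for cmd in commands:
    print(shlex.join(cmd))
```

Keeping the parameters in one shared dictionary makes it hard to accidentally denoise two runs with different truncation lengths, which would leave you with ASVs that cannot be matched across the merged table.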

Ah well, I guess that means I need it more than I thought I did…

But if these samples from different runs are biologically similar in some way shouldn’t information be shared across these runs too?


Hi @ange,

Only if singletons or very rare features are critical for your experiment. For most studies the default non-pooling option is more than sufficient: the denoising algorithm still does an excellent job, and you still identify most rare features. You should also be aware that with the pooling option the likelihood of spurious (false-positive) ASVs does increase a bit, and the run time increases significantly (almost double with the pool option and a bit less with the pseudo option). EDIT: in reality the time estimates here are even higher as sample numbers and community complexity increase.

Yes and no. While the information across similar samples (from different runs) MAY be useful, the run-specific bias is too great and overshadows any benefits that may come from this. This effect is even more exaggerated if the samples have gone through separate DNA extractions and PCRs. The error model created by dada2 performs best when all of those variables are consistent across samples, which is why you should denoise the runs separately and then merge them for downstream analysis.
