Comparison of DADA2 with Deblur

This is a completely valid option (I'm assuming there is a command to do this in Q2?). "Cutting a band in silico" to select amplicons of the expected length is a good way to clean up certain kinds of residual artefacts that can crop up. The DADA2 pipeline does not do this by default because in some cases those non-target-length amplicons are real and interesting. For example, Trichomonas vaginalis, the protozoan cause of trichomoniasis, is detected by the EMP primer set, but its amplicon is ~290 nts rather than the expected V4 length of 251-255 nts.

No! There is biological length variation in the V4 region (as in most amplicons)!

V4 has a pretty tight length distribution: with the EMP primer set, about 251-255 nts captures almost all of the V4 sequences. So pruning to 251-255 is legit, but throwing away all sequences that aren't exactly 253 nts will throw away real sequences, and will throw away some taxa preferentially.
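To make the difference between the two filters concrete, here is a minimal Python sketch of "cutting a band in silico". It is not a DADA2 or Q2 command: `cut_band` is a hypothetical helper, and the toy table uses placeholder sequences and made-up counts purely to show the lengths involved.

```python
from typing import Dict


def cut_band(counts: Dict[str, int], min_len: int = 251, max_len: int = 255) -> Dict[str, int]:
    """Keep only sequences whose length falls in [min_len, max_len]."""
    return {seq: n for seq, n in counts.items() if min_len <= len(seq) <= max_len}


# Toy table (NOT real data): placeholder sequence -> read count.
toy = {
    "A" * 252: 120,  # real V4 variant, slightly shorter than the mode
    "C" * 253: 900,  # the most common V4 length
    "G" * 254: 75,   # real V4 variant, slightly longer than the mode
    "T" * 290: 40,   # non-target length (could still be real biology)
    "A" * 180: 5,    # short off-target product
}

band = cut_band(toy)              # 251-255 window
exact = cut_band(toy, 253, 253)   # exactly 253 nts only

print(sorted(len(s) for s in band), sum(band.values()))
print(sorted(len(s) for s in exact), sum(exact.values()))
```

In this toy table the 251-255 window keeps all three V4-length variants, while the exact-253 filter also drops the 252 and 254 nt sequences. Note that both filters drop the 290 nt sequence, which is the reason the band cut is not the default.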

This is probably because of the difference in how errors are handled: DADA2 corrects errors, while Deblur removes them. This difference has a big effect on the total number of reads that make it to the end, but a much smaller impact on the frequencies, which are more important.
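A toy arithmetic sketch of that point, in Python. This does not reproduce how either tool works internally, and every number here is invented for illustration: each hypothetical taxon has some error-free reads and some error-containing reads, "correct" folds the error reads back into their taxon, and "remove" discards them.

```python
# Toy counts (NOT real data): taxon -> (error-free reads, error-containing reads).
toy = {
    "taxon_A": (480, 120),
    "taxon_B": (230, 70),
    "taxon_C": (85, 15),
}

# "Correct": error reads are reassigned to the taxon they came from.
corrected = {t: exact + err for t, (exact, err) in toy.items()}
# "Remove": error reads are discarded.
removed = {t: exact for t, (exact, _err) in toy.items()}


def frequencies(counts):
    """Convert read counts to relative frequencies."""
    total = sum(counts.values())
    return {t: n / total for t, n in counts.items()}


freq_corrected = frequencies(corrected)
freq_removed = frequencies(removed)

print("total reads:", sum(corrected.values()), "vs", sum(removed.values()))
for t in toy:
    print(t, round(freq_corrected[t], 3), round(freq_removed[t], 3))

# Largest shift in any taxon's relative frequency between the two approaches.
max_shift = max(abs(freq_corrected[t] - freq_removed[t]) for t in toy)
```

Under these made-up numbers the total drops from 1000 to 795 reads, while no taxon's relative frequency moves by more than about 0.011.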

Edit: Should have read the thread first, @tanaes already covered the total reads question!
