Looking for advice - sequence loss in DADA2 merge

ChrisKeefe · January 7, 2019, 9:32pm

@toobiwankenobi, the table in your original post roughly follows the steps DADA2 applies to your sequences, in order. You appear to be losing most of your sequences during the merge phase, not during filtering.

Allowing more error in your sequences may let more of them through filtering, but that probably won't address the larger merge-related loss shown by your table.

Your sequencing run may be 2x300, but you're only handing DADA2 448 base pairs to work with during the merge: (251 - 17) + (234 - 20) = 448. If your amplicon is 420 bp long, the reads overlap by 448 - 420 = 28 bp. If you need 20 bp of overlap to complete the merge, that leaves you only 8 bp of wiggle room.
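If it helps to try other parameter combinations, here is a minimal Python sketch of the arithmetic above. The function and argument names are hypothetical, made up for illustration, and are not part of DADA2 or QIIME 2. I'm reading 251 and 234 as truncation lengths and 17 and 20 as bases trimmed from the start of each read, and I'm taking the 420 bp amplicon length and 20 bp required overlap straight from the numbers in this post. Treat it as a rough budget check, not as what DADA2 actually computes internally.

```python
def retained_bases(trunc_len_f, trim_left_f, trunc_len_r, trim_left_r):
    """Bases left on the forward plus reverse read after trimming and truncation."""
    return (trunc_len_f - trim_left_f) + (trunc_len_r - trim_left_r)


def overlap(retained, amplicon_len):
    """Bases shared by the two reads once they span the whole amplicon."""
    return retained - amplicon_len


def wiggle_room(retained, amplicon_len, min_overlap=20):
    """Spare bases beyond the required overlap (negative means no merge)."""
    return overlap(retained, amplicon_len) - min_overlap


# Values taken from this thread; the 20 bp requirement is an assumption here.
total = retained_bases(251, 17, 234, 20)
print(total)                    # 448
print(overlap(total, 420))      # 28
print(wiggle_room(total, 420))  # 8
```

Plugging in larger truncation lengths shows how quickly the spare overlap grows, which you can then weigh against the drop in quality toward the ends of your reads.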

If there is significant variability in the length of your ASVs, there is a chance that longer sequences, which leave less overlap after truncation, might be systematically removed in the merge process, decreasing sequence count and introducing bias against long ASVs.

Experimenting with more generous truncation parameters may take care of the issue. If it doesn't (e.g. because more-generous parameters introduce too much error into your sequences), you may have to pursue other denoising strategies, or even analyze your sequences as unmerged single-end reads. There are many posts here on the forum describing these options, should you need them. I'm not recommending this at this time - I just want you to know that other paths exist if you can't make this work by adjusting parameters.

"merged" is the number of merged reads. As discussed here, your table's "input" figure is the number of pairs; any decrease in sequence count is loss: