I find that when using the DADA2 plugin to process variable-length regions like V3/V4 or V4/V5, overall read retention is low, around 30%. This is mainly due to the `mergePairs` step: `maxMismatch` is fixed at 0 and cannot be changed. Even when truncating aggressively to limit the overlap for the longer amplicons, the overlap for the shorter ones can exceed 50 bp, and not allowing a single mismatch over an overlap that long is simply too stringent. There is also the possibility of overhangs, or what Robert Edgar calls staggered pairs. These could be handled by exposing the `trimOverhang` argument so that it can be set to TRUE.
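To make the two knobs concrete, here is a toy, ungapped sketch of pair merging in Python. It is not DADA2's implementation (DADA2 aligns the reads rather than sliding them without gaps), and `merge_pair`, `min_overlap`, `max_mismatch` and `trim_overhang` are hypothetical names that only loosely mirror `mergePairs`' `minOverlap`, `maxMismatch` and `trimOverhang`. It assumes the reverse read has already been reverse-complemented, picks the best-scoring offset, then rejects the pair if the overlap has more mismatches than allowed. Keeping the forward base at mismatched positions is a simplification made here for brevity.

```python
from typing import NamedTuple, Optional


class Merge(NamedTuple):
    sequence: str
    overlap: int
    mismatches: int


def merge_pair(fwd: str, rev_rc: str, min_overlap: int = 12,
               max_mismatch: int = 0, trim_overhang: bool = False) -> Optional[Merge]:
    """Toy ungapped merge (hypothetical helper, not DADA2's algorithm).

    Offset k places rev_rc[0] at position k in the forward read's coordinates;
    k < 0 means the reverse read extends past the start of the forward read.
    """
    best = None  # (score, k, overlap, mismatches)
    for k in range(min_overlap - len(rev_rc), len(fwd) - min_overlap + 1):
        start, end = max(0, k), min(len(fwd), k + len(rev_rc))
        overlap = end - start
        if overlap < min_overlap:
            continue
        mismatches = sum(fwd[i] != rev_rc[i - k] for i in range(start, end))
        score = overlap - 2 * mismatches  # matches minus mismatches
        if best is None or score > best[0]:
            best = (score, k, overlap, mismatches)
    if best is None:
        return None  # no offset gives the minimum overlap
    _, k, overlap, mismatches = best
    if mismatches > max_mismatch:
        return None  # rejected, like a pair failing maxMismatch
    # Without trimming, keep overhangs (rev_rc before fwd's start, fwd past rev_rc's end).
    lo = 0 if trim_overhang else min(0, k)
    hi = k + len(rev_rc) if trim_overhang else max(len(fwd), k + len(rev_rc))
    # Use the forward base wherever the forward read covers the position
    # (including mismatched overlap positions), else the reverse-read base.
    seq = "".join(fwd[p] if 0 <= p < len(fwd) else rev_rc[p - k]
                  for p in range(lo, hi))
    return Merge(seq, overlap, mismatches)
```

For example, `merge_pair("AAAACCCCGG", "CCGATTTT", min_overlap=4)` returns `None` because of a single mismatch in a 4 bp overlap, while the same call with `max_mismatch=1` merges the pair; the same all-or-nothing rule applied over a 50+ bp overlap is what makes a fixed `maxMismatch` of 0 so costly.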
Good idea, John!
I've opened an issue here: ENH: denoise-paired expose `maxMismatch` and `trimOverhang` (qiime2/q2-dada2 issue #179 on GitHub)
Welcome to the forums!