Deblur vs DADA2 Questions

benjjneb · December 8, 2017, 4:52pm

The biggest problem comes from the ambiguous nucleotides in many primers. For example, the 515F primer is GTGYCAGCMGCCGCGGTAA. Its two ambiguous nucleotides (Y = C or T, M = A or C) show up in equal proportions (technically, the primer is a mixture of the 4 possible concrete sequences).
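To make the "4 possible primers" concrete, here is a small sketch that expands a degenerate primer into every sequence it encodes, using the standard IUPAC ambiguity codes. The function name `expand_primer` is just for illustration, not part of any particular tool.

```python
from itertools import product

# Standard IUPAC nucleotide codes: each maps to the bases it can stand for
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand_primer(primer: str) -> list[str]:
    """Return every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    # Cartesian product over the allowed bases at each position
    return ["".join(combo) for combo in product(*choices)]

variants = expand_primer("GTGYCAGCMGCCGCGGTAA")
for v in variants:
    print(v)
```

For 515F this prints 4 sequences (2 choices for Y times 2 for M), differing only at positions 4 and 9.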

As a result, if the reads still contain the 515F primer, each real biological sequence will show up as a 25/25/25/25% mixture of reads: one of the 4 possible primer sequences followed by the same biological sequence. This is very bad for ASV methods, because they will resolve those single-nucleotide differences and call 4 variants for every 1 real variant!
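A toy simulation of the problem, assuming reads that start exactly with the primer and a made-up "biological" sequence (`BIO_SEQ` is hypothetical, for illustration only). Counting exact unique sequences is a big simplification of what DADA2 or Deblur actually do, but it shows why leftover primer inflates the variant count and why removing it collapses them back to one.

```python
from collections import Counter

# The four concrete 515F variants (Y -> C/T, M -> A/C)
PRIMER_VARIANTS = [
    "GTGCCAGCAGCCGCGGTAA",
    "GTGCCAGCCGCCGCGGTAA",
    "GTGTCAGCAGCCGCGGTAA",
    "GTGTCAGCCGCCGCGGTAA",
]
PRIMER_LEN = len(PRIMER_VARIANTS[0])  # 19 nt

# Hypothetical biological sequence (NOT a real organism's sequence)
BIO_SEQ = "TACGTAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAG"

# 100 reads from ONE organism, with the 4 primer variants in equal proportions
reads = [PRIMER_VARIANTS[i % 4] + BIO_SEQ for i in range(100)]

untrimmed = Counter(reads)
trimmed = Counter(read[PRIMER_LEN:] for read in reads)

print(len(untrimmed))               # 4 distinct sequences -> 4 "variants"
print(len(trimmed))                 # 1 distinct sequence once the primer is gone
print(sorted(untrimmed.values()))   # [25, 25, 25, 25]
```

Slicing off a fixed length only works here because every simulated read begins with the full primer; real data usually calls for a dedicated primer-removal step (e.g. cutadapt) that matches the degenerate primer and handles reads where it is missing or shifted.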

It's less of an issue when there are no ambiguous nucleotides, but primer/adapter sequence still isn't DNA from the sequenced organism. That's why it is a plus when a protocol (e.g. EMP) doesn't sequence the adapters/primers/etc.: no bases are wasted on non-biological DNA.