Why do the DADA2 default settings have such a low PHRED score?

benjjneb · June 12, 2019, 12:28pm

Hi Sam, this is because the primary and recommended quality filtering parameter in DADA2 is "expected errors" (max-ee), which is based on the quality scores but is a better filter than averaging raw quality scores. You can read more about EE filtering here: https://doi.org/10.1093/bioinformatics/btv401
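To illustrate why expected errors can be a better filter than an average quality score, here is a minimal Python sketch. It assumes the standard Phred relation p = 10^(-Q/10) and sums those probabilities over the read. The function names and the two example reads are made up for illustration and are not DADA2 code.

```python
def expected_errors(quals):
    """Sum of per-base error probabilities, using p = 10^(-Q/10)."""
    return sum(10 ** (-q / 10) for q in quals)


def mean_quality(quals):
    """Plain arithmetic mean of the quality scores."""
    return sum(quals) / len(quals)


# Hypothetical reads, 100 bases each.
read_a = [40] * 90 + [2] * 10   # excellent bases followed by ten Q2 bases
read_b = [30] * 100             # uniformly good bases

for name, quals in (("read_a", read_a), ("read_b", read_b)):
    print(name, "mean Q =", round(mean_quality(quals), 1),
          "EE =", round(expected_errors(quals), 2))
```

In this toy case read_a has the higher mean quality, yet it is expected to contain several errors while read_b is expected to contain about 0.1. A mean-quality threshold would prefer read_a; an expected-errors threshold would not.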

The quality truncation at Q-score 2 is really just for older Illumina software, where a score of 2 was code for "I don't know what's going on anymore" and any bases after the first Q2 were often poor. These days it's basically superfluous, and in almost all cases I'd recommend using max-ee as the quality filter, in conjunction with trunc-len to truncate off low-quality sequence tails.
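As a rough sketch of how these three settings interact, the Python below applies them to a single read's quality scores. It is an assumption-laden illustration, not DADA2's implementation: `filter_read` and its parameter names are hypothetical, the default values are only examples, and the order of steps (truncate at the first low score, then truncate to a fixed length, then apply the expected-errors filter) reflects my understanding of the usual behaviour.

```python
def expected_errors(quals):
    """Sum of per-base error probabilities, using p = 10^(-Q/10)."""
    return sum(10 ** (-q / 10) for q in quals)


def filter_read(quals, trunc_q=2, trunc_len=0, max_ee=2.0):
    """Return the truncated quality list, or None if the read is discarded.

    Hypothetical helper: names and defaults are illustrative only.
    """
    # 1. Cut the read at the first base with quality <= trunc_q.
    for i, q in enumerate(quals):
        if q <= trunc_q:
            quals = quals[:i]
            break
    # 2. Fixed-length truncation; reads shorter than trunc_len are dropped.
    if trunc_len > 0:
        if len(quals) < trunc_len:
            return None
        quals = quals[:trunc_len]
    # 3. Expected-errors filter on whatever remains.
    if expected_errors(quals) > max_ee:
        return None
    return quals


# Hypothetical read: 120 good bases, then a 40-base low-quality tail.
read = [38] * 120 + [12] * 40

print(filter_read(read) is None)              # untruncated read fails max_ee
print(len(filter_read(read, trunc_len=120)))  # tail removed, read is kept
```

The tail alone contributes about 2.5 expected errors, so the untruncated read is discarded, while truncating at 120 bases leaves a read with well under one expected error. That is the sense in which trunc-len and max-ee work together.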