Why shouldn't I just filter based on quality score and throw away those with low scores?


Using dada2 or deblur is essentially taking out the low quality data. Is there a reason I should use one of these and not just filter based on the quality score? Thanks.

This is a bit of an oversimplification! For example, dada2 does not only remove low-quality reads; it also aims to correct the errors in reads to recover the true signal, rather than throwing away or trimming a read wherever an error is observed.

Both dada2 and deblur actually use a Q-score-based filter as the first step (dada2 does this automatically; deblur should be preceded by a qiime quality-filter step to perform this filter). This is only a rough filtering step, though, and does not catch (let alone correct!) all errors.
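To make "rough filtering step" concrete, here is a minimal sketch of what a Q-score-based filter does in spirit: truncate a read at the first low-quality base, then drop it if too little is left. This is not the actual dada2 or qiime quality-filter implementation; the function name `truncate_and_filter` and the thresholds `min_q` and `min_fraction` are made up for illustration.

```python
def truncate_and_filter(seq, quals, min_q=4, min_fraction=0.75):
    """Toy Q-score filter (illustrative only, not QIIME 2's or dada2's code).

    seq:   read sequence as a string
    quals: list of per-base Phred scores, same length as seq
    Returns (seq, quals) truncated before the first base with Q < min_q,
    or None if the surviving part is shorter than min_fraction of the read.
    """
    keep = len(seq)
    for i, q in enumerate(quals):
        if q < min_q:
            keep = i  # cut the read just before the first low-quality base
            break

    if keep < min_fraction * len(seq):
        return None  # too little left: discard the whole read

    return seq[:keep], quals[:keep]


# A low-quality base near the end: the read is truncated but kept.
late_dip = truncate_and_filter("ACGTACGT", [30, 30, 30, 30, 30, 30, 2, 30])

# A low-quality base near the start: the read is discarded.
early_dip = truncate_and_filter("ACGTACGT", [30, 30, 2, 30, 30, 30, 30, 30])
```

Note that nothing here looks at the sequence itself. The filter only reacts to the reported scores, so it can trim or discard reads, but it can never fix a wrong base.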

The best evidence for this is really in the literature; I’d advise you to look at the dada2 and deblur papers and other benchmarks that compare these methods vs. QIIME 1 (where the only option was rigorous Q-score-based filtering, essentially step 1 in the dada2/deblur workflows).

Q-score-based filtering is an imprecise tool: either errors creep into your data or you trim so much data that you are left with nothing. See here for a benchmark of Q-score-based filtering to give you an idea that a rough filter alone is not enough.
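A quick back-of-the-envelope calculation shows why errors creep in even with good scores. By the Phred definition, a base with score Q has error probability 10^(-Q/10). The sketch below multiplies the per-base "no error" probabilities to estimate how likely a read is to contain at least one error; it assumes errors are independent across bases and that the reported scores are accurate, and the 250-base all-Q30 read is a made-up example.

```python
def prob_at_least_one_error(quals):
    """Probability that a read contains at least one erroneous base.

    Assumes independent errors and accurate Phred scores,
    where each base has error probability 10 ** (-Q / 10).
    """
    p_clean = 1.0
    for q in quals:
        p_clean *= 1 - 10 ** (-q / 10)  # chance this base is correct
    return 1 - p_clean


# Hypothetical read: 250 bases, every one of them Q30 (1 in 1000 error rate).
read_quals = [30] * 250
p_error = prob_at_least_one_error(read_quals)
print(f"P(at least one error) = {p_error:.3f}")
```

Under these assumptions, roughly one in five such reads carries at least one error, and every one of them would sail through a Q-score threshold. Raising the threshold far enough to exclude them is what leaves you with nothing.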

dada2 and deblur == a fine cup of espresso :coffee:
q-score filtering == straining the grinds through your teeth :cowboy_hat_face:


I’ve never seen such a beautiful metaphor in my life wow thank you for that amazing explanation and also that image of coffee grounds in my mouth I’ll never ever be able to forget.

