Why shouldn't I just filter based on quality score and throw away those with low scores?


Using dada2 or deblur is essentially taking out the low-quality data. Is there a reason I should use one of these and not just filter based on the quality score? Thanks.

This is a bit of an oversimplification! E.g., dada2 does not only remove low-quality reads; it also aims to correct the errors in reads to recover the true signal, rather than throwing away or trimming a read wherever an error is observed.

Both dada2 and deblur actually use a Q-score-based filter as the first step (dada2 does this automatically; deblur should be preceded by a `qiime quality-filter` step to perform this filtering). This is only a rough filtering step, though, and does not catch (let alone correct!) all errors.

The best evidence for this is really in the literature; I’d advise you to look at the dada2 and deblur papers and other benchmarks that compare these methods vs. QIIME 1 (where the only option was rigorous Q-score-based filtering, essentially step 1 in the dada2/deblur workflows).

Q-score-based filtering is an imprecise tool: either errors creep into your data, or you trim so aggressively that you are left with almost nothing — see here for a benchmark of Q-score-based filtering to give you an idea that a rough filter alone is not enough.
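To see the trade-off concretely, here is a toy sketch (my own illustration, not actual QIIME or dada2 code) of the classic QIIME 1-style strategy: truncate each read at the first base whose Phred quality drops below a threshold, then discard reads that end up too short. The function names and quality values are made up for the example.

```python
# Toy illustration of the Q-score filtering trade-off (hypothetical code,
# not from any real pipeline).

def truncate_at_quality(qualities, min_q):
    """Return the read length kept after truncating at the first base with Q < min_q."""
    for i, q in enumerate(qualities):
        if q < min_q:
            return i
    return len(qualities)

def qscore_filter(reads, min_q, min_len):
    """Keep (truncated) reads that remain at least min_len bases long."""
    kept = []
    for quals in reads:
        n = truncate_at_quality(quals, min_q)
        if n >= min_len:
            kept.append(n)
    return kept

# A read whose quality degrades toward the 3' end, as is typical for Illumina:
read = [38] * 100 + [30] * 50 + [18] * 100  # 250 bp total

lax = qscore_filter([read], min_q=15, min_len=100)     # keeps all 250 bp, errors included
strict = qscore_filter([read], min_q=35, min_len=200)  # discards the read entirely
print(lax, strict)  # → [250] []
```

With a lax threshold the whole read survives, errors and all; with a strict one the read is thrown away entirely, and there is no middle setting that recovers the true sequence — which is exactly the gap the denoisers' error models fill.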

dada2 and deblur == a fine cup of espresso :coffee:
q-score filtering == straining the grinds through your teeth :cowboy_hat_face:


I’ve never seen such a beautiful metaphor in my life. Wow, thank you for that amazing explanation, and also for that image of coffee grounds in my mouth, which I’ll never ever be able to forget.