RE: Trim based on quality score not the sequence position

I am a newbie around here, so forgive me for the basic question.

In the post I mentioned, you say we should consider trimming when the median Q drops below 20. But which value is the median Q in the parametric seven-number summary table? Is it the 25th or the 50th percentile (or neither)?

Also, I often see the terms "trim" and "truncate". What are the differences between them?

Hi @microbiotaphyto,

Thanks for reaching out! :qiime2:

Great question! The median quality score corresponds to the 50th percentile in the parametric seven-number summary table. It should also be annotated as the median, for reference. Here is a screenshot from an example interactive quality plot and summary table that demonstrates this:

Truncation refers to the 3' end of your reads, while trimming refers to the 5' end.

When you select a truncation length, any reads shorter than that length will be discarded, and the remaining reads will be shortened to the truncation length (at the 3' end).

The trim length is the number of bases to remove from the 5' end (and this occurs after truncation has been performed).
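
To make the order of operations concrete, here is a minimal Python sketch of the behavior described above. It is only an illustration on toy strings, not the actual QIIME 2 or DADA2 code; the function name `truncate_then_trim` and the parameters `trunc_len` and `trim_left` are hypothetical names chosen for this example.

```python
def truncate_then_trim(reads, trunc_len, trim_left):
    """Toy illustration: truncate at the 3' end, then trim the 5' end.

    reads     -- list of read sequences as strings, written 5' to 3'
    trunc_len -- hypothetical truncation length; shorter reads are discarded
    trim_left -- hypothetical number of bases to remove from the 5' end
    """
    kept = []
    for read in reads:
        # Reads shorter than the truncation length are discarded.
        if len(read) < trunc_len:
            continue
        # Truncation: keep only the first trunc_len bases (cuts the 3' end).
        truncated = read[:trunc_len]
        # Trimming: drop trim_left bases from the start (the 5' end).
        kept.append(truncated[trim_left:])
    return kept


# Made-up example reads, for illustration only.
reads = ["ACGTACGTAC", "ACGTAC", "TTGGCCAATTGG"]
result = truncate_then_trim(reads, trunc_len=8, trim_left=2)
print(result)
```

In this sketch the second read is dropped because it is shorter than the truncation length, and every surviving read ends up with `trunc_len - trim_left` bases.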

Hope this helps!

Cheers :lizard:


Thank you @lizgehret ! So "median Q" is not different from "median", is that it?

@microbiotaphyto that's correct! Median Q refers to the median quality score.
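
If it helps to see this with numbers, here is a small Python sketch using only the standard library. The quality scores are made up for illustration, and the threshold of 20 is the one from the question above; this is not how the interactive plot is computed internally, just a demonstration that the 50th percentile and the median are the same value.

```python
from statistics import median, quantiles

# Hypothetical quality scores observed at a single read position.
scores = [12, 18, 20, 25, 30, 34, 36, 37, 38]

# quantiles(..., n=100) returns 99 cut points; index i-1 is the i-th percentile.
cuts = quantiles(scores, n=100, method="inclusive")
p25, p50, p75 = cuts[24], cuts[49], cuts[74]

# The 50th percentile matches the median; the 25th percentile does not.
print(p25, p50, p75, median(scores))

# Hypothetical scores for four consecutive positions (one inner list each).
per_position = [
    [38, 37, 36, 38, 35],
    [36, 35, 30, 34, 33],
    [25, 22, 18, 27, 21],
    [19, 15, 22, 12, 18],
]
medians = [median(position_scores) for position_scores in per_position]

# First 0-based position where the median Q drops below 20 (None if it never does).
first_low = next((i for i, m in enumerate(medians) if m < 20), None)
print(medians, first_low)
```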

Cheers :lizard:

Thank you @lizgehret !!