Sliding windows and quality filtering

Aaron_DeVries · January 19, 2024, 9:43pm

To accommodate the variation in sequenced length that occurs with ITS1 reads, I've experimented with the --p-trunc-q option, but found that it performed roughly 5-10% worse than simply truncating all of the reads by length. Since the truncQ method truncates each read at the first instance of a quality score at or below the threshold, it seemed likely that some reads were being cut too short because of a single bad base in an otherwise high-quality region. So I've been wondering whether it would be possible to average the quality scores over a sliding window before truncation. This possibility was indirectly referenced in an old post in this forum, but I couldn't find any mention of it in the benjjneb/dada2 documentation, so it wasn't clear whether sliding windows are already in use, or how the size of the window might influence the results. 3 bp? 5 bp?
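To make the question concrete, here is a minimal Python sketch contrasting the two ideas: cutting at the first low score versus cutting where a windowed mean first drops below a threshold. The function names, the example scores, and the choice to leave reads shorter than the window untouched are all assumptions for illustration; this is not how DADA2 or any particular trimmer implements it.

```python
def truncate_first_low(quals, trunc_q):
    """Return the kept length when cutting at the first score <= trunc_q."""
    for i, q in enumerate(quals):
        if q <= trunc_q:
            return i
    return len(quals)


def truncate_sliding_mean(quals, min_mean, window=5):
    """Return the kept length when cutting at the start of the first
    window whose mean score falls below min_mean.

    Assumption: reads shorter than the window are left untouched.
    """
    if len(quals) < window:
        return len(quals)
    total = sum(quals[:window])
    for start in range(len(quals) - window + 1):
        if start > 0:
            # Slide the window one base to the right.
            total += quals[start + window - 1] - quals[start - 1]
        if total / window < min_mean:
            return start
    return len(quals)


# Hypothetical read: one isolated bad base, then a genuinely poor tail.
example_quals = [38, 37, 38, 2, 38, 37, 36, 12, 10, 8, 7, 5]

kept_first_low = truncate_first_low(example_quals, trunc_q=2)
kept_window = truncate_sliding_mean(example_quals, min_mean=20, window=5)
print(kept_first_low, kept_window)
```

With these made-up scores, the first-low rule cuts at the isolated bad base, while the 5 bp window keeps reading until the tail drags the mean down. A wider window tolerates longer dips but reacts later to a real quality drop.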

colinbrislawn · January 20, 2024, 3:44am

Hello Aaron,

Yes, there are a few programs that trim using a sliding window! See this thread on trimmomatic or maybe this fork of sickle.

In case you have not found these already, here's the DADA2 ITS pipeline workflow in R and the QIIME 2 Fungal ITS tutorial.

It looks like they both use cutadapt to remove adapters instead of trimming by length or q-score. Interesting!

Nicholas_Bokulich · January 20, 2024, 11:52am

Hi @Aaron_DeVries, hi @colinbrislawn,

The QIIME 2 plugin q2-quality-filter also already has such an action. It does not use an average across a sliding window, but you can set a window size for the number of low-quality nucleotides that must occur in a row for trimming to be done at that position.
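As a rough illustration of that idea (a run of consecutive low-quality bases triggering the cut, rather than a windowed average), here is a small Python sketch. It is not the plugin's code: the function name is hypothetical, and both the strict `<` comparison and the "run longer than the window" rule are assumptions that should be checked against the help text.

```python
def truncate_at_low_run(quals, min_quality, quality_window):
    """Return the kept length when cutting at the start of the first run
    of more than quality_window consecutive scores below min_quality.

    Assumptions: strict '<' comparison, and the run must exceed the window.
    """
    run_start = 0
    run_len = 0
    for i, q in enumerate(quals):
        if q < min_quality:
            if run_len == 0:
                run_start = i  # remember where this low-quality run began
            run_len += 1
            if run_len > quality_window:
                return run_start
        else:
            run_len = 0  # a good base ends the current run
    return len(quals)


# Hypothetical read: one isolated bad base, then a genuinely poor tail.
run_example = [38, 37, 38, 2, 38, 37, 36, 12, 10, 8, 7, 5]
kept_run = truncate_at_low_run(run_example, min_quality=20, quality_window=3)
print(kept_run)
```

In this made-up read the single bad base at position 4 is ignored, and the cut lands where the sustained low-quality tail begins.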

See qiime quality-filter q-score --help for more details.

So it should be possible to trim with this and then pass the output to q2-dada2. Note, however, that dada2 can perform worse when amplicons are of different lengths, as this can affect the error modeling step. This approach is possible, but you should check the results closely. Some relevant posts about this: