I’m having some trouble determining whether it’s okay to leave forward and reverse reads the same length.
My raw reads are ~250nt. Normally I end up trimming them to be the same length because the quality scores are fairly similar in both forward and reverse. But is this going to negatively impact my results?
I just need some help wrapping my head around this!
Hi @Ellenphant,
Since paired-end reads are ultimately merged at their 3’ ends, truncating them on that end does not matter, as long as you don’t truncate so much that the reads no longer overlap enough to merge.
In fact, truncating at the 3’ end (often the poor-quality tails) is recommended when you’re using DADA2 for denoising, since removing those low-quality tails allows more reads to pass the initial filtering step.
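To make the "don't truncate too much" condition concrete, here is a minimal sketch of the overlap arithmetic. The function names and the example amplicon length of 420 nt are hypothetical, and the minimum overlap of 12 nt is an assumption (it is, as far as I know, DADA2's default for merging, but check the setting in your version). Real amplicons also vary in length, so treat this as a rough check rather than a guarantee.

```python
def merge_overlap(trunc_f: int, trunc_r: int, amplicon_len: int) -> int:
    """Number of bases shared by the truncated forward and reverse reads.

    amplicon_len is the expected amplicon length after primer removal
    (an assumption you must supply for your own primer set).
    """
    return trunc_f + trunc_r - amplicon_len


def can_merge(trunc_f: int, trunc_r: int, amplicon_len: int,
              min_overlap: int = 12) -> bool:
    """True if the truncated reads still overlap by at least min_overlap bases.

    min_overlap=12 is an assumed default; adjust it to match your pipeline.
    """
    return merge_overlap(trunc_f, trunc_r, amplicon_len) >= min_overlap


# Hypothetical 420 nt amplicon:
print(can_merge(240, 200, 420))  # 240 + 200 - 420 = 20 nt of overlap
print(can_merge(200, 200, 420))  # 200 + 200 - 420 = -20, reads no longer meet
```

The check only cares about the sum of the two truncation lengths, so equal lengths are fine as long as that sum still covers the amplicon plus the required overlap.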
So for analyses that I have already completed with forward and reverse reads kept the same length, it just means that I might have lost more sequences in the filtering step, but the taxonomic identification won’t have been affected?
Hi @Ellenphant,
That’s a good question. There is no general rule for determining trim/truncate lengths, though there are some general ‘guidelines’. There are lots of previous topics on the forum about this, and I recommend reading through some of those to get a better idea of how to approach it. If your quality scores never drop off (which is rather unusual for an Illumina run), then this is less of an issue for you. In general, though, it is an optimization problem: you want to truncate as much of the 3’ tails of both reads as possible without compromising merging. Since the forward reads are often in better shape, we tend to truncate less from the forward reads and more from the reverse. There is this pre-print on FIGARO, a tool that tries to solve this optimization issue. I’ve never used it personally, but it might be of interest.
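As a rough illustration of that optimization (a toy sketch, not FIGARO's algorithm and not what DADA2 does internally), the code below tries every pair of truncation lengths that just satisfies the overlap requirement and keeps the pair with the fewest expected errors, computed from per-position Phred scores. The function names, the quality profiles, the 400 nt amplicon length, and the 12 nt minimum overlap are all assumptions made for the example.

```python
def expected_errors(quals, length):
    """Sum of per-base error probabilities over the first `length` positions."""
    return sum(10 ** (-q / 10) for q in quals[:length])


def best_truncation(quals_f, quals_r, amplicon_len, min_overlap=12):
    """Pick (trunc_f, trunc_r) minimising total expected errors.

    Only pairs with trunc_f + trunc_r == amplicon_len + min_overlap are
    tried, because keeping more bases than needed can only add errors.
    """
    needed = amplicon_len + min_overlap
    best = None
    for trunc_f in range(1, len(quals_f) + 1):
        trunc_r = needed - trunc_f
        if not 1 <= trunc_r <= len(quals_r):
            continue  # this split is not achievable with these read lengths
        ee = expected_errors(quals_f, trunc_f) + expected_errors(quals_r, trunc_r)
        if best is None or ee < best[0]:
            best = (ee, trunc_f, trunc_r)
    if best is None:
        raise ValueError("reads are too short to span the amplicon with the required overlap")
    return best[1], best[2]


# Hypothetical per-position median quality scores for 250 nt reads:
quals_f = [35] * 230 + [20] * 20   # forward read degrades late
quals_r = [35] * 180 + [15] * 70   # reverse read degrades earlier and harder

print(best_truncation(quals_f, quals_r, amplicon_len=400))
```

With these made-up profiles the search keeps 232 nt of the forward read and 180 nt of the reverse, which matches the usual pattern of truncating the reverse read harder. In practice you would read the quality profile off your own demux summary and still sanity-check how many reads survive filtering and merging.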