Trim paired sequences with dada2

Hi, can someone help me choose where to trim my paired-end sequences? Thanks in advance.

Hi @lisadim,

Could you give us a bit more information about your design? Specifically, what is your target region, which primers did you use, what is the expected amplicon length, and what is the estimated overlap for this primer set?

Sorry about that, here are the Illumina primers:
Illumina index:
Sample_Name    I7_Index_ID    index1      I5_Index_ID    index2
24143_19       N701           TAAGGCGA    S502           CTCTCTAT
9585-37        N701           TAAGGCGA    S506           ACTGCATA

Miseq V3 and V4 16S region.
I hope this can help. Thank you


Thanks @lisadim,

So these are the common 341F/805R primers, which give an estimated amplicon size of ~464 bp; on a 2x300 cycle run that should leave roughly 140 bp of overlap. For proper merging of paired reads following denoising we need a minimum of 20 bp overlap plus some wiggle room for natural size variation of this region. Basically, we want to remove as much of the poor-quality read tails as we can, up to a combined total of ~110 bp.
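As a rough worked version of that arithmetic, here is a minimal Python sketch. The function name `truncation_budget` and the `wiggle` parameter are made up for illustration; the 300, 464 and 20 values are the ones quoted in this post. The exact results (136 and 116) round to the ~140 and ~110 figures once some wiggle room is allowed.

```python
def truncation_budget(read_length, amplicon_length, min_overlap, wiggle=0):
    """Return (raw_overlap, max_total_truncation) in bp for a paired-end run.

    raw_overlap: overlap between untruncated forward and reverse reads.
    max_total_truncation: combined number of bases that can be removed from
    the 3' ends of both reads while keeping min_overlap (+ wiggle) bp.
    """
    raw_overlap = 2 * read_length - amplicon_length
    budget = raw_overlap - min_overlap - wiggle
    return raw_overlap, max(budget, 0)


# Values from this thread: 2x300 run, ~464 bp amplicon, 20 bp minimum overlap.
raw_overlap, budget = truncation_budget(300, 464, 20)
print(raw_overlap, budget)  # prints: 136 116
```

Reserving a few extra bases, for example `wiggle=6`, brings the budget down to the ~110 bp used here.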
Based on your quality plots, I’d say a good starting point would be to truncate your forward reads at around position 260-270 and your reverse reads at around position 240. In addition, you should trim about 25-30 bp from the 5’ end of your forward reads to get rid of that initial dip, which may otherwise lead to some reads being discarded.
Let us know if you run into any issues.

Thank you, it was so helpful!


Hello, Estaki.
I saw this post. But I’m still confused and have a question.

You told her to truncate the forward reads at 260-270 and the reverse reads at 240, and to trim 25-30 bp from the 5’ end of the forward reads.
I would really like to know what that conclusion was based on.

I have to run my own samples, but even with this post as a reference I can’t work out how to apply it to my case, because my quality summaries are slightly different. To apply it, I think I need to know what your decision was based on, with reasons or evidence such as “the forward reads had to be truncated at 260-270 because the quality starts dropping below 30 at that position.” (This is only my guess, as an example.)

I’m looking forward to hearing your thoughts.
FYI, in my case,
V3-V4 16S amplicon, 341F/805R, same as hers.

Plus, you said “Basically we want to remove as much of the poor quality reads as we can up to ~110bp.”
What exactly does that mean? Does it mean cutting 110 bp off the 3’ end, or cutting at position 110 counted from the 5’ end (i.e. keeping 300-110=190 bp, or keeping just 110 bp)? I think the former is right, and if so, will there be problems if I remove more than 110 bp?



Hi @Seok_Jun_Kim,

Great question! It might be worth writing up a mini guide on this topic, but unfortunately there is no single correct answer. Others and I have discussed this topic and the selection process extensively on this forum, so you might benefit from searching some of those topics for more in-depth reasoning. Below is my approach, which might be a bit different from others’.
My estimate above was that, to keep adequate overlap for merging, we can’t remove more than ~110 bp in total from the 3’ ends. This is the combined amount we can remove from the 3’ ends of the forward and reverse reads: you could remove 100 bp from one and 10 bp from the other, or any other combination. Since the forward reads above were in much better shape than the reverse reads, as is usually the case with Illumina runs, I suggested truncating at position 260-270, since this is about where the quality starts to dip. Even though the q-scores are still pretty decent after 270, we would likely retain more reads by leaving out the lower-quality bases in that tail. The 25-30 bp at the beginning (5’) would also be better off trimmed: if too many consecutive low scores appear in a read, the read is discarded even though the rest of it might look fine, so it’s better not to risk that and just trim those bases. Since merging occurs at the 3’ ends, trimming at the 5’ end doesn’t count toward our 110 bp limit.
With the reverse reads, the quality starts to dip much earlier, so we want to truncate earlier as well. In this case, even though we could probably keep a bit more than position 240, I figured why risk including poor sequence when we have enough overlap to get rid of the bad part. The 5’ end looks good on the reverse reads, so there is no need to trim from the beginning of those.

A minimum starting point that is sometimes mentioned on the forum is to truncate at the position where the median quality score dips below 20. Recently, I’ve personally been using more stringent truncation parameters than this when I have enough reads and can afford to lose some, but it’s a good starting point.
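To make that rule of thumb concrete, here is a small Python sketch. It assumes you have copied the per-position median quality scores out of your own quality summary into a list; the `median_quality` values below are invented for illustration and do not come from any run in this thread, and the function name is hypothetical.

```python
def suggest_truncation(median_quality, threshold=20):
    """Return the last 1-based position before the median quality first
    drops below `threshold`, or the full read length if it never does."""
    for position, q in enumerate(median_quality, start=1):
        if q < threshold:
            # Keep everything up to, but not including, this position.
            return position - 1
    return len(median_quality)


# Hypothetical 300 bp reverse read: good, then declining, then poor quality.
median_quality = [38] * 200 + [30] * 40 + [19] * 60
print(suggest_truncation(median_quality))  # prints: 240
```

Raising `threshold` (say to 25 or 30) gives the more stringent cut described above, at the cost of a shorter read and therefore less overlap.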

If you truncate so much that there isn’t sufficient overlap for proper merging, you will end up losing most of your reads, since reads that fail to merge are discarded.
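A quick way to sanity-check a pair of truncation positions before running anything is sketched below in Python. The function names are hypothetical; the ~464 bp amplicon and 20 bp minimum overlap are the figures used earlier in this thread, so substitute the values that apply to your own primers and software version. Note that 5’ trimming is deliberately not an input, since it does not change the overlap.

```python
def overlap_after_truncation(trunc_len_f, trunc_len_r, amplicon_length):
    """Overlap in bp left between forward and reverse reads after
    truncating them at the given positions (negative means a gap)."""
    return trunc_len_f + trunc_len_r - amplicon_length


def can_merge(trunc_len_f, trunc_len_r, amplicon_length, min_overlap=20):
    """True if the truncated reads still overlap by at least min_overlap bp."""
    overlap = overlap_after_truncation(trunc_len_f, trunc_len_r, amplicon_length)
    return overlap >= min_overlap


# Truncation positions suggested earlier in the thread: forward 270, reverse 240.
print(overlap_after_truncation(270, 240, 464))  # prints: 46
print(can_merge(270, 240, 464))                 # prints: True

# Truncating both reads at 200 leaves a gap, so the reads could not merge.
print(can_merge(200, 200, 464))                 # prints: False
```

Because amplicon length varies naturally between taxa, it is safer to leave some margin above the minimum rather than aim for exactly 20 bp.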

Hope this clarifies things a bit!

