Merging paired-end reads

Mehrdad · March 4, 2019, 11:57pm

I have received paired-end reads from an Illumina HiSeq platform. The PCR amplicon was 480 bp in length, sequenced as 2x240.

When merging with DADA2, how can the reads overlap when they have no bases in common? I am just curious to learn what overlapping means when merging the two reads (forward and reverse) if each one covers exactly half of the amplicon.

For example, in my case the PCR amplicon is 480 bp. That means 240 bp are read from the forward side and 240 bp from the reverse side, so it does not make sense to have an overlapping region there. This issue is not only mine; I think it applies to everybody's data, unless there is a special explanation that I am unaware of.

Please enlighten me!

Thanks

Mehrbod_Estaki · March 5, 2019, 2:42am

Hi @Mehrdad,

How are you determining your amplicon length and the 2x240 setup? Are you sure the run you are referring to is in fact 2x240 bp? Most of the runs I see from Illumina machines are 2x250 bp or 2x300 bp, depending on the chemistry kit used. Is it possible that you actually had a 2x300 bp run with an expected amplicon size of 480 bp? That would give 2 x 300 - 480 = 120 bp of overlap.
If for some reason you did have a 2x240 bp run and your expected amplicon size is indeed 480 bp, then you are correct that there is no overlap.
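
To make the arithmetic concrete, here is a rough Python sketch of the overlap calculation. The function names are my own for illustration and are not part of DADA2 or QIIME 2. The default `min_overlap` of 12 reflects the default of `mergePairs` in the stand-alone DADA2 R package as far as I know, so check it against the version you are running.

```python
def expected_overlap(read_len_fwd, read_len_rev, amplicon_len):
    """Bases the forward and reverse reads are expected to share.

    A value of zero or less means the reads do not reach each other.
    """
    return read_len_fwd + read_len_rev - amplicon_len


def can_merge(read_len_fwd, read_len_rev, amplicon_len, min_overlap=12):
    """True if the expected overlap meets the minimum required for merging.

    min_overlap=12 is an assumption based on DADA2's mergePairs default.
    """
    return expected_overlap(read_len_fwd, read_len_rev, amplicon_len) >= min_overlap


if __name__ == "__main__":
    amplicon_len = 480
    for read_len in (240, 250, 300):
        overlap = expected_overlap(read_len, read_len, amplicon_len)
        ok = can_merge(read_len, read_len, amplicon_len)
        print(f"2x{read_len} bp on a {amplicon_len} bp amplicon: "
              f"overlap = {overlap} bp, mergeable = {ok}")
```

Keep in mind that the read lengths that matter are the ones left after any truncation or trimming, so quality truncation eats directly into the overlap.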
q2-dada2 does not allow merging reads without any overlap. The stand-alone DADA2 in R does have an option for this, but it is not advised. My personal opinion is also that if you have no overlap, you should just use the forward reads; without proper merging you are very prone to spurious sequences. Most paired-end runs are designed with the need for proper overlap in mind, so I would disagree with the idea that this issue applies to everybody's data.

Just double-check with your facility to see the exact run specifications for the machine, and also double-check your primer designs and targeted amplicon lengths.