I am about to send samples of extracted DNA out to a lab for library preparation and sequencing. I will be using the 515f-806r primers for the V4 region. I am unsure whether I should use 2x151 or 2x250 paired-end sequencing for this region.
The sequencing strategy used by the lab I am working with does not include the primers in the reads. From a previous post it seems like the 2x151 sequencing will be sufficient to merge reads if the primers aren't included in sequencing but I wanted to double check since this work is quite new to me.
As @sixvable pointed out, 2x250 would be best. 2x151 gives barely enough overlap to merge the 515f-806r amplicon reliably, especially if there is any loss of quality at the 3’ ends of the reads. This is a very common issue, and it often means that 2x151 V4 runs end up being analyzed as single-end data.
So 2x250 is a little more expensive, but less risky than 2x151.
@SoilRotifer, since the sequencing I will be using does not sequence through the primers, the previous post you shared seems to say that 2x150 would be safe. Am I reading that correctly?
Yep! Many of the data sets I’ve worked with come from Argonne National Lab and were generated with their 2x150 V4 EMP protocol. I’ve not had trouble merging the paired-end output from these so far.
The expected V4 amplicon size is 270–387 bp. If you just subtract the two primer positions, you get a length of 292 nt, but this varies from strain to strain. Depending on the overlap length your merging tool requires, you may be lucky to have a few pairs that overlap, where "few" means a composition-dependent, unknown fraction anywhere between 0% and 100%. If you compare the 2x151 nt of sequence (302 nt in total) with the above amplicon sizes, you get overlaps of 0 to 32 nt for the shorter amplicons and gaps of 0 to 85 nt for the longer ones. That is the ideal case, in which all the last bases survive the quality trimming (if any) before joining.
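To make the arithmetic concrete, here is a minimal Python sketch. The function names and the 12 nt minimum overlap are my own illustrative assumptions (check what your merging tool actually requires), and it assumes full-length reads with no quality trimming.

```python
def signed_overlap(amplicon_len, read_len=151):
    """Positive result = overlap in nt, negative result = gap in nt.

    Assumes both reads keep their full length (no 3' quality trimming).
    """
    return 2 * read_len - amplicon_len


def can_merge(amplicon_len, read_len=151, min_overlap=12):
    """True if the pair overlaps by at least min_overlap.

    min_overlap=12 is only an illustrative value, not a recommendation.
    """
    return signed_overlap(amplicon_len, read_len) >= min_overlap


if __name__ == "__main__":
    # Amplicon lengths taken from the range discussed above.
    for length in (270, 292, 387):
        for read_len in (151, 250):
            print(length, read_len,
                  signed_overlap(length, read_len),
                  can_merge(length, read_len))
```

With 2x151, only the short end of the range clears the minimum overlap in this sketch, while with 2x250 every length in the range does.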
This means that if you use 2x151 nt sequencing, close your eyes, and join the reads, then you will get a set of joined reads ("no trouble"): a syntactically perfect artifact with an extreme bias toward the short-V4 genera, which loses the long-V4 part of the world.
So you had better carefully compare the read statistics before and after joining (merging), looking at both the length distribution and the number of reads.
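One way to run that check is sketched below in plain Python. It assumes standard 4-line FASTQ records (no wrapped sequences). The helper names are hypothetical, and you would pass in whatever your own input and merged files are called.

```python
from collections import Counter
import gzip


def open_fastq(path):
    """Open a plain or gzipped FASTQ file as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def read_lengths(lines):
    """Yield the sequence length of each record in 4-line FASTQ text."""
    for i, line in enumerate(lines):
        if i % 4 == 1:  # the sequence is the second line of each record
            yield len(line.strip())


def summarize(lines):
    """Return (number of reads, Counter mapping length -> count)."""
    hist = Counter(read_lengths(lines))
    return sum(hist.values()), hist


def merged_fraction(n_input_pairs, n_merged):
    """Fraction of input pairs that survived merging."""
    return n_merged / n_input_pairs if n_input_pairs else 0.0
```

Run `summarize(open_fastq(...))` on the forward reads and on the merged reads, then compare the counts with `merged_fraction` and look at where the merged length histogram stops. A sharp cutoff near the maximum mergeable length, together with a low merged fraction, suggests the longer V4 amplicons were dropped.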
I know it is too late to answer the original question, but ppl may have the same question later, so I just put it here.