Paired end: 2x151 vs 2x250 for V4

Mike_McFarlin · November 9, 2020, 8:36am

Hi everyone,

I am about to send samples of extracted DNA out to a lab for library preparation and sequencing. I will be using the 515f-806r primers for the V4 region. I am unsure whether I should use 2x151 or 2x250 paired end sequencing for this region.

The sequencing strategy used by the lab I am working with does not include the primers in the reads. From a previous post it seems like the 2x151 sequencing will be sufficient to merge reads if the primers aren't included in sequencing but I wanted to double check since this work is quite new to me.

Thank you for your help,

Mike

sixvable · November 9, 2020, 8:42am

Hi @Mike_McFarlin

Always choose PE250 mode for amplicon sequencing.

Sincerely,
sixvable

Nicholas_Bokulich · November 9, 2020, 9:09am

Hi @Mike_McFarlin,
Welcome to the forum community!

As @sixvable pointed out, 2x250 would be best. 2x151 barely gives enough overlap to cover the 515f-806r amplicon reliably... especially if there is any loss of quality at the 3' ends of the reads. This is a very common issue, and commonly dooms 2x151 V4 runs to be utilized as single-end data.

So 2x250 is a little more expensive, but less risky than 2x151.

Good luck!

SoilRotifer · November 9, 2020, 4:10pm

Hi all, whether to use 2x150 or 2x250 depends on your sequencing protocol.

For more details on this see the following post:

It looks like the EMP web site is not currently accessible, so check out the protocols.io version.

Mike_McFarlin · November 9, 2020, 11:51pm

Hi everyone,

Thank you so much for the responses.

@SoilRotifer, since the sequencing that I will be using does not sequence through the primers, it seems from the previous post you shared that it would be safe to use 2x150. Am I reading that correctly?

Thank you,

Mike

SoilRotifer · November 10, 2020, 2:37pm

Hi @Mike_McFarlin,

Yep! Many of the data sets I've worked with are from Argonne National Lab and were generated with their 2x150 V4 EMP protocol. I've not had trouble merging the paired-end output from these so far.

In fact, here is a data set from a small pilot project my colleagues and I performed using this approach.

-Cheers!
-Mike

Mike_McFarlin · November 13, 2020, 8:03pm

Hi @SoilRotifer,

Thanks, this is a huge help! I'll be working with Argonne National Lab for the library prep and sequencing so it's nice to hear you've had no issues merging paired end from 2x150.

-Mike

Peter_Kos · June 26, 2023, 2:11pm

The expected V4 amplicon size is 270 bp–387 bp. If you just subtract the two primer positions, you get a length of 292 nt, but this varies from strain to strain. Depending on the overlap length you require, you may be lucky to have a few pairs that overlap, where "few" means a composition-dependent, unknown fraction anywhere between 0% and 100%. If you subtract the combined 2x151 read length (302 nt) from the above amplicon sizes, you get a distribution ranging from overlaps of up to 32 nt (for the shortest amplicons) to gaps of up to 85 nt (for the longest). That is the ideal case, in which all of the last bases survive quality trimming (if any) before joining.
This means that if you use 2x151 nt sequencing, close your eyes, and join the reads, then you will get a set of joined reads ("no trouble"): a syntactically perfect artifact with an extreme bias toward the short-V4 genera that loses the long-V4 part of the world.
So you had better carefully check the read statistics before and after joining (merging), in particular the length distribution and the number of reads.
I know it is too late to answer the original question, but ppl may have the same question later, so I just put it here.
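
For anyone who wants to redo the overlap arithmetic for their own read length, here is a minimal Python sketch. It assumes both reads start at the very ends of the amplicon and that no bases are lost to quality trimming. The function names and the `min_overlap` value of 12 nt are illustrative assumptions, not settings from any particular merging tool, so substitute the minimum overlap your own pipeline requires.

```python
def overlap(amplicon_len: int, read_len: int = 151) -> int:
    """Return the overlap between forward and reverse reads in nt.

    A positive value is an overlap; a negative value is a gap that
    neither read covers. Assumes no quality trimming (ideal case).
    """
    # Two reads cannot overlap by more than the amplicon itself.
    return min(2 * read_len - amplicon_len, amplicon_len)


def can_merge(amplicon_len: int, read_len: int = 151, min_overlap: int = 12) -> bool:
    """True if the ideal overlap reaches min_overlap (an assumed threshold)."""
    return overlap(amplicon_len, read_len) >= min_overlap


# Amplicon lengths quoted in this thread: shortest, nominal, longest.
for amplicon_len in (270, 292, 387):
    for read_len in (151, 250):
        ov = overlap(amplicon_len, read_len)
        kind = "overlap" if ov >= 0 else "gap"
        print(f"amplicon {amplicon_len} nt, 2x{read_len}: "
              f"{kind} of {abs(ov)} nt, mergeable={can_merge(amplicon_len, read_len)}")
```

Under these assumptions a 292 nt amplicon sequenced at 2x151 has only 10 nt of overlap, so any trimming at the 3' ends reduces it further, while 2x250 leaves more than 100 nt of overlap even for the longest amplicons.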