Understanding the fate of barcoded primers in paired-end sequencing

Hi all,

I've been studying where barcoded primers end up relative to read 1 (R1) and read 2 (R2) after paired-end sequencing, and came across this video from Illumina: https://www.youtube.com/watch?v=fCd6B5HRaZ8.

Assume I use the same barcode for both of my primers. My understanding is that a template strand (T) from either side of my amplicon, which starts with the barcode followed by either the forward primer or reverse primer sequence at its 5' end, is attached to the flow cell and a complementary strand (C) is created. The template (T) is removed. The new complementary strand (C) is reverse complemented into a new strand (C2) attached to another oligo on the flow cell. After this, the originally created complementary strand (C) is removed, leaving the freshly created strand (C2) to be used for sequencing first, outputting R1 (forward read), a sequence that starts with whichever primer wasn't at the 5' end of the template (T). This R1 sequence will thus not start with my barcode, but have the barcode and the template primer in reverse complement form at its end. Neither is likely to get sequenced, depending on configuration.

Bridge amplification is done again, and only then is the original complement strand (C) used for sequencing, giving us R2, the sequence of the original template (T) that starts with the barcode.
Thus, only R2 will have the barcode, i.e. it will start with the barcode followed by either the forward or reverse primer. R1, on the other hand, won't start with a barcode and will start with whichever primer sequence isn't found on R2.

Does this mean --m-forward-barcodes won't demultiplex anything from my multiplexed sequences, and only --m-reverse-barcodes will work? How do I work around this given that --m-forward-barcodes is required and --m-reverse-barcodes is optional? How do I match barcodes in R2 and not in R1?

Please help me clear my mind on this; I have spent the past few days trying to picture these steps to aid my amplicon analysis. I'm starting to think there was an error in Illumina's video and the blue-attached strand (C2) should've been washed away before R1 sequencing instead of the purple-attached strand (C) at 2:04. Or maybe the sequencing done on (C2) is output as R2 despite being done first? Thanks!

Hello @RielAlfonso,

Let's see if I can help clear up some of the confusion.

My understanding is that a template strand (T) from either side of my amplicon, which starts with the barcode followed by either the forward primer or reverse primer sequence at its 5' end, is attached to the flow cell and a complementary strand (C) is created.

This is correct. I think it's helpful to think only about what occurs in a single cluster at a time. Each of these two fragments (the sense fragment of your amplicon and the antisense fragment) makes its way to its own cluster.

The template (T) is removed. The new complementary strand (C) is reverse complemented into a new strand (C2) attached to another oligo on the flow cell.

Yes, but here C2 is just T: it is the reverse complement of C, which was itself the reverse complement of T.

After this, the originally created complementary strand (C) is removed, leaving the freshly created strand (C2) to be used for sequencing first, outputting R1 (forward read), a sequence that starts with whichever primer wasn't at the 5' end of the template (T).

Yes, but there are many copies of T in the cluster now due to bridge amplification. The first read, R1, is thus the reverse complement of the original template strand that formed the cluster, so it will begin with the reverse complement of the 3' primer.

This R1 sequence will thus not start with my barcode, but have the barcode and the template primer in reverse complement form at its end. Neither is likely to get sequenced, depending on configuration.

Unless intentionally placed downstream of the sequencing primers, barcodes are generally not part of the read. Instead, they are read during a separate sequencing process, as explained in the video. I'm not sure what you mean by "neither is likely to get sequenced".

Let me know if this helped clear things up.


Thanks, that does help a lot. I'd like to clarify that by barcode here I'm referring to a per-sample barcode pre-attached to the primer before PCR, which by my understanding will be part of the output sequence. Since it will be on the 5' end of the T sequence, on shorter sequencing configurations it'll be in reverse complement form on the 3' end of R1 and won't be sequenced unless my amplicon is small enough for complete overlap between R1 and R2 (it is not). This means that I'll only find this barcode on R2.
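
To make the read-length argument above concrete, here is a minimal Python sketch. It assumes the read starts at the first base of the strand and that the amplicon length already includes barcodes and primers. The function name `covers_far_end` and all the lengths are illustrative placeholders, not values from my run.

```python
def covers_far_end(amplicon_len: int, read_len: int, barcode_len: int) -> str:
    """Say how much of the barcode at the far (3') end a read reaches.

    Assumes the read starts at position 0 of the strand and that
    amplicon_len already includes both barcodes and both primers.
    """
    # The far-end barcode occupies the last barcode_len positions.
    far_start = amplicon_len - barcode_len
    if read_len <= far_start:
        return "none"      # read stops before the far-end barcode
    if read_len >= amplicon_len:
        return "full"      # read runs through the whole strand
    return "partial"       # read ends somewhere inside the barcode


# Illustrative numbers only (hypothetical amplicon and cycle counts).
print(covers_far_end(amplicon_len=460, read_len=250, barcode_len=8))
print(covers_far_end(amplicon_len=250, read_len=250, barcode_len=8))
print(covers_far_end(amplicon_len=255, read_len=250, barcode_len=8))
```

With an amplicon much longer than the read, the reverse-complemented barcode at the 3' end is never reached, which is the situation I'm describing.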

In this case (we have multiple sets of per-sample barcoded amplicons), this contrasts with a per-sample-set sequencing barcode that is attached during library preparation and doesn't get sequenced, right? Unfortunately I haven't yet done my own library preparation so I'm still grasping the concept here too.

Edit: I realized that after successive rounds of PCR with identically barcoded forward and reverse primers, the barcode will eventually also end up on the 3' end as its reverse complement. So both R1 and R2 will in fact start with the barcode and end with its reverse complement. I am embarrassed but relieved that I can visualize this now.
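
For anyone else trying to picture this, here is a small Python sketch of the idea. All sequences are made-up placeholders, and treating each read as simply the 5' end of one strand is a simplification of what the sequencer does.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Hypothetical sequences, chosen only for illustration.
BARCODE = "ACGTAC"       # same barcode on both primers
FWD_PRIMER = "GGATTC"
REV_PRIMER = "CCTAAG"
INSERT = "TTTTGGGGCCCC"  # region between the two primers

# After a few PCR cycles, the full-length product carries the barcode
# at its 5' end and the barcode's reverse complement at its 3' end.
top = BARCODE + FWD_PRIMER + INSERT + revcomp(REV_PRIMER) + revcomp(BARCODE)
bottom = revcomp(top)    # the other strand, written 5' to 3'

# Simplification: one read of a pair is the 5' end of each strand.
read_len = 12
read_a = top[:read_len]
read_b = bottom[:read_len]

print(read_a)            # barcode followed by the forward primer
print(read_b)            # barcode followed by the reverse primer
print(top.endswith(revcomp(BARCODE)), bottom.endswith(revcomp(BARCODE)))
```

Both strands start with the barcode and end with its reverse complement, so whichever strand seeds the cluster, both reads of the pair begin with the barcode.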

Hello @RielAlfonso,

It sounds like you have everything figured out; if not, let us know.