Can we use extracted reference reads to calculate expected amplicon sizes?

We're using the 27F (AGAGTTTGATCMTGGCTCAG) and 338R (GCWGCCWCCCGTAGGWGT) primer set, targeting the V1-V2 region of the 16S rRNA gene. So, yes, we expect the average amplicon size to be around 310 bp.
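For anyone who wants to check an expected size by hand, here is a minimal sketch of an exact-match in-silico PCR on a single reference sequence. It is not how QIIME2 does it internally; the function names and the `include_primers` flag are my own, and it assumes no mismatches and a reference given on the forward strand. Whether to count the primers depends on whether they were trimmed from your merged reads.

```python
import re

# IUPAC degenerate base codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table that also handles degenerate codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse complement of a (possibly degenerate) DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Turn a degenerate primer into an exact-match regex."""
    return "".join(IUPAC[base] for base in primer.upper())


def amplicon_length(template, fwd, rev, include_primers=True):
    """Length of the amplicon on `template`, or None if a primer is missing.

    Hypothetical helper: exact matching only, forward-strand template.
    """
    template = template.upper()
    f = re.search(primer_regex(fwd), template)
    if f is None:
        return None
    # The reverse primer binds the opposite strand, so search for its
    # reverse complement downstream of the forward primer.
    r = re.search(primer_regex(reverse_complement(rev)), template[f.end():])
    if r is None:
        return None
    start = f.start() if include_primers else f.end()
    end = f.end() + (r.end() if include_primers else r.start())
    return end - start


FWD_27F = "AGAGTTTGATCMTGGCTCAG"
REV_338R = "GCWGCCWCCCGTAGGWGT"
```

Running `amplicon_length(seq, FWD_27F, REV_338R)` over each reference sequence and collecting the non-`None` values gives an expected size distribution to compare against the merged reads.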

Most merged sequences are between 270 bp and 346 bp, with a long right tail extending to 488 bp. The same pattern appears in the expected amplicon size distribution computed with no primer mismatch allowed, so I think it's safe to skip sequence-length-based filtering.

P.S.
The expected amplicon size distributions calculated by allowing no mismatch, allowing one non-3'-end mismatch, or setting a minimum combined primer match identity threshold (0.8) in QIIME2 largely overlap. So using the extracted reads to calculate expected amplicon sizes is reliable.
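If you want to put a number on how much two length distributions overlap, one simple option is the overlap coefficient: the sum over lengths of the smaller of the two relative frequencies. This is just a sketch of one possible measure, not what I used above; `length_overlap` is a hypothetical helper and the inputs are plain lists of amplicon lengths in bp.

```python
from collections import Counter


def length_overlap(lengths_a, lengths_b):
    """Overlap coefficient of two length distributions (0 = disjoint, 1 = identical)."""
    if not lengths_a or not lengths_b:
        raise ValueError("both inputs must contain at least one length")
    counts_a, counts_b = Counter(lengths_a), Counter(lengths_b)
    total_a, total_b = len(lengths_a), len(lengths_b)
    # Sum the shared probability mass at every observed length.
    return sum(
        min(counts_a[n] / total_a, counts_b[n] / total_b)
        for n in counts_a.keys() | counts_b.keys()
    )
```

A value close to 1 means the two primer-matching settings give nearly the same expected size distribution.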

Yanxian
