DADA2, merging and overlap

stephhhhanniee · January 25, 2024, 8:22pm

Hi everyone!

I'm currently processing paired reads for 16S of V3-V4, forward (341F) and reverse (806R), with DADA2 and I was wondering if someone could help explain or confirm what I'm seeing in my results?

My forward reads are 225 nt and reverse are 222 nt (barcodes and primers are removed from F and R). The sequencing facility says on their website the fragment length is 470 bp for V3-V4.

To calculate overlap, I did (225 + 222) - 470 = -23, which is supposed to mean there's a 23 bp gap between my paired reads. If there really is a 23 bp gap, how am I getting such a high % of merging (66-88%) in my denoising stats? Is my logic correct?
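
For readers following along, the overlap arithmetic above can be sketched in Python. This is only an illustration: the function name `expected_overlap` is hypothetical, and 470 bp is the facility's nominal fragment length, which individual amplicons may not match.

```python
def expected_overlap(len_fwd: int, len_rev: int, amplicon_len: int) -> int:
    """Return how many bases the two reads of a pair share.

    A negative result means the reads do not meet and a gap of that
    many bases lies between them.
    """
    return (len_fwd + len_rev) - amplicon_len


# Read lengths and nominal fragment length quoted in the post.
overlap = expected_overlap(225, 222, 470)
print(overlap)  # -23
```

A negative value here only describes a fragment of exactly the nominal length; a shorter fragment gives a larger (possibly positive) overlap.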

My code is:
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs paired-end-seqs.qza \
  --p-trunc-len-f 0 \
  --p-trunc-len-r 0 \
  --p-chimera-method consensus \
  --o-representative-sequences rep-seqs-denoise.qza \
  --o-table rep_seq_feature_table.qza \
  --o-denoising-stats denoising-stats.qza \
  --verbose

And here are the files:
reads_trimmed_summary.qzv (307.4 KB)

denoising-stats.qzv (1.2 MB)

I'm happy I got good merging, but going by these calculations I'm not sure how it's possible. Any help is greatly appreciated! Thanks so much.

cherman2 · January 25, 2024, 10:53pm

Hi @stephhhhanniee,

The math I typically do for this is 806 - 341 = 465 bp for the amplicon length, but you also have to allow for the minimum overlap needed for merging, which is 12 bp. So you need about 477 bp of combined read length on average for your sequences to merge. 225 + 222 = 447, which isn't "technically" long enough to merge.

However, you are getting high merging, which is awesome! The reason merging succeeds is that 16S regions vary somewhat in length. It looks like 66-88% of your reads came from a 16S region shorter than expected! Unless there is anything else concerning, I think you are good to go!
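
As a rough illustration of the reasoning in this reply, the sketch below works out the longest amplicon that 225 nt and 222 nt reads could still merge across, assuming the 12 bp overlap figure quoted above is the merge threshold. The function names and the example amplicon lengths are hypothetical, chosen only to show the cutoff.

```python
MIN_OVERLAP = 12  # overlap figure quoted in this thread (assumed merge threshold)


def max_mergeable_length(len_fwd: int, len_rev: int,
                         min_overlap: int = MIN_OVERLAP) -> int:
    """Longest amplicon the two reads can span while still overlapping enough."""
    return len_fwd + len_rev - min_overlap


def can_merge(amplicon_len: int, len_fwd: int = 225, len_rev: int = 222,
              min_overlap: int = MIN_OVERLAP) -> bool:
    """True if an amplicon of this length leaves at least min_overlap bases shared."""
    return amplicon_len <= max_mergeable_length(len_fwd, len_rev, min_overlap)


limit = max_mergeable_length(225, 222)
print(limit)  # 435

# Hypothetical amplicon lengths, for illustration only.
for length in (420, 435, 465):
    print(length, can_merge(length))
```

Under these assumptions, any amplicon of 435 bp or shorter would merge, which is consistent with a large share of reads merging when the region is shorter than the nominal length.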

Hope this helps!

stephhhhanniee · January 26, 2024, 6:04pm

Hi @cherman2,

Thank you so so much for your help and clarification! That makes sense to me. I knew V3-V4 was variable, but didn't think it varied that much! I went through the entire protocol and got good results, so I think it's exactly what you said: 66-88% of reads had a shorter 16S region than expected.

Thanks again!
