Losing reads during merging in Dada2

Dear experts,

I have just started working with data from a new sequencing facility, and I am losing many more reads during the merging of the pairs in dada2 than I am used to (based on previous experience sampling the same type of community with a different sequencing facility using 300 PE).

I have 250 PE reads covering the V3-V4 region with the following primers: 341F CCTAYGGGRBGCASCAG and 806R GGACTACNNGGGTATCTAAT. The barcodes are still included in the 250 bases of each read, so in practice I have 244 bases in each direction. By my estimate this doesn't leave many bases for a good overlap when merging the pairs: the expected amplicon size is 465 and the largest possible number of bases is 488 (2 x 244), so the overlap is 23 bases.
Fortunately the quality of the reads was really good and I didn't have to trim the ends.
paired-end-demux.qzv (309.8 KB)
I am still concerned this doesn't allow enough 'wiggle room' in the overlap for longer than average amplicons and I may have taxonomically biased merged reads.

I trimmed the reads with cutadapt before running Dada2, with the trunc-len set to the remaining fragment size (no truncation):

qiime dada2 denoise-paired \
--i-demultiplexed-seqs trimmed-end-demux.qza \
--o-table table \
--o-representative-sequences rep-set \
--o-denoising-stats denoising-stats \
--p-trunc-len-f 227 \
--p-trunc-len-r 224

This resulted in only an average of 57% of the reads being left after merging.
stats-dada2.qzv (1.2 MB)
The resulting ASV table also has 2245 singletons.

Is there anything I can do to improve the merging?
Are my results going to be biased because the overlap is not long enough?
Would you expect this many singletons after Dada2?

Thanks in advance for any advice

Hello Sam,

To the best of my knowledge, the dada2 plugin for Qiime2 merges paired-end reads using the defaults of dada2 (link to code). This might cause minor issues, because by default in dada2, maxMismatch = 0 for the merging function (mergePairs in the dada2 R package).

This means that any differences in the region of overlap will cause your reads not to join, and could explain these lower than expected merging results.
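To make the effect of maxMismatch concrete, here is a toy sketch in bash. It is not how DADA2 actually merges reads (DADA2 aligns the pair first); it only counts position-by-position differences between two overlap regions. The function names `count_mismatches` and `would_merge` and both sequences are made up for illustration, and the reverse read is assumed to be reverse-complemented already.

```shell
#!/usr/bin/env bash
# Toy illustration only: NOT DADA2's merging code.
# Both strings are hypothetical 23-base overlap regions in the same
# orientation (reverse read assumed already reverse-complemented).

# Count positions where the two equal-length strings differ.
count_mismatches() {
  local fwd=$1 rev=$2 n=0 i
  for (( i = 0; i < ${#fwd}; i++ )); do
    if [[ "${fwd:i:1}" != "${rev:i:1}" ]]; then
      n=$(( n + 1 ))
    fi
  done
  echo "$n"
}

# Decide whether a pair would be kept for a given mismatch allowance.
would_merge() {
  local mismatches=$1 max_mismatch=$2
  if (( mismatches <= max_mismatch )); then
    echo "merge"
  else
    echo "reject"
  fi
}

fwd_overlap="ACGTACGTACGTACGTACGTACG"
rev_overlap="ACGTACGTACGAACGTACGTACG"   # one substitution at position 12

mm=$(count_mismatches "$fwd_overlap" "$rev_overlap")
echo "mismatches in overlap: $mm"
echo "maxMismatch = 0: $(would_merge "$mm" 0)"
echo "maxMismatch = 1: $(would_merge "$mm" 1)"
```

With a single sequencing error in the overlap, the pair is dropped at maxMismatch = 0 and kept at maxMismatch = 1, which is the behaviour described above.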

The default minOverlap is set to 12, so 23 basepairs of overlap should work fine!
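As a rough check of the arithmetic, here is a minimal sketch using only the numbers from the original post (read lengths after primer removal, expected amplicon size, primer lengths counted from the sequences given) plus the minOverlap of 12 mentioned above. The variable names are just for illustration.

```shell
#!/usr/bin/env bash
# All values come from the posts above; variable names are illustrative.
read_len_f=227              # forward read length after primer removal
read_len_r=224              # reverse read length after primer removal
amplicon_with_primers=465   # expected amplicon size, primers included
primer_f=17                 # length of 341F (CCTAYGGGRBGCASCAG)
primer_r=20                 # length of 806R (GGACTACNNGGGTATCTAAT)
min_overlap=12              # dada2 default minOverlap

# Amplicon length once both primers are removed.
amplicon=$(( amplicon_with_primers - primer_f - primer_r ))

# Overlap for an amplicon of the expected size.
overlap=$(( read_len_f + read_len_r - amplicon ))

# Longest primer-trimmed amplicon that still leaves min_overlap bases.
max_amplicon=$(( read_len_f + read_len_r - min_overlap ))

echo "primer-trimmed amplicon: $amplicon"                        # 428
echo "overlap at expected size: $overlap"                        # 23
echo "longest mergeable amplicon (trimmed): $max_amplicon"       # 439
echo "same, with primers: $(( max_amplicon + primer_f + primer_r ))"  # 476
```

So under these assumptions the overlap is 23 bases for a typical amplicon, and amplicons up to about 11 bases longer than expected would still reach the 12-base minimum. Anything longer than that would fail to merge regardless of quality.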

While the Qiime2 plugin does not support changing maxMismatch yet, you could try changing this setting while running DADA2 in R and see if that improves your merging results.

Did you sequence any positive controls containing known taxa? I ask because your read depth is quite good, even after losing many reads, so if you can show the taxonomy is not biased, you could be good to go! :bar_chart: :face_with_monocle:

Dear Colin,
Thank you for your fast response. Unfortunately we didn't include a positive control in this run, but you are correct: I still have enough sequences left even after losing many. I will try your suggestion of increasing maxMismatch, and I plan to remove the singletons from the feature table.