I have just started working with data from a new sequencing facility, and I am losing many more reads during pair merging in DADA2 than I am used to. That comparison is based on previous experience sampling the same type of community through a different sequencing facility that used 300 PE reads.
I have 250 PE reads covering the V3-V4 region, amplified with the primers 341F (CCTAYGGGRBGCASCAG) and 806R (GGACTACNNGGGTATCTAAT). The barcodes are still included in the 250 bases of each read, so in practice I have 244 bases in each direction. By my estimate this leaves few bases for a good overlap when merging the pairs: the expected amplicon size is 465 bp and the two reads together cover at most 488 bases, so the overlap is only about 23 bases.
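A quick sketch of the overlap arithmetic above. It assumes both reads have the same raw length, the same number of leading barcode bases (6, inferred from 250 minus 244), and that the 465 bp expected size covers the same span as the reads.

```python
def pair_overlap(read_len: int, barcode_len: int, amplicon_len: int) -> int:
    """Number of bases shared by the forward and reverse reads of one amplicon.

    Assumes both reads have the same raw length and the same number of
    leading barcode bases, and that amplicon_len is measured over the same
    span that the reads cover.
    """
    usable = read_len - barcode_len   # bases left per read after the barcode
    return 2 * usable - amplicon_len  # combined read span minus amplicon length


print(pair_overlap(250, 6, 465))  # → 23
```

With the same formula, 300 PE reads without barcodes would give 600 - 465 = 135 overlapping bases, which is consistent with merging having been much less of a problem at the previous facility.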
Fortunately, the read quality was very good and I didn't have to trim the ends (see the attached paired-end-demux.qzv).
I am still concerned that this leaves too little 'wiggle room' in the overlap for longer-than-average amplicons, so the merged reads may be taxonomically biased.
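One way to make the concern concrete is to compute the longest amplicon that can still merge. This sketch assumes a required overlap of 12 bases, which is the default `minOverlap` of `mergePairs()` in the DADA2 R package; the value QIIME 2 actually uses may differ between versions, so treat 12 as an assumption. It also ignores mismatches in the overlap, which `mergePairs()` rejects by default (`maxMismatch = 0`), so it gives an upper bound on what merges.

```python
# Assumed minimum overlap: default minOverlap of mergePairs() in the DADA2 R package.
DADA2_MIN_OVERLAP = 12


def longest_mergeable(usable_f: int, usable_r: int,
                      min_overlap: int = DADA2_MIN_OVERLAP) -> int:
    """Longest amplicon whose read pair still overlaps by at least min_overlap bases."""
    return usable_f + usable_r - min_overlap


def mergeable(amplicon_len: int, usable_f: int, usable_r: int,
              min_overlap: int = DADA2_MIN_OVERLAP) -> bool:
    """True if the pair overlap for this amplicon length meets min_overlap."""
    return usable_f + usable_r - amplicon_len >= min_overlap


print(longest_mergeable(244, 244))  # → 476
for length in (455, 465, 475, 480):
    print(length, mergeable(length, 244, 244))
```

Under these assumptions, any amplicon more than 11 bases longer than the 465 bp expected size could not merge regardless of read quality, which is the kind of length-dependent loss that could bias the taxonomic composition.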
I trimmed the reads with cutadapt before running DADA2, with the truncation lengths set to the remaining read lengths (i.e. no truncation):
qiime dada2 denoise-paired \
--p-trunc-len-r 224 \
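The reverse truncation length of 224 matches 244 minus the 20-base 806R primer. The sketch below checks that removing the primers does not change the overlap, because they sit on the outer ends of the amplicon rather than in the overlap region. It assumes cutadapt removed each primer from the 5' end of its read, and that the 465 bp expected size includes both primers. The forward length of 227 is inferred here; it is not shown in the command above.

```python
PRIMER_341F = "CCTAYGGGRBGCASCAG"     # 17 bases
PRIMER_806R = "GGACTACNNGGGTATCTAAT"  # 20 bases

usable = 244  # bases per read after the barcode

# Assumption: cutadapt removed each primer from the 5' end of its read.
len_f = usable - len(PRIMER_341F)  # 244 - 17 = 227
len_r = usable - len(PRIMER_806R)  # 244 - 20 = 224

# Assumption: the 465 bp expected size includes both primers.
insert_len = 465 - len(PRIMER_341F) - len(PRIMER_806R)  # 428

overlap = len_f + len_r - insert_len
print(len_f, len_r, insert_len, overlap)  # → 227 224 428 23
```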
On average, only 57% of the reads remained after merging (see the attached stats-dada2.qzv).
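The 57% figure combines losses from filtering, denoising, merging and chimera removal. To see whether merging is really the main step where reads drop out, one can compare consecutive columns for each sample. This is a minimal sketch: the step names are assumed to match the column names in the DADA2 stats table, and each sample is passed in as a plain dictionary of counts.

```python
# Assumed to match the count columns of the DADA2 denoising stats table.
STEPS = ["input", "filtered", "denoised", "merged", "non-chimeric"]


def step_losses(row: dict) -> dict:
    """Fraction of reads lost at each step, relative to the step before it."""
    losses = {}
    for prev, curr in zip(STEPS, STEPS[1:]):
        before, after = row[prev], row[curr]
        losses[curr] = 0.0 if before == 0 else 1 - after / before
    return losses
```

If the `merged` entry dominates for most samples, the short overlap is the likely culprit; a large `filtered` loss would instead point at quality filtering.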
The resulting ASV table also has 2245 singletons.
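"Singleton" can mean an ASV with a total count of one read, or an ASV found in only one sample, and the two give different numbers. This sketch counts both from a feature table held as a hypothetical `{(sample_id, asv_id): count}` dictionary. That layout is chosen for illustration only and is not the QIIME 2 table format.

```python
from collections import defaultdict


def singleton_counts(table: dict) -> tuple:
    """Return (ASVs with a total count of 1, ASVs present in exactly one sample).

    table maps (sample_id, asv_id) -> read count (hypothetical layout).
    """
    totals = defaultdict(int)
    samples = defaultdict(set)
    for (sample, asv), count in table.items():
        if count > 0:
            totals[asv] += count
            samples[asv].add(sample)
    count_singletons = sum(1 for total in totals.values() if total == 1)
    sample_singletons = sum(1 for found_in in samples.values() if len(found_in) == 1)
    return count_singletons, sample_singletons
```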
Is there anything I can do to improve the merging?
Are my results going to be biased because the overlap is not long enough?
Would you expect this many singletons after DADA2?
Thanks in advance for any advice.