I’m working with some subpar reads and I’m struggling to justify my trimming positions and quality score cutoffs.
Data background: 16S data from a MiSeq run. The paired-end reads should cover the V3-V4 region using the 341F and 805R primers, as recommended by Illumina. This supposedly gives a ~460 bp amplicon with a 140 bp overlap. I imported the demultiplexed fastq files into QIIME 2 in PairedEndFastqManifestPhred33 format and created a demux summarize .qzv visualization (dropbox link here).
My initial thought was to use a minimum quality score cutoff of 20, judged by the median score at each position. This puts the truncation point of the forward reads at ~268 and of the reverse reads at ~221. If my math is correct (268 + 221 - 460), this still leaves an overlap of about 29 bp, and I believe DADA2 recommends a minimum 20 bp overlap for merging.
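To double-check the overlap arithmetic, here is a minimal Python sketch. It assumes 2 x 300 bp reads (my inference from the 140 bp overlap figure, not something I have confirmed for this run), treats the ~460 bp amplicon length as exact, and uses the 20 bp minimum that I believe DADA2 recommends. The function name is just for illustration.

```python
def expected_overlap(amplicon_len, trunc_len_f, trunc_len_r):
    """Number of bases shared by the truncated forward and reverse reads."""
    return trunc_len_f + trunc_len_r - amplicon_len


AMPLICON_LEN = 460  # approximate V3-V4 amplicon length quoted above
MIN_OVERLAP = 20    # assumed minimum overlap for merging; not a verified value

# Untrimmed reads, assuming 2 x 300 bp sequencing
full = expected_overlap(AMPLICON_LEN, 300, 300)

# Truncation points taken from the median-quality >= 20 cutoff
trimmed = expected_overlap(AMPLICON_LEN, 268, 221)

print(full)                    # 140
print(trimmed)                 # 29
print(trimmed >= MIN_OVERLAP)  # True
```

With these numbers the truncated reads clear the assumed 20 bp minimum by only about 9 bp, so any amplicon length variation longer than that would likely prevent merging.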
So my question is: would I be better off keeping the above parameters, even though they are not great, or lowering my quality score cutoff to, say, 15 to retain longer sequences? I’m wondering which of the two factors (read quality or overlap length) matters more for denoising, or whether there is a sweet spot that balances them. Finally, at what point does read quality get low enough that using just the forward reads would perform better than forcing low-quality paired-end reads to merge?
Thank you in advance. I’m really enjoying the DADA2 integration in QIIME 2.