We are preparing our data for DADA2 and are unsure whether to truncate our reads.
I realize that there are already a few topics about truncating in DADA2, but I have not yet found a situation comparable to ours. Hence, I am posting our question here.
A few details about the data: we performed paired-end sequencing of the V4 region (primers 515F and 806R). The merged sequence length should be about 248bp with an overlap of 208bp between forward and reverse reads.
The quality plots show a drop in quality from position 160 onwards in both the forward and reverse reads. The median quality score is at least 25 at every position, but many reads still have low quality scores (<25) at the later positions.
Now we are unsure whether we should truncate at base 160 (remaining overlap ~120bp) or not truncate at all. We lean towards truncating, as we have a fairly high number of reads and would probably still retain sufficient data. However, as the median quality score is 25 (not great, but not terrible), we are worried about losing a lot of valuable data by truncating.
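For reference, here is a rough sketch of how we understand the overlap arithmetic. It assumes the remaining overlap is simply forward length plus reverse length minus the merged length, and it ignores primer removal and natural length variation in V4. The function name, the candidate truncation values, and the 12bp threshold (which we believe is DADA2's default minimum overlap for merging) are our own assumptions for illustration.

```python
def remaining_overlap(trunc_f: int, trunc_r: int, merged_len: int) -> int:
    """Overlap (bp) left after truncation: forward + reverse - merged length."""
    return trunc_f + trunc_r - merged_len


MERGED_LEN = 248  # expected merged V4 length stated above
MIN_OVERLAP = 12  # assumed DADA2 default minimum overlap for merging

# Hypothetical truncation positions, applied to both reads
for trunc in (160, 180, 200):
    overlap = remaining_overlap(trunc, trunc, MERGED_LEN)
    status = "ok" if overlap >= MIN_OVERLAP else "too short"
    print(f"truncate at {trunc}: overlap {overlap} bp ({status})")
```

With both reads truncated at 160, this simple formula gives 72bp rather than the ~120bp we estimated, so our estimate may be off. Either way, it is well above the assumed 12bp minimum.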
We would really appreciate your input on this. The quality score plots and the full demux-summarize.qzv file are below in case you need more details.
trace_demux.qzv (321.6 KB)