Should we truncate or not?


We are preparing our data for DADA2 and are unsure whether to truncate our reads.
I realize that there are already a few topics about truncating in DADA2, but I have not yet found a situation comparable to ours. Hence, I am posting our question here.

A few details about the data: we performed paired-end sequencing of the V4 region (primers 515F and 806R). The merged sequence length should be about 248bp with an overlap of 208bp between forward and reverse reads.

The quality plots show a drop in quality from sequence base 160 onwards (both in the forward and reverse reads). For all bases the median quality score is at least 25, but we find that there are still a lot of reads with low quality scores (<25) at the higher positions.

We are now unsure whether we should truncate at base 160 (remaining overlap ~120bp) or not truncate at all. We lean towards truncating, as we have a fairly high number of reads, so we would probably still retain sufficient data. However, as the median quality score is 25 (not great, but not terrible), we are worried about losing a lot of valuable data by truncating.

We would really appreciate your input on this. See below the quality score plots and the full demux-summarize.qzv file in case you need more details.



trace_demux.qzv (321.6 KB)

Welcome to the forum!

I do not think that you will lose any data by truncating your reads at position 160, since:

  1. Your targeted region is relatively small.
  2. Your overlapping region is huge.
    Truncating the forward and reverse reads shortens the overlapping region, not the merged reads. Truncation may even be beneficial: it removes the low-quality ends of the reads, so more reads should pass the quality filter and merge successfully.
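Here is a rough sketch of why trimming a low-quality tail helps reads pass the filter. DADA2 discards reads whose expected number of errors, the sum of 10^(-Q/10) over the retained bases, exceeds the maximum expected errors threshold (2.0 by default in q2-dada2), and it applies that check after truncation. The quality profile below is made up for illustration and is not taken from your data: Q35 for the first 160 bases, then a Q15 tail out to an assumed read length of 228 (the length implied by a 248bp merged read with 208bp of overlap).

```python
def expected_errors(quals):
    """Sum of per-base error probabilities, p = 10^(-Q/10)."""
    return sum(10 ** (-q / 10) for q in quals)


# Hypothetical read: 160 good bases (Q35) followed by a 68-base Q15 tail.
read_quals = [35] * 160 + [15] * 68

MAX_EE = 2.0  # default maximum expected errors in q2-dada2

full_ee = expected_errors(read_quals)          # whole read
trunc_ee = expected_errors(read_quals[:160])   # read truncated at 160

print(f"untruncated:      EE = {full_ee:.2f}, passes = {full_ee <= MAX_EE}")
print(f"truncated at 160: EE = {trunc_ee:.2f}, passes = {trunc_ee <= MAX_EE}")
```

In this toy case the untruncated read is rejected while the truncated one passes easily. Your real reads will behave differently, but the direction of the effect is the same.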

I suggest you try truncating at position 150 or 160.
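To check that enough overlap remains, the arithmetic can be sketched as below. It assumes the 248bp merged length quoted in the question and that both reads are truncated at the same position; the function name is just for illustration. DADA2 needs at least 12 overlapping bases by default to merge a pair.

```python
def overlap_after_truncation(trunc_f, trunc_r, merged_len):
    """Bases shared by the forward and reverse reads after truncation."""
    return trunc_f + trunc_r - merged_len


MERGED_LEN = 248   # merged length quoted in the question
MIN_OVERLAP = 12   # DADA2's default minimum overlap for merging

for trunc in (150, 160):
    overlap = overlap_after_truncation(trunc, trunc, MERGED_LEN)
    print(f"truncate both reads at {trunc}: overlap = {overlap} bp, "
          f"enough to merge = {overlap >= MIN_OVERLAP}")
```

By this arithmetic, truncating both reads at 160 leaves roughly 72bp of overlap, somewhat less than the ~120bp estimated in the question but still far more than the merging step requires. The merged length varies a little between taxa, so treat these numbers as approximate.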

