Where to trim/truncate reads in DADA2

lizgehret · August 25, 2021, 10:20pm

After taking a look at your stats-dada2.qzv file, these results don't seem unreasonable given the trim/trunc lengths we discussed above.

One caveat to my suggestions above: it was fairly difficult to make out the actual quality scores from the original visualization you provided in your post. When trimming/truncating based on quality scores, you want to look at the median score at each position (the middle line of each box-and-whisker plot you'll see when zooming in on your quality plot) rather than the edges of each box or the whiskers. A small number of low-quality outliers at any given nucleotide position can skew the overall visualization, so focusing on the median quality score at each position, and looking for where it starts to decline consistently, can help you decide where to trim/truncate.
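To illustrate why the middle of the box is more reliable than a summary that outliers can drag down, here is a minimal sketch. It assumes you have the raw Phred scores observed at each read position as a list of lists (a hypothetical input format; the QIIME 2 quality plot shows per-position summaries, not this structure), and it uses a hypothetical cutoff of Q25. It simply returns the number of positions kept before the per-position summary first drops below the cutoff, which is a crude stand-in for eyeballing the general trend in the plot.

```python
from statistics import mean, median
from typing import Callable, Sequence


def suggest_trunc_len(
    per_position_scores: Sequence[Sequence[int]],
    min_score: float = 25,  # hypothetical cutoff, not a DADA2/QIIME 2 default
    summary: Callable[[Sequence[int]], float] = median,
) -> int:
    """Return how many leading positions to keep before the summarized
    quality score first falls below min_score (0-based index of the first
    failing position, i.e. a truncation length)."""
    for i, scores in enumerate(per_position_scores):
        if summary(scores) < min_score:
            return i
    # Every position passed, so keep the full read length.
    return len(per_position_scores)


# Toy data (made up): Phred scores from five reads at each of four positions.
scores = [
    [38, 38, 37, 38, 2],   # one outlier read; the median is still 38
    [37, 37, 36, 37, 36],
    [35, 34, 35, 2, 2],    # two outliers pull the mean down, median is 34
    [22, 24, 20, 25, 21],  # a genuine drop in quality
]

print(suggest_trunc_len(scores))                # median-based
print(suggest_trunc_len(scores, summary=mean))  # mean-based, stops earlier
```

With this toy data the median-based rule keeps three positions, while the mean-based rule stops one position early because two outlier reads pull the mean at the third position below the cutoff.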

All of that to say, you should ultimately make an informed decision as to where you want to trim/truncate your reads, since you know your data and the analysis you'll be doing on it much better than I do!

That's a great question! This forum response goes into detail on why merging typically fails for V3-V4 2x250 runs. Essentially, you are trying to sequence a long amplicon (the V3-V4 region) with 250 bp forward and reverse reads, which leaves only a small margin for error in the overlap region needed to successfully merge your forward and reverse reads. Here's a short quote from @Mehrbod_Estaki's forum response above that dives into the numbers a bit more:

With the most common V3-V4 primers you will have a ~460bp amplicon, but with a 2x250 bp run you will have a maximum of 500bp reads which means there is only 40bp of overlap. DADA2 requires a minimum 12bp overlap for proper merging, otherwise it will toss any reads (both forward and reverse) that it can’t merge. Take into consideration the natural variation of this amplicon length meaning some true taxa would need more than 12bp overlap, and the fact that we need to truncate the poor quality tails of our reads on the 3’ (where merging occurs).
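The overlap arithmetic in the quote can be written out as a quick back-of-the-envelope check. This is a sketch, not how DADA2 merges reads internally: it assumes both reads start at the first base of the amplicon and that truncation lengths are counted from the start of each read. The ~460 bp amplicon and the 12 bp minimum overlap come from the quote; the function names and the truncation values in the usage are hypothetical.

```python
def overlap_bp(amplicon_len: int, trunc_f: int, trunc_r: int) -> int:
    """Expected overlap between forward and reverse reads after truncation.

    Assumes both reads start at the ends of the amplicon, so the reads
    overlap by however much their combined length exceeds the amplicon.
    """
    return trunc_f + trunc_r - amplicon_len


def can_merge(amplicon_len: int, trunc_f: int, trunc_r: int,
              min_overlap: int = 12) -> bool:
    """True if the expected overlap meets the minimum cited in the quote."""
    return overlap_bp(amplicon_len, trunc_f, trunc_r) >= min_overlap


# Untruncated 2x250 reads on a ~460 bp amplicon: 250 + 250 - 460 = 40 bp.
print(overlap_bp(460, 250, 250))

# Trimming the poor-quality 3' tails eats directly into that 40 bp margin.
for trunc_f, trunc_r in [(250, 250), (245, 235), (240, 230)]:
    print(trunc_f, trunc_r, overlap_bp(460, trunc_f, trunc_r),
          can_merge(460, trunc_f, trunc_r))
```

Because every base truncated from either 3' end comes straight out of the overlap, truncating to 240/230 already leaves only 10 bp, and taxa whose true amplicon is longer than ~460 bp lose their overlap even sooner.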

Again, based on the information you provided above, using only your forward reads with denoise-single is most likely your best bet in this situation.

Cheers,
Liz