Determining dada2 parameters

Mehrbod_Estaki · March 9, 2020, 7:22pm

Hi @ycastro715,
Welcome to the forum!
Thanks for reading previous discussions before posting and providing details about your case, very helpful.

So, the rough calculations for this are also discussed in other threads, which you may or may not have seen, but they go something like this.
Your expected V3-V4 amplicon size: 806-341= ~465
Your total sequenced length: 2 x 250 = 500
DADA2 minimum overlap required = 12 nt (this was changed from the 20 nt used in previous versions)
So, your total overlap is ~ 500-465=35 nt. You need a minimum of 12 nt for proper merging, so that means 35-12=23 bp can be cut (combined) from your forward and reverse reads without compromising merging.

BUT consider that there are natural variations in amplicon length at V3-V4, so some taxa are naturally shorter and, more importantly, some are longer in that region. This means that if you trim all of those available 23 bp, you may systematically remove the naturally longer taxa because they will fail to merge. This can introduce bias into your downstream analysis. In my experience, natural variation in V3-V4 in human and mouse fecal samples can range at least ±20 bp from the expected 465. With a taxon that is 20 bp longer, your 23 bp budget shrinks to about 3 bp, which means that you basically can't afford to truncate any bp from your 3' ends, lest you potentially introduce some bias.

This is the downside of sequencing V3-V4: unless you do a 2x300 run, you can't really afford to truncate any reads. You can try running DADA2 paired-end here with minimal/no truncating and see how the output is. If the number of merged reads is too low, as I expect it to be, I would suggest running only the forward reads and discarding the reverse.
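If it helps to play with the numbers, here is a minimal Python sketch of the arithmetic above. The function name `overlap_budget` and the `length_variation` parameter are made up for illustration and are not part of DADA2 or QIIME 2; the 12 nt minimum overlap and the ±20 bp variation are just the figures quoted in this post.

```python
def overlap_budget(amplicon_len, read_len_f, read_len_r,
                   min_overlap=12, length_variation=0):
    """Return how many bases (forward + reverse combined) could be
    truncated from the 3' ends while still leaving `min_overlap` nt
    of overlap for the longest amplicon you expect to see.

    A negative result means even untruncated reads fall short of the
    required overlap for that amplicon length.
    """
    # Plan for the longest amplicon, since longer taxa fail to merge first.
    longest_amplicon = amplicon_len + length_variation
    # Bases covered by both reads once they are laid over the amplicon.
    total_overlap = read_len_f + read_len_r - longest_amplicon
    return total_overlap - min_overlap


expected_amplicon = 806 - 341  # ~465 for this V3-V4 primer pair

# 2x250 run, ignoring natural length variation
print(overlap_budget(expected_amplicon, 250, 250))

# 2x250 run, allowing for taxa that are 20 bp longer than expected
print(overlap_budget(expected_amplicon, 250, 250, length_variation=20))

# 2x300 run, same allowance
print(overlap_budget(expected_amplicon, 300, 300, length_variation=20))
```

The second call is the one that matters for this dataset: once you allow for the longer taxa, there is almost nothing left to truncate on a 2x250 run, whereas a 2x300 run leaves plenty of room.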

Yes, no need for trimming from the 5' side of the forward reads. However, your reverse reads do have an initial dip in quality within the first 20 bp, which may cause you to lose some reads in the initial filtering step. Trimming ~20 bp from there might be useful, and because this is on the 5' end, it won't affect merging.
Hope this helps!