Low percentage of merged reads after denoising/merging with dada2.

ChrisKeefe · September 9, 2020, 9:40pm

Not directly, @rsak-384. Take a look at your denoising stats. You'll notice that most of your attrition happens during the merging step. You have trimmed too many nucleotides, and so reads are not long enough to overlap and join.

Imagine you're trying to sequence V3-V4 from 337F to 805R - you'd need 468 nt to cover that region, plus or minus a few nt of natural length variation. In addition, dada2 needs at least 12 nt of overlap so it can join reads properly. That's ~480 nt out of your available 500 (forward plus reverse read length), so you only have about 20 nt in total to trim away as low-quality data. Your actual numbers may differ, but that's the basic idea.
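If it helps to see that arithmetic spelled out, here is a rough Python sketch. The function names are made up for illustration (they are not part of QIIME 2 or dada2), and it assumes 2x250 reads, treats the amplicon length as simply the difference between the two primer positions, and ignores natural length variation and any trimming from the 5' end, so treat the result as a ballpark rather than an exact limit.

```python
# Hypothetical helpers for illustration only; not QIIME 2 or dada2 functions.

def trimming_budget(fwd_primer_pos, rev_primer_pos, read_len_f, read_len_r,
                    min_overlap=12):
    """Total nt that can be truncated (forward + reverse) before merging fails."""
    amplicon_len = rev_primer_pos - fwd_primer_pos  # e.g. 805 - 337 = 468
    required = amplicon_len + min_overlap           # e.g. 468 + 12 = 480
    available = read_len_f + read_len_r             # e.g. 250 + 250 = 500
    return available - required


def can_merge(trunc_len_f, trunc_len_r, amplicon_len, min_overlap=12):
    """True if the truncated reads still span the amplicon plus the overlap."""
    return trunc_len_f + trunc_len_r >= amplicon_len + min_overlap


# Assumed 2x250 run covering 337F to 805R
print(trimming_budget(337, 805, 250, 250))  # 20
print(can_merge(240, 230, 468))             # False: 470 nt < 480 nt needed
print(can_merge(245, 240, 468))             # True: 485 nt >= 480 nt needed
```

Plug in your own primer positions and truncation lengths to see how close to the edge you are.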

This is where quality comes in - if you increase your effective read length by loosening your trim/trunc parameters to allow read joining, you will keep more low-quality bases and so lose more sequences to quality filtering. It can't hurt to try, but don't expect a big improvement.

Using only single-end sequences (just your forward reads) here saves you the trouble of balancing quality against length. It is also possible to trick QIIME 2 into treating your reverse reads as if they were forward reads, but I've never used this hack, and don't know if it will raise any issues for you during your downstream analysis.

Good luck!
Chris