Percentage of reads discarded after running DADA2 on NextSeq data

Hello!
I have recently switched from analyzing MiSeq data, which I’m more familiar with, to NextSeq data, and I’m curious about the percentage of reads discarded after running NextSeq data through the DADA2 pipeline.

For years, I’ve been using this guide from @Nicholas_Bokulich posted in 2019 to interpret how well DADA2 performs on my datasets, and it’s been quite useful: conceptual justification of dada2 truncation prior to merging - #16 by Nicholas_Bokulich

I’ve now analyzed 3 different NextSeq datasets, one of which targeted 3 variable regions (V3-V4, V4-V5, V6-V8), and even after adjusting multiple parameters, I can’t seem to retain more than ~60% of reads on average (in other words, at least ~40% are discarded). Some of the MiSeq datasets I’ve worked with in the past performed a bit better than this (though it depends, of course).

I’m curious if others are seeing this with their NextSeq datasets? Is NextSeq data just noisier than MiSeq data, in general? To be clear, I don’t see this as an inherent issue with my data, unless of course everyone else is getting much better output in terms of how many reads are discarded :slight_smile:

Thanks!


Hello!

Most of the datasets I work with are from NextSeq, with those strange-looking quality plots. With the V3-V4, V1-V2, and V4 regions (2x250), I can usually get 75-92% of reads through all of the filtering, merging, and chimera removal steps.

Remove the primers before running DADA2, check at which step you are losing most of the reads, and try adjusting the expected-error (ee) parameter and the minimum overlap length to see whether you can increase the output.
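To find the step where most reads are lost, something like the rough Python sketch below may help. It assumes you have copied one sample's per-step read counts out of the DADA2 denoising stats, and that the steps are named `input`, `filtered`, `denoised`, `merged`, and `non-chimeric`; adjust the names if your output differs. The function names and the counts in `example` are made-up placeholders for illustration, not real results.

```python
# Order of the steps as assumed for the DADA2 denoising stats.
STEPS = ["input", "filtered", "denoised", "merged", "non-chimeric"]


def step_losses(counts):
    """Percentage of the input reads lost at each step after 'input'."""
    total = counts[STEPS[0]]
    if total <= 0:
        raise ValueError("input read count must be positive")
    losses = {}
    for prev, curr in zip(STEPS, STEPS[1:]):
        # Reads that made it through 'prev' but not through 'curr'.
        losses[curr] = 100.0 * (counts[prev] - counts[curr]) / total
    return losses


def worst_step(counts):
    """Name of the step that removes the largest share of input reads."""
    losses = step_losses(counts)
    return max(losses, key=losses.get)


# Hypothetical placeholder counts for a single sample.
example = {
    "input": 100000,
    "filtered": 90000,
    "denoised": 88000,
    "merged": 60000,
    "non-chimeric": 58000,
}

for step, pct in step_losses(example).items():
    print(f"{step:>12}: {pct:5.1f}% of input reads lost")
print("largest loss at:", worst_step(example))
```

If the largest loss is at filtering, the expected-error threshold or truncation lengths are the first things to revisit; if it is at merging, the truncated reads may no longer overlap enough.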
