Loss of data with dada2

benjjneb · October 11, 2017, 8:55pm

That is strange!

On other datasets, the number of reads passing each step is a very useful way to diagnose the issue. My basic troubleshooting workflow would go:

Problem -- a low fraction of reads is making it through the pipeline.

Is it really a problem? (see above)
At which step are most reads being lost?
If filtering: Try truncating sooner, especially before quality crashes, or raise maxEE.
If merging: Make sure your reads still overlap (by 20 nt + amplicon-length variation) after truncation.
If chimera removal: You probably need to remove primers from the reads.
If it is still not working -- is this just really poor-quality data, especially in the reverse reads? Try using just the forward reads.
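
To answer the "which step" question in the workflow above, it can help to turn the per-step read counts into the fraction lost at each step. The following is a minimal Python sketch of that arithmetic only; it does not call dada2, and the step labels and read counts are hypothetical numbers chosen for illustration.

```python
# Read counts remaining after each pipeline step for one sample.
# ASSUMPTION: step labels and counts are hypothetical, for illustration only.
read_counts = [
    ("input", 100000),
    ("filtered", 92000),
    ("denoised", 90000),
    ("merged", 31000),
    ("nonchim", 29500),
]


def step_losses(counts):
    """Return (step, fraction of the previous step's reads lost) for every step after the first."""
    losses = []
    for (_, before), (step, after) in zip(counts, counts[1:]):
        lost = (before - after) / before if before else 0.0
        losses.append((step, lost))
    return losses


def worst_step(counts):
    """Return the (step, fraction lost) pair with the largest relative loss."""
    return max(step_losses(counts), key=lambda item: item[1])


for step, lost in step_losses(read_counts):
    print(f"{step:>10}: {lost:.1%} of incoming reads lost")

worst, worst_loss = worst_step(read_counts)
overall_retained = read_counts[-1][1] / read_counts[0][1]
print(f"Largest loss at '{worst}'; {overall_retained:.1%} of input reads retained overall")
```

With these made-up counts the largest relative loss is at the merging step, which would point to the overlap check rather than to filtering or chimera removal.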

That's not going to cover every possible way things can go wrong (you don't usually expect a mix of V3V4, V4 and V4V5 amplicons!) but in my experience that will usually get to the bottom of "lots of reads being lost".
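
For the merging check in the workflow above, the overlap requirement is simple arithmetic. Here is a minimal Python sketch, assuming `amplicon_len` is the length of the sequenced fragment as it appears on the reads (so it includes primers if they have not been removed) and that the truncation lengths are the numbers of bases kept. The function names and the example numbers are hypothetical; the 20 nt margin is the one mentioned above.

```python
def overlap_after_truncation(trunc_len_f, trunc_len_r, amplicon_len):
    """Nucleotides shared by the forward and reverse reads after truncation.

    A negative value means the truncated reads no longer meet.
    """
    return trunc_len_f + trunc_len_r - amplicon_len


def reads_still_overlap(trunc_len_f, trunc_len_r, amplicon_len,
                        length_variation, min_overlap=20):
    """True if the overlap is at least min_overlap plus the amplicon-length variation."""
    overlap = overlap_after_truncation(trunc_len_f, trunc_len_r, amplicon_len)
    return overlap >= min_overlap + length_variation


# ASSUMPTION: all lengths below are illustrative, not taken from any dataset.
short_ok = reads_still_overlap(240, 160, amplicon_len=253, length_variation=2)
long_fail = reads_still_overlap(240, 200, amplicon_len=460, length_variation=10)

print("short amplicon overlaps enough:", short_ok)
print("long amplicon overlaps enough:", long_fail)
```

In the second case the truncated reads are 20 nt short of meeting at all, so truncating less aggressively (or using only the forward reads) would be the thing to try.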