Loss of data with dada2

That is strange!

On other datasets, tracking the number of reads that pass each step has been a very useful way to diagnose this kind of issue. My basic troubleshooting workflow goes like this:

Problem: a low fraction of reads makes it through the pipeline.

  1. Is it really a problem? (see above)
  2. At which step are most reads being lost?
     - If filtering: try truncating sooner, especially before quality crashes, or raise maxEE.
     - If merging: make sure your reads still overlap (by at least 20 nt plus the variation in amplicon length) after truncation.
     - If chimera removal: you probably need to remove primers from the reads.
  3. If it's still not working: is this just really bad-quality data, especially the reverse reads? If so, try using just the forward reads.
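
The triage in step 2 can be sketched in a few lines of Python. This is a minimal, language-agnostic sketch (dada2 itself is an R package), assuming you have already collected per-step read counts for a sample, for instance from the read-tracking table built in the DADA2 tutorial. The step names, the `worst_step` and `reads_overlap` helpers, and the example numbers are all hypothetical. `reads_overlap` just encodes the rule of thumb from the merging item: the expected overlap must be at least 20 nt plus the amplicon length variation.

```python
# Hypothetical step names, loosely mirroring the DADA2 tutorial's tracking
# table (denoisedF/denoisedR collapsed into a single "denoised" step).
STEPS = ["input", "filtered", "denoised", "merged", "nonchim"]

# Advice from the workflow above, keyed by the step where reads vanish.
ADVICE = {
    "filtered": "truncate sooner (before quality crashes) or raise maxEE",
    "merged": "check that reads still overlap after truncation",
    "nonchim": "remove primers from the reads",
}
# Fallback for any other step: the "is the data just bad?" question.
FALLBACK = "check whether the data (especially reverse reads) is just poor; try forward reads only"


def worst_step(counts: dict[str, int]) -> tuple[str, float]:
    """Return the step losing the largest fraction of the reads entering it."""
    worst, worst_loss = "", -1.0
    for prev, step in zip(STEPS, STEPS[1:]):
        entering = counts[prev]
        # Fraction of reads lost between the previous step and this one.
        loss = 1 - counts[step] / entering if entering else 0.0
        if loss > worst_loss:
            worst, worst_loss = step, loss
    return worst, worst_loss


def reads_overlap(trunc_f: int, trunc_r: int, min_len: int, max_len: int,
                  min_overlap: int = 20) -> bool:
    """Rule of thumb: overlap >= 20 nt + amplicon length variation."""
    overlap = trunc_f + trunc_r - min_len  # overlap on the shortest amplicon
    return overlap >= min_overlap + (max_len - min_len)


if __name__ == "__main__":
    # Illustrative numbers only, not from a real run.
    counts = {"input": 10000, "filtered": 9000, "denoised": 8800,
              "merged": 1500, "nonchim": 1400}
    step, loss = worst_step(counts)
    print(f"{step}: {loss:.0%} lost -> {ADVICE.get(step, FALLBACK)}")
    # Truncation at 240/160 nt for amplicons of 400-430 nt.
    print(reads_overlap(240, 160, 400, 430))
```

With these made-up counts it reports `merged` as the culprit (about 83% of denoised reads lost), and the overlap check returns `False`: 240 + 160 nt only just spans a 400 nt amplicon, leaving no room for the 20 nt overlap plus 30 nt of length variation.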

That's not going to cover every possible way things can go wrong (you don't usually expect a mix of V3V4, V4 and V4V5 amplicons!) but in my experience that will usually get to the bottom of "lots of reads being lost".
