We just ran DADA2 on paired-end reads for a soil microbiome project. For each of three timepoints (each run separately through DADA2), we are getting 50-70+ singletons in our feature table. By "singleton" we mean a feature/ASV with a total count of one, observed in only one sample (small example in the screenshot below).
At first we were concerned, because DADA2 is supposed to remove singletons, and we had always assumed singletons are most likely errors (they shouldn't naturally be present, except as error). However, previous forum posts say that some singletons may be expected with paired-end data, because DADA2 only removes singleton reads before merging, and the merging step itself can create singletons. BUT those posts still suggested that singletons should be rare for paired-end data (<15 or so?), and still treated them as likely errors. We're trying to figure out whether these singletons carry real signal, and what we should do with them (filter them out or keep them in).
For troubleshooting, we already looked at our raw sequencing data and confirmed that the barcodes were still in our samples (multiplexed) but the primers were not, so we are confident we're using the right importing and demux commands. It's interesting that all three timepoints show the same trend; does that make it less likely to be error? Or are all singletons some sort of "noise" or "error" by definition? Before we continue with the analysis, we wanted input on whether this is "normal" and whether we should filter this many features out of our table.
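In case it helps clarify what we mean by filtering, here is a minimal sketch of the kind of filter we're considering, using a toy pandas table (all names and counts are made up; in QIIME 2 we believe the equivalent would be `qiime feature-table filter-features --p-min-frequency 2`):

```python
import pandas as pd

# Toy feature table: rows = ASVs, columns = samples (counts).
# These values are invented for illustration; a real table could be
# exported from a QIIME 2 .qza and converted to TSV with biom.
table = pd.DataFrame(
    {"S1": [5, 1, 0, 12], "S2": [3, 0, 0, 0], "S3": [0, 0, 1, 7]},
    index=["ASV_a", "ASV_b", "ASV_c", "ASV_d"],
)

# Singleton in the sense described above: total count of 1,
# observed in exactly one sample.
total = table.sum(axis=1)
n_samples = (table > 0).sum(axis=1)
singletons = table.index[(total == 1) & (n_samples == 1)]

filtered = table.drop(index=singletons)
print(list(singletons))   # ['ASV_b', 'ASV_c']
print(filtered.shape)     # (2, 3)
```

This is just to show the filtering criterion concretely, not a claim about the right cutoff.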
Brainstorming here…would truncating/trimming our data more aggressively, so that we only use the highest-quality bases, help reduce the number of singletons we are getting? If it were a matter of noise, that might help, right?
We have plenty of overlap to spare, so I suppose it couldn't hurt to try!
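For picking a more aggressive truncation length, one simple heuristic we're considering is sketched below: truncate at the last cycle whose median quality is still at or above some threshold. The quality values here are invented, not from our run; in practice we would read them off the `demux summarize` quality plot and pass the result to `--p-trunc-len-f` / `--p-trunc-len-r` in `qiime dada2 denoise-paired`:

```python
# Hypothetical per-cycle median quality scores for forward reads
# (250 cycles; values are made up for illustration).
median_q = [38] * 180 + [34] * 40 + [28] * 20 + [22] * 10

# Heuristic: keep only cycles whose median quality is >= threshold,
# truncating at the last cycle that still passes.
threshold = 30
trunc_len = max(i + 1 for i, q in enumerate(median_q) if q >= threshold)
print(trunc_len)  # 220
```

This is only one rule of thumb; the usual constraint is to keep enough length that the forward and reverse reads still overlap after truncation.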
We’d really appreciate any advice on how to proceed from here.