I am trying to understand why my samples have singletons after following the “Moving Pictures” tutorial. Note that the number of singletons is very small (0.14%). Still, I want to understand why they are there and are not being removed by default. Based on Callahan et al. (DADA2: High-resolution sample inference from Illumina amplicon data), I understood that DADA2 (as implemented in QIIME 2) uses an algorithm that decides which sequences should be kept and which should be removed. A singleton will be kept if it is present in the list of prior sequences. Is this correct?
I think so. DADA2 goes to great lengths to try to preserve and resolve singletons in a data set. Check out this discussion of sample pooling and pseudo-pooling, and search for 'single' to find the relevant discussion: https://benjjneb.github.io/dada2/pseudo.html#pseudo-pooling
However, the tradeoff for that high specificity is that sensitivity, in particular to very rare variants, is somewhat reduced by the relatively conservative default OMEGA_A threshold, and singletons are not detected at all. This is often the right tradeoff ...
The purpose of priors is to increase sensitivity to a restricted set of sequences, including singleton detection, without increasing false-positives from the unrestricted set of all possible amplicon sequences that must be considered by the naive algorithm.
If you have specific questions about your singletons, we could investigate this more and look into how DADA2 works under the hood in the q2-dada2 plugin.
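As a starting point for that kind of investigation, here is a minimal sketch of how one might count singletons in a feature table. It assumes a singleton means a feature whose total count across all samples is exactly 1, and it uses a plain Python dictionary with made-up feature and sample IDs rather than a real QIIME 2 artifact. The function name `singleton_summary` is hypothetical and not part of q2-dada2.

```python
def singleton_summary(table):
    """Return (sorted singleton feature IDs, fraction of features that are singletons).

    `table` maps feature ID -> {sample ID -> count}. A singleton is assumed
    here to be a feature whose total count across all samples is exactly 1.
    """
    totals = {fid: sum(counts.values()) for fid, counts in table.items()}
    singletons = sorted(fid for fid, total in totals.items() if total == 1)
    n_features = len(totals)
    fraction = len(singletons) / n_features if n_features else 0.0
    return singletons, fraction


# Toy table with invented IDs and counts, for illustration only.
toy_table = {
    "asv_a": {"s1": 120, "s2": 80},
    "asv_b": {"s1": 0, "s2": 1},
    "asv_c": {"s1": 7, "s2": 0},
    "asv_d": {"s1": 1, "s2": 0},
}

singletons, fraction = singleton_summary(toy_table)
print(singletons, fraction)
```

If your own definition of a singleton is per sample rather than across the whole table, the `total == 1` check would need to be adapted accordingly.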
To offer one more possibility on top of @colinbrislawn's great answer: if you have paired-end sequences, you can end up with singletons after the merging step, which comes after denoising (where your singletons would be discarded). You did mention following the "Moving Pictures" tutorial, which is based on single-end reads, so this wouldn't be the issue here, but I wanted to add it just in case.
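To make the paired-end case concrete, here is a toy sketch (not DADA2 itself) of how a merged sequence can have an abundance of 1 even though no forward or reverse variant is a singleton on its own. The labels `F1`, `F2`, `R1`, `R2` and the read counts are invented, and the model assumes each read pair has already been assigned to one denoised forward variant and one denoised reverse variant.

```python
from collections import Counter

# Each tuple is one read pair: (denoised forward variant, denoised reverse variant).
# Labels and counts are made up for illustration.
read_pairs = [("F1", "R1")] * 4 + [("F1", "R2")] * 1 + [("F2", "R2")] * 3

# Abundance of each variant when the two directions are considered separately.
forward_counts = Counter(fwd for fwd, _ in read_pairs)
reverse_counts = Counter(rev for _, rev in read_pairs)

# Abundance of each merged (forward, reverse) combination.
merged_counts = Counter(read_pairs)


def singletons(counts):
    """Return the sorted keys whose abundance is exactly 1."""
    return sorted(key for key, n in counts.items() if n == 1)


print("forward singletons:", singletons(forward_counts))
print("reverse singletons:", singletons(reverse_counts))
print("merged singletons:", singletons(merged_counts))
```

In this toy data every forward and reverse variant is seen at least three times, yet the combination `("F1", "R2")` occurs only once, so it shows up as a singleton only after merging.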