QIIME2/DADA2 vs singletons

Hi,

I am trying to understand why my samples have singletons after following the “Moving Pictures” tutorial. Note that the number of singletons is very small (0.14%). Still, I want to understand why they are there and why they are not removed by default. Based on Callahan et al. (DADA2: High-resolution sample inference from Illumina amplicon data), I understood that DADA2 (implemented in QIIME2) uses an algorithm to decide which sequences should be kept and which should be removed. A singleton will be kept if it is present in the list of prior sequences. Is this correct?
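
For reference, a singleton here means a feature (ASV) whose total count across the whole table is 1. The 0.14% could be a share of features or a share of reads, so this minimal sketch computes both. It uses a made-up feature table held as a plain Python dict rather than a real QIIME 2 artifact; the IDs and counts are hypothetical.

```python
# Hypothetical feature table: ASV ID -> {sample ID -> read count}.
# IDs and counts are invented for illustration; a real table would come from
# the FeatureTable[Frequency] artifact produced by q2-dada2.
feature_table = {
    "asv_a": {"s1": 120, "s2": 85},
    "asv_b": {"s1": 40, "s2": 0},
    "asv_c": {"s1": 1, "s2": 0},  # total frequency 1: a singleton
    "asv_d": {"s1": 0, "s2": 3},
}


def total_frequency(counts):
    """Sum of a feature's counts across all samples."""
    return sum(counts.values())


def singleton_ids(table):
    """Features whose total count across all samples is exactly 1."""
    return [fid for fid, counts in table.items() if total_frequency(counts) == 1]


def singleton_feature_percent(table):
    """Share of features (ASVs) that are singletons, in percent."""
    return 100.0 * len(singleton_ids(table)) / len(table)


def singleton_read_percent(table):
    """Share of all reads that belong to singletons, in percent."""
    total_reads = sum(total_frequency(c) for c in table.values())
    # Each singleton feature contributes exactly one read.
    return 100.0 * len(singleton_ids(table)) / total_reads


print(singleton_ids(feature_table))                     # ['asv_c']
print(singleton_feature_percent(feature_table))         # 25.0
print(round(singleton_read_percent(feature_table), 2))  # 0.4
```

The two percentages can differ by orders of magnitude, so it is worth stating which one a figure like 0.14% refers to.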

Thank you in advance for your clarification.

Best, Joanna

Hello Joanna,

I think so. DADA2 goes to great lengths to try to preserve and resolve singletons in a data set. Check out this discussion of sample pooling and pseudo-pooling, and search the page for 'single' to find the relevant part:
https://benjjneb.github.io/dada2/pseudo.html#pseudo-pooling

However, the tradeoff for that high specificity is that sensitivity, in particular to very rare variants, is somewhat reduced by the relatively conservative default OMEGA_A threshold, and singletons are not detected at all. This is often the right tradeoff ...
The purpose of priors is to increase sensitivity to a restricted set of sequences, including singleton detection, without increasing false-positives from the unrestricted set of all possible amplicon sequences that must be considered by the naive algorithm.
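
To make the priors point concrete, here is a simplified Python sketch of the abundance test as described in the DADA2 paper. For an ordinary sequence, the Poisson p-value is conditioned on the sequence having been observed at least once, so for an abundance of 1 it is always exactly 1 and can never pass OMEGA_A. A sequence in the priors is instead tested with the unconditional p-value against OMEGA_P. The threshold values are the defaults as I recall them from dada2's `setDadaOpt` (treat them as assumptions to verify), `lam` is a hypothetical expected number of error-derived copies of the sequence, and the single `is_called` decision is an illustrative simplification of the divisive partitioning DADA2 actually performs.

```python
import math

# Default thresholds as I recall them from dada2's setDadaOpt (assumptions to verify).
OMEGA_A = 1e-40  # abundance p-value threshold for ordinary (non-prior) sequences
OMEGA_P = 1e-4   # unconditional p-value threshold for sequences listed in priors


def poisson_upper_tail(a, lam, terms=200):
    """P(X >= a) for X ~ Poisson(lam), summed term by term to avoid cancellation."""
    term = math.exp(-lam) * lam**a / math.factorial(a)  # P(X = a)
    total = 0.0
    for k in range(a, a + terms):
        total += term
        term *= lam / (k + 1)  # P(X = k + 1) from P(X = k)
    return total


def conditional_pvalue(a, lam):
    """P(X >= a | X >= 1): the p-value conditioned on the sequence being observed."""
    return poisson_upper_tail(a, lam) / poisson_upper_tail(1, lam)


def is_called(a, lam, in_priors):
    """Simplified decision: would a sequence with abundance a be kept as a real variant?"""
    if conditional_pvalue(a, lam) < OMEGA_A:
        return True
    # Prior sequences get a second chance via the unconditional p-value.
    return in_priors and poisson_upper_tail(a, lam) < OMEGA_P


lam = 1e-6  # hypothetical expected number of error-derived copies
print(conditional_pvalue(1, lam))           # 1.0: a singleton can never pass OMEGA_A
print(is_called(1, lam, in_priors=False))   # False
print(is_called(1, lam, in_priors=True))    # True
```

The conditioning is what makes singletons undetectable without priors: observing a sequence once is exactly what "observed at least once" already assumes, so it carries no evidence.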

If you have specific questions about your singletons, we could investigate this more and look into how DADA2 works under the hood in the q2-dada2 plugin.

Thank you, Colin, for your reply and the link. The answer helps and puts me in the right direction.

To offer one more possibility on top of @colinbrislawn's great answer: if you have paired-end sequences, you can end up with singletons after the merging step, which comes after the denoising step (where your singletons would be discarded). Now, you did mention following the "Moving Pictures" tutorial, which is based on single-end reads, so this wouldn't be the issue, but I wanted to add it here just in case.
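
A toy illustration of how merging can produce singletons: forward and reverse reads are denoised separately, so every forward and reverse ASV below is well supported, yet one rare pairing of them yields a merged sequence seen only once. This is a minimal sketch with hypothetical ASV labels and counts, and it simplifies by assuming every (forward, reverse) combination overlaps and merges into one distinct sequence.

```python
from collections import Counter

# Hypothetical denoised read pairs: (forward ASV, reverse ASV) for each pair.
read_pairs = (
    [("F1", "R1")] * 50
    + [("F2", "R2")] * 30
    + [("F1", "R2")] * 1  # a rare pairing, e.g. a chimera or mis-assigned read
)

# Abundance of each ASV as seen by denoising, which handles each direction separately.
forward_counts = Counter(f for f, _ in read_pairs)
reverse_counts = Counter(r for _, r in read_pairs)

# Simplification: each (F, R) combination merges into one distinct sequence.
merged_counts = Counter(read_pairs)

singletons = [pair for pair, n in merged_counts.items() if n == 1]
print(dict(forward_counts))  # {'F1': 51, 'F2': 30}
print(dict(reverse_counts))  # {'R1': 50, 'R2': 31}
print(singletons)            # [('F1', 'R2')]
```

No ASV is a singleton in either direction, so nothing is discarded during denoising, but the merged table still contains one.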

Good remark. Indeed, that could be the issue, as I used the "qiime dada2 denoise-paired" step for my paired-end sequences.