Singletons still present after dada2

Dear all,

I have read in several posts in this forum, and also in the dada2 issues, that dada2 is supposed to filter out singletons. By singleton I mean a sequence variant represented by only a single read. After running dada2 on my data, I ran the following commands, which I had found in the forum, to remove singletons:
qiime feature-table filter-features --i-table table.qza --p-min-frequency 2 --o-filtered-table table_nosingletons.qza
qiime feature-table summarize --i-table table_nosingletons.qza --o-visualization table_nosingletons.qzv
I realized that by doing this I removed 15 sequence variants that were identified as singletons. I am surprised, since dada2 should have already removed those singletons. In fact, some singletons are already visible in the feature table produced by dada2, which I am attaching: table-dada2.qzv (664.6 KB)
Have I misunderstood something, or have I done something wrong?
Thank you for any clarification on this,
Cheers
Niccolò

When you are processing paired-end data, singletons can arise in the merged output, so you’ve done nothing wrong. However, such singletons should be rare (and 15 in total qualifies as rare in any normal-sized study). In general I’d recommend just filtering them out, as you did.

Singleton sequences can arise even though no singletons are called in the forward or reverse reads individually. When those F/R reads are merged, a read pair can sometimes be corrected to a unique pair of F/R denoised sequences that is mergeable. This is more likely in longer (less-overlapping) amplicons, but should be rare regardless. In my not-extensive testing, these merged singletons are as often an artefact as they are a real biological variant, hence the recommendation to filter them out if they exist in your data.
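
To make the mechanism above concrete, here is a minimal toy sketch in Python. It is not how dada2 is implemented; the labels F1, F2, R1, R2 and the read counts are made up for illustration, and it simply assumes that every F/R combination shown is mergeable. It shows that a merged pair can have an abundance of 1 even when no forward or reverse denoised sequence is a singleton on its own, and that a minimum-frequency filter of 2 removes it.

```python
from collections import Counter

# Hypothetical toy data: one (forward, reverse) tuple per read pair,
# giving the denoised sequence each read was assigned to.
denoised_pairs = (
    [("F1", "R1")] * 50
    + [("F2", "R2")] * 30
    + [("F1", "R2")] * 1  # one read pair with a unique F/R combination
)

# Abundance of each denoised sequence in the forward and reverse reads.
forward_counts = Counter(f for f, _ in denoised_pairs)
reverse_counts = Counter(r for _, r in denoised_pairs)

# Abundance of each merged (F, R) combination.
merged_counts = Counter(denoised_pairs)

forward_singletons = [f for f, n in forward_counts.items() if n == 1]
reverse_singletons = [r for r, n in reverse_counts.items() if n == 1]
merged_singletons = [p for p, n in merged_counts.items() if n == 1]

# Keep only merged variants seen at least twice (same idea as a
# minimum total frequency of 2 on the feature table).
min_frequency = 2
filtered = {p: n for p, n in merged_counts.items() if n >= min_frequency}

print("forward singletons:", forward_singletons)
print("reverse singletons:", reverse_singletons)
print("merged singletons:", merged_singletons)
print("after filtering:", filtered)
```

In this toy case F1 and R2 are both abundant, so neither is a singleton by itself, yet the F1/R2 combination occurs only once and therefore appears as a singleton in the merged table.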

Dear @benjjneb, thanks a lot for your clarification. This issue is much clearer to me now.
Best
Niccolò
