Thanks for the comprehensive response, @Tahseen_Abbas!
I’m still relatively new to 16S sequence analysis, and am not a wet-lab scientist. I’ll do my best to help, but more-experienced community members may have better insight for you.
I don’t have a paper for you on the percent of features to expect removed when singletons are trimmed post-DADA2, but Auer 2017 treats singleton removal from an OTU study in depth, and the large-scale reductions of unique sequences in different study contexts are worth looking at.
Re: post-DADA2, I have only one data point to share personally. Working with fecal samples in a controlled-environment mouse study, features seen in one sample seemed likely to re-appear in other samples from the same mouse/cagemates. After filtering out all features that appeared in only one sample, we saw a drop from 8836 features (ASVs) to 3841 features, a loss of roughly 57%.
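# The output path on the last line is a placeholder; substitute your own.
qiime feature-table filter-features \
  --i-table paired-data/DADA2/cleaned-table5.qza \
  --p-min-samples 2 \
  --o-filtered-table <your-filtered-table>.qza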
This is more attrition than you saw in your work, which is unsurprising if we assume you filtered only true singletons. Some of our single-sample features were present 3, 4, 5+ times in that one sample, and would not have been cut by a true singleton filter.
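To make the distinction between the two filters concrete, here is a minimal Python sketch on a toy table. The feature IDs, sample IDs, counts, and function names are all invented for illustration, and this is not how QIIME 2 implements either filter. I believe the true-singleton version corresponds roughly to running `filter-features` with `--p-min-frequency 2` instead of `--p-min-samples 2`, but please check that against the plugin docs.

```python
# Toy feature table: feature ID -> {sample ID: read count}.
# All IDs and counts are invented for illustration.
table = {
    "asv_a": {"s1": 12, "s2": 7, "s3": 3},  # seen in several samples
    "asv_b": {"s1": 5},                     # one sample only, but 5 reads
    "asv_c": {"s2": 1},                     # true singleton: 1 read in total
    "asv_d": {"s2": 2, "s3": 1},            # two samples, low counts
}


def drop_true_singletons(table):
    """Keep features whose total read count across all samples is at least 2."""
    return {f: counts for f, counts in table.items() if sum(counts.values()) >= 2}


def drop_single_sample_features(table, min_samples=2):
    """Keep features observed (count > 0) in at least `min_samples` samples."""
    return {
        f: counts
        for f, counts in table.items()
        if sum(1 for n in counts.values() if n > 0) >= min_samples
    }


after_singletons = drop_true_singletons(table)
after_min_samples = drop_single_sample_features(table)

print(sorted(after_singletons))   # ['asv_a', 'asv_b', 'asv_d']
print(sorted(after_min_samples))  # ['asv_a', 'asv_d']
```

Here `asv_b` survives the true-singleton filter because it has 5 reads, but it is cut by the min-samples filter because all of those reads sit in one sample. That is the kind of feature that would explain the heavier attrition in our data.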
Auer (esp. section 4.4) and Bokulich 2013 both discuss “rare filter” strategies. The only approach I’ve used personally has been removal of samples with unusually low sequencing depth (e.g. min 1000 reads for the high-quality samples from the fecal study I discussed above). I think about this not in terms of removing rare reads as in Bokulich, but in terms of removing low-quality/failed samples.
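For what it's worth, the sample-depth cutoff can be sketched the same way. This is only an illustration under my own assumptions: the table contents and helper names are made up, and the 1000-read threshold is just the one from my fecal study, not a recommendation. In QIIME 2 I would expect `qiime feature-table filter-samples` with `--p-min-frequency` to do this job, though you should confirm that in the docs.

```python
MIN_DEPTH = 1000  # per-sample read threshold from the fecal study; adjust for your data

# Toy feature table: feature ID -> {sample ID: read count}. Values are invented.
depth_table = {
    "asv_a": {"s1": 900, "s2": 300, "s3": 40},
    "asv_b": {"s1": 400, "s2": 50},
    "asv_c": {"s3": 15},
}


def sample_depths(table):
    """Return total reads per sample, summed over all features."""
    depths = {}
    for counts in table.values():
        for sample, n in counts.items():
            depths[sample] = depths.get(sample, 0) + n
    return depths


def drop_shallow_samples(table, min_depth):
    """Remove samples below `min_depth`, then drop features left with no reads."""
    depths = sample_depths(table)
    keep = {s for s, d in depths.items() if d >= min_depth}
    filtered = {}
    for feature, counts in table.items():
        kept = {s: n for s, n in counts.items() if s in keep}
        if kept:
            filtered[feature] = kept
    return filtered


print(sample_depths(depth_table))                    # {'s1': 1300, 's2': 350, 's3': 55}
print(drop_shallow_samples(depth_table, MIN_DEPTH))  # only s1 survives; asv_c is gone
```

Note that this removes whole samples rather than rare features. A feature such as `asv_c` only disappears as a side effect, because every sample it occurred in was dropped.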
That’s officially everything I know about sequence filtering! I hope there’s something of value in it for your work.