Do you guys still remove singletons or doubletons these days?

Hi @sdpapet,
Unfortunately there is no single right answer to your question, but perhaps we can figure out the right one for your case.
Plugins like Deblur and DADA2 actually make an active effort to not include singletons. With paired-end data from DADA2 I find that I occasionally still get singletons, since merging occurs at the end and may introduce some. But if you do see a lot of singletons in those results, consider looking deeper into your data in case you accidentally left in your barcodes or primers or something else.
For things like alpha and beta diversity analyses, singletons can be quite influential and should be kept in as long as you can be confident that they are true features. In fact, plugins like breakaway require you to keep those singletons/rare taxa in, since the model uses them.
In other instances, such as differential abundance testing with ANCOM or gneiss, keeping singletons is never a good idea: they offer no meaningful information to those models and only add noise. For those analyses it is often recommended to remove not only singletons but rare taxa in general, as you described. How aggressively you remove those rare taxa depends on the dataset and the community source.
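If you want a quick way to check whether you have "a lot" of singletons, something like the rough sketch below could work. It is plain Python, not a QIIME 2 command, and it assumes you have already exported your feature table into a dict of feature ID to per-sample counts. The function name `singleton_fraction`, the toy table, and the definition of a singleton as a feature with a total count of exactly 1 across all samples are my own assumptions for illustration.

```python
def singleton_fraction(table):
    """Return the fraction of observed features whose total count is 1.

    `table` is a hypothetical stand-in for an exported feature table:
    a dict mapping feature ID -> list of per-sample counts.
    """
    totals = [sum(counts) for counts in table.values()]
    observed = [t for t in totals if t > 0]  # ignore all-zero features
    if not observed:
        return 0.0
    singletons = sum(1 for t in observed if t == 1)
    return singletons / len(observed)


# Toy data, made up for illustration only.
toy_table = {
    "ASV_1": [120, 95, 210],
    "ASV_2": [0, 1, 0],   # singleton
    "ASV_3": [0, 0, 2],   # doubleton
    "ASV_4": [1, 0, 0],   # singleton
}

frac = singleton_fraction(toy_table)
print(f"{frac:.0%} of features are singletons")
```

What counts as "a lot" is a judgment call; I don't know of a standard cutoff.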
This is MY approach, and as far as I know there are no benchmarks of this. So please take it with a grain of salt, and others can chime in with their approach too.
If I have a very diverse community and good read coverage, I lean towards increasing the min frequency threshold to 50-100. In less diverse samples like mouse gut, I lower that to 10-20. In addition, I also filter out features that don’t occur in at least 25-50% of my samples. Of course, this means that your differential abundance analysis is now not very sensitive to low-abundance/very rare taxa, so if those are what you are actually interested in, these methods might not be the best choice.
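To make the two filters above concrete, here is a minimal sketch in plain Python. It only illustrates the logic and is not how QIIME 2 implements it; in practice I believe `qiime feature-table filter-features` with `--p-min-frequency` and `--p-min-samples` covers the same ground. The function name `filter_features`, its parameters, and the toy table are hypothetical, and the thresholds in the example are scaled down to fit the toy data.

```python
def filter_features(table, min_frequency, min_sample_fraction):
    """Keep features that pass both a total-count and a prevalence filter.

    `table` is a hypothetical stand-in for a feature table:
    a dict mapping feature ID -> list of per-sample counts.
    """
    kept = {}
    for feature_id, counts in table.items():
        if not counts:
            continue  # no samples, nothing to evaluate
        total = sum(counts)
        # Prevalence: fraction of samples in which the feature is observed.
        prevalence = sum(1 for c in counts if c > 0) / len(counts)
        if total >= min_frequency and prevalence >= min_sample_fraction:
            kept[feature_id] = counts
    return kept


# Toy data, made up for illustration only (4 samples).
toy_table = {
    "ASV_1": [40, 35, 60, 25],  # abundant and present in every sample
    "ASV_2": [0, 1, 0, 0],      # singleton
    "ASV_3": [0, 0, 30, 0],     # abundant but in only 1 of 4 samples
    "ASV_4": [3, 2, 0, 4],      # widespread but total count below 10
}

filtered = filter_features(toy_table, min_frequency=10, min_sample_fraction=0.5)
print(sorted(filtered))
```

With these settings only `ASV_1` survives, which shows the trade-off: the rare and patchy features are exactly the ones that get dropped.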
Ultimately, you might have to choose different settings for different questions and analyses of your data.
Hope that helps a little and didn’t add more uncertainty :stuck_out_tongue:
