Thank you for sharing this awesome post! While I don’t have exact answers to your questions, I’ll discuss a couple of points that may be useful in understanding this.
First, the classic OTU clustering method is far from ideal and is very prone to spurious calls, and in my opinion (and I’m sure many others’) it should not be used except under very specific conditions. There is enough reading on this (ex1, ex2, ex3) to steal a solid weekend away from you. So moving forward I won’t spend much time discussing the OTU method, except to say that what you see is typical of it and there are many potential reasons behind it; the important takeaway is that it is clearly not as accurate as denoising methods, so why even bother going there.
The second point I wanted to make is that using species richness as a measure of accuracy may not be the best way to compare these pipelines, at least not without some error estimation. There is some really good work by Dr. Amy Willis’ group on this issue, described in a couple of pre-prints here and here, and in fact they have developed both R packages and QIIME2 plugins to estimate alpha diversity with error estimates. Look for their breakaway and DivNet packages for more info. These methods rely on the presence of singletons and rare taxa to model and predict unobserved species, which gives you a more accurate estimate (and error bars!) of the true richness. As far as I know no one has benchmarked these denoising pipelines using estimated richnesses, so that might be a natural next step here! Speaking of benchmarks, similar to what you have found, there are two other studies that I know of (Nearing et al. and Caruso et al.) that have compared these pipelines and discuss their differences in more detail. Certainly worth a read for the intrigued.
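To illustrate why those rare taxa matter for richness estimation: breakaway fits its own statistical model to the full frequency-count data, but the classic Chao1 estimator is the simplest example of the same idea, extrapolating unobserved taxa from singleton and doubleton counts. A toy Python sketch (my own illustration, not breakaway’s actual method):

```python
from collections import Counter

def chao1(counts):
    """Classic Chao1 richness estimate from per-taxon read counts.

    Uses the number of singletons (f1) and doubletons (f2) to
    extrapolate unobserved taxa: S_obs + f1^2 / (2 * f2).
    """
    counts = [c for c in counts if c > 0]
    s_obs = len(counts)
    freqs = Counter(counts)
    f1, f2 = freqs.get(1, 0), freqs.get(2, 0)
    if f2 == 0:
        # bias-corrected form avoids division by zero
        return s_obs + f1 * (f1 - 1) / 2
    return s_obs + f1 ** 2 / (2 * f2)

# A toy ASV table: 3 singletons, 2 doubletons, 3 abundant taxa
counts = [1, 1, 1, 2, 2, 10, 25, 40]
print(chao1(counts))            # 8 observed + 3^2/(2*2) = 10.25

# If a pipeline silently drops singletons, the estimator is starved
# and collapses back to the observed richness:
no_singletons = [c for c in counts if c > 1]
print(chao1(no_singletons))     # 5 observed + 0 unobserved = 5.0
```

The second call shows the problem in miniature: a denoiser that removes singletons makes any singleton-based richness estimator blind to unseen diversity, which is exactly why pipeline choice and estimator choice interact.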
With those out of the way, let’s get specific:
Aside from the fact that OTU clustering is known for spurious calls, the current DADA2 default (pool = FALSE) makes an active effort to discard singletons, as they are too hard to call correctly (discussion on this here). In fact, when running single-end reads there will be no singletons in your table at all; compared to other methods that allow singletons, you will see a drastically reduced number of ASVs. The DADA2-paired method can still contain singletons, since they can reappear after merging, but I would still expect the number of ASVs to be lower with the DADA2 default because of its conservative handling of singletons. In this regard I can’t speak to UNOISE3 and its zOTUs, as I’m just not familiar enough with its methods.
The pooled option in DADA2 is important here, as it allows the inclusion of singletons at the cost of significantly higher computation/running time. With pool = FALSE, DADA2 analyzes the data on a sample-by-sample basis and discards singletons within each sample; with the pooled option, information is shared between samples, so singletons are evaluated across the whole dataset rather than per sample. This leads to retaining more ASVs and is the preferred approach when trying to estimate alpha diversity with the breakaway package.
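The per-sample vs. pooled difference can be seen with a deliberately crude toy model. This is not DADA2’s actual algorithm (which learns error rates and infers ASVs probabilistically, not by a raw count cutoff); it is just a sketch of why pooling rescues sequences that are singletons in every individual sample:

```python
from collections import Counter

def denoise_independent(samples, min_count=2):
    """Toy stand-in for pool = FALSE: each sample is processed alone,
    so any sequence seen only once in a sample is discarded there."""
    kept = []
    for sample in samples:
        counts = Counter(sample)
        kept.append({seq for seq, n in counts.items() if n >= min_count})
    return kept

def denoise_pooled(samples, min_count=2):
    """Toy stand-in for pool = TRUE: reads are summed across samples,
    so a sequence that is rare everywhere can still clear the bar."""
    total = Counter()
    for sample in samples:
        total.update(sample)
    keep = {seq for seq, n in total.items() if n >= min_count}
    return [keep & set(sample) for sample in samples]

# "seqB" appears exactly once in each sample individually...
samples = [
    ["seqA"] * 5 + ["seqB"],
    ["seqA"] * 4 + ["seqB"],
    ["seqA"] * 6 + ["seqB"],
]
print(denoise_independent(samples))  # seqB lost from every sample
print(denoise_pooled(samples))       # seqB kept (3 reads dataset-wide)
```

In the independent run, seqB never survives; in the pooled run its three dataset-wide reads keep it in every sample. The same trade-off holds at scale, which is why pooling costs so much more compute: every sample’s reads have to be considered together.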
That is good to hear! Otherwise we might have had to reanalyze two decades’ worth of sequencing data. This is in line with what I’ve found in my own work and what others report. In the grand scheme of things, the order in which we can rely on microbiome data results seems to be: overall patterns > relative abundances > alpha diversities.
Thanks again for sharing this and I look forward to others’ take on this.