Dear all!
Could you recommend an article I can cite to show that it is “legal”, or appropriate, to estimate alpha diversity metrics from ASV tables instead of species or OTUs?
In my case, I repeated the alpha diversity commands on OTUs clustered at 97%, and the results were extremely similar to those with ASVs. So I would like to keep only the ASV-based metrics, but I would also like to cite a proper paper for it. I think I read about this somewhere, but I can’t find the paper.
Hi @timanix,
I’m not sure such a paper exists, in the same way that you wouldn’t be able to find a paper saying that estimating diversity with 97% similarity OTUs is appropriate. The idea that a 3% difference in short 16S regions can distinguish species is quite misleading, but 100% similarity in certain regions doesn’t imply identical species either. However, if you are looking for a paper to support the use of ASVs, I would recommend this piece by the developers of DADA2:
Exact Sequence Variants Should Replace Operational Taxonomic Units in Marker-Gene Data Analysis
Hi @timanix,
I would just report them based on ASVs and make it clear that your calculations are ASV-based (write “observed ASVs” wherever you would otherwise write “observed X”), then call it a day.
Best,
Justine
I can’t think of any papers explicitly showing that alpha diversity estimates are more accurate with ASVs than OTUs, but here are two ideas:
- point to the original DADA2 paper, which shows that OTU clustering yields inaccurate sequence variants, whereas DADA2 yields far fewer (or no?) false positives. This essentially indicates better alpha diversity estimates. The paper that @Mehrbod_Estaki linked to is a good perspective on why ASVs should replace OTUs in general, so it would be a good reference to support your case.
- if you still need more evidence, there are various papers out there (including my own) showing that OTU clustering methods overestimate alpha diversity. In your analysis unique OTUs ~= unique ASVs, but this is still consistent: by clustering sequences at 97% you lose the species and sub-species resolution that DADA2 captures (reducing observed diversity) but still fail to detect and remove some false positives (increasing observed diversity). These effects balance out so that observed OTUs ~= observed ASVs in your case. Results vary due to myriad factors: various users on this forum report higher or lower alpha diversity with OTUs vs. ASVs, or similar ballpark estimates, as you see.
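To make the balancing argument above concrete, here is a minimal toy sketch in plain Python. All counts, feature names, and the ASV-to-OTU mapping are hypothetical and chosen only for illustration; this is not output from DADA2 or any clustering tool.

```python
import math
from collections import Counter

def observed_features(counts):
    """Number of features (ASVs or OTUs) with a nonzero count."""
    return sum(1 for c in counts.values() if c > 0)

def shannon(counts):
    """Shannon index (natural log) computed from raw counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total)
                for c in counts.values() if c > 0)

# Hypothetical ASV counts for a single sample.
asv_counts = {"asv1": 50, "asv2": 30, "asv3": 15, "asv4": 5}

# Hypothetical 97% clustering: asv1 and asv2 collapse into one OTU,
# so sub-species resolution is lost and richness goes down.
asv_to_otu = {"asv1": "otuA", "asv2": "otuA", "asv3": "otuB", "asv4": "otuC"}
otu_counts = Counter()
for asv, count in asv_counts.items():
    otu_counts[asv_to_otu[asv]] += count

# Hypothetical false positive: 2 erroneous reads from otuA are not
# removed and form their own spurious OTU, so richness goes back up.
otu_with_noise = dict(otu_counts)
otu_with_noise["otuA"] -= 2
otu_with_noise["otu_spurious"] = 2

for label, table in [("ASVs", asv_counts),
                     ("OTUs (clean)", otu_counts),
                     ("OTUs (with false positive)", otu_with_noise)]:
    print(label, observed_features(table), round(shannon(table), 3))
```

In this toy case the lost resolution and the spurious cluster cancel out in observed richness, which is one way OTU and ASV counts can end up looking similar.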
Thank you, I think that I will cite this paper
Basically, that is what I did, but one of the reviewers pointed out that I need either to explain better why I used ASVs instead of species, or to redo it with OTUs.
That is exactly what my supervisor told me. I just thought that maybe I could cite a paper to avoid writing too much explanation. The two papers I was pointed to are what I was looking for, thank you!
Alpha diversity was a little bit lower with the clustered tables than with ASVs, but I am mostly interested in comparing alpha diversity indices between niches, and the pattern was the same.
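One hedged way to back up "the pattern was the same" with a number is a rank correlation between the per-sample values from the two tables. The sketch below uses hypothetical Shannon values and sample names, and a simple Spearman formula that assumes no tied values.

```python
def ranks(values):
    """Map each sample to its rank (1 = lowest); assumes no ties."""
    ordered = sorted(values, key=values.get)
    return {sample: i + 1 for i, sample in enumerate(ordered)}

def spearman_no_ties(x, y):
    """Spearman's rho for two dicts keyed by the same samples, no ties."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((rx[s] - ry[s]) ** 2 for s in x)
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical per-sample Shannon values (not real data).
asv_shannon = {"s1": 3.2, "s2": 4.1, "s3": 2.5, "s4": 3.8, "s5": 4.6}
otu_shannon = {"s1": 3.0, "s2": 3.9, "s3": 2.4, "s4": 3.5, "s5": 4.3}

rho = spearman_no_ties(asv_shannon, otu_shannon)
print(f"Spearman rho between ASV- and OTU-based Shannon: {rho:.2f}")
```

Here the OTU values are slightly lower but rank the samples identically, so rho is 1. With real data that contains ties, a library routine that handles ties would be the safer choice.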
Thanks to all of you for your answers! That's really helpful.
Hi @timanix, et al. To extend some of what @Nicholas_Bokulich highlighted, I’d refer you to the following:
- Blog posts from Noah Fierer:
  - Lumping versus splitting – is it time for microbial ecologists to abandon OTUs?
  - Intragenomic heterogeneity and its implications for ESVs
- This paper from Glassman & Martiny 2018
- This paper from Rob Edgar 2018 (he is the developer of usearch).
I might be missing some, but these should help alleviate any concerns of OTUs vs (A/E)SVs.
-Mike
@SoilRotifer Thank you for the useful articles and links!