If two 16S copies in a species are sufficiently different that DADA2 calls them as two ASVs, then without looking too closely, one might assume that there are two species. If, on the other hand, one assumes an ASV simply represents a 16S copy, without any assumptions about species, one would seldom be wrong. So that should be the ‘correct’ way to think about ASVs.
However, one might have a problem given the following scenario: Sample A has 10 species, each with two very distinct 16S copies. Sample B has 10 species, each with two 16S copies that are identical within species. ASVs are not a direct measure of species number but may still be used as an indirect proxy for it. If so, one might assume Sample A has more species simply because it has more ASVs.
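To make the scenario concrete, here is a toy sketch in Python. The sequence labels are placeholders rather than real 16S sequences, and `count_asvs` is a hypothetical helper that simply counts unique variants, assuming no variant is shared between species.

```python
def count_asvs(genomes):
    """Count unique 16S variants across all species in a sample.

    `genomes` maps a species name to the list of 16S copies it carries.
    Identical copies collapse into a single variant (one ASV).
    """
    return len({seq for copies in genomes.values() for seq in copies})


# Sample A: 10 species, each with two distinct 16S copies (placeholder labels).
sample_a = {f"sp{i}": [f"seq{i}_a", f"seq{i}_b"] for i in range(10)}

# Sample B: 10 species, each with two identical 16S copies.
sample_b = {f"sp{i}": [f"seq{i}_a", f"seq{i}_a"] for i in range(10)}

print(len(sample_a), count_asvs(sample_a))  # species count, ASV count
print(len(sample_b), count_asvs(sample_b))
```

Both samples hold the same number of species, yet Sample A yields twice as many ASVs under these assumptions.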
Could one make any general assumptions about species numbers from ASV number alone?
This is an excellent point, and one that probably has no good answer, but it is well deserving of a good discussion, as it is a very obvious shortcoming of short amplicon sequencing (maybe).
But first, just a clarification for anyone thinking about DADA2’s threshold for calling different ASVs…
For DADA2, ‘sufficiently distinct’ means a difference of as little as 1 nt. That’s enough to call a different ASV.
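As a rough illustration of what a 1 nt difference looks like, here is a minimal Python sketch. It is not DADA2's algorithm, which relies on an error model to decide whether a variant is real; the sequences and the `hamming` helper are made up for the example.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Two toy amplicons that differ only at the last position.
variant_1 = "ACGTACGTAC"
variant_2 = "ACGTACGTAT"

print(hamming(variant_1, variant_2))  # number of differing bases
print(variant_1 == variant_2)         # exact identity is what defines one ASV
```

If the denoiser judges both sequences to be real rather than sequencing errors, that single mismatch is enough for them to be reported as separate ASVs.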
I would agree that assuming an ASV represents a single species is not accurate, and for the most part I think everyone is on board with this too, for reasons like the great examples you provided and some others that I’ve previously mentioned. There are certainly some tools out there that try to correct for copy numbers, but these are not always recommended; see here for a great review of the topic.
The problem (one of many) is: even if we have the copy numbers of a species, how do we figure out the correct ratios when some of these variants can be shared across species?
In my opinion, there are simply too many unknown factors with regard to copy numbers in short-read sequencing to allow us to infer the real composition of the community.
Where I think we can find some peace, though, is that the scenarios you described are rarely found in the wild, and that the overall patterns of our comparisons seem to stay true no matter how we account for copy numbers. In other words, artificial inflation of diversity due to these distinct copies tends to cancel itself out across samples/groups. If one species is problematic, it’s problematic across all groups. So if we can just acknowledge that we simply cannot infer the absolute composition of a community, and instead focus on overall patterns across communities, we (hopefully) can just walk around the elephant in the room.
The ASV vs. OTU question is a non-debate, since they’re measuring two different things. ASVs (loosely) represent 16S copies, whereas 3% OTUs (loosely) represent species. The two are not really comparable.
I’ve read some arguments that say the ASV approach is ‘better’ than the OTU approach, but I find that to be misleading. Better at correcting for PCR errors & discriminating unique 16S copies? Certainly. Better at clustering sequences derived from the same species? Perhaps not.
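A small Python sketch of that distinction, under simplifying assumptions: the two sequences are toy 100 nt strings, identity is computed position by position without alignment, and the 97% cutoff corresponds to the 3% OTUs mentioned above. Real OTU pickers cluster around centroids and align sequences, so treat this only as an illustration.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Two toy 16S copies from the same hypothetical species, differing by 1 nt.
copy_1 = "ACGT" * 25          # 100 nt
copy_2 = copy_1[:-1] + "A"    # last base changed from T to A

same_asv = copy_1 == copy_2                    # ASVs require exact identity
same_otu = identity(copy_1, copy_2) >= 0.97    # 3% OTUs tolerate small differences

print(same_asv, same_otu)
```

Here the two copies are split into two ASVs but would fall into the same 3% OTU, which is the sense in which the two approaches measure different things.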
It depends on the research question, obviously, but I don’t think using the ASV approach should be a given, especially if you have a species-centric view of microbial ecology.
Agreed on all counts. It really depends on what you are trying to do.
While I think most agree by now that denoising methods are superior to q-score filtering/trimming, what you do afterwards (for example, cluster or not cluster) is, as you mentioned, specific to the research question.
Non-clustering approaches do have one particular advantage that makes them more desirable than OTU clustering: they are comparable across studies (as long as they cover the same region) and don’t require re-clustering your whole dataset every time samples are added.
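The reason this works is that the sequence itself serves as the feature ID. A minimal Python sketch, assuming each study's table is a plain dict of `{sequence: {sample_id: count}}`, that both studies cover the same region and were trimmed identically, and that sample IDs are unique across studies (`merge_asv_tables` is a hypothetical helper, not a function from any particular tool):

```python
from collections import defaultdict


def merge_asv_tables(*tables):
    """Merge ASV tables keyed by exact sequence.

    Each table maps sequence -> {sample_id: count}. No re-clustering is
    needed because identical sequences already share the same key.
    Assumes sample IDs do not collide between tables.
    """
    merged = defaultdict(dict)
    for table in tables:
        for seq, counts in table.items():
            merged[seq].update(counts)
    return dict(merged)


# Toy tables from two hypothetical studies of the same amplicon region.
study_1 = {"ACGT": {"s1": 5}, "ACGA": {"s1": 3}}
study_2 = {"ACGT": {"s2": 7}, "TTTT": {"s2": 1}}

merged = merge_asv_tables(study_1, study_2)
print(merged["ACGT"])  # counts for the shared variant across both studies
print(len(merged))     # total number of distinct variants
```

With de novo OTUs, by contrast, the cluster labels depend on which sequences were present when clustering was run, so adding samples generally means clustering everything again.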
I collected some readings and opinion pieces regarding this debate on another post here that might complement this discussion.
In an ideal world where funding and resources are unlimited, we’d abandon ASVs and OTUs altogether and just do shotgun metagenomics assembly!