Wouldn't it be better to think of an ASV as a distinct 16S copy, rather than a distinct species?

Mehrbod_Estaki · July 22, 2019, 10:06am

Hi @SteveMcL,
This is an excellent point, and one that probably has no good answer, but it is well deserving of discussion, as it is a very obvious shortcoming of short-amplicon sequencing (maybe).

But first, just a clarification for anyone wondering about DADA2's threshold for calling different ASVs:

According to DADA2, "sufficiently distinct" means a minimum of 1 nt: a single-nucleotide difference is enough to call a different ASV.
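To make the 1 nt point concrete, here is a minimal Python sketch. It is not DADA2's algorithm (DADA2 relies on an error model to decide whether a variant is real); it only illustrates that grouping by exact sequence treats reads that differ at a single position as separate variants. The toy sequences and the `hamming` helper are made up for illustration.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical, already-denoised reads (toy sequences, not real 16S data).
reads = [
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACGTACGTAT",  # differs from the first sequence at the last position only
]

# Group by exact sequence: each unique sequence is its own variant.
variants = Counter(reads)
seqs = sorted(variants)

print(len(variants))              # number of distinct variants
print(hamming(seqs[0], seqs[1]))  # nucleotide differences between them
```

Whether such a one-nucleotide variant survives denoising in practice depends on its abundance and the estimated error rates, which this sketch deliberately ignores.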

I would agree that assuming an ASV represents a single species is not accurate, and for the most part I think everyone is on board with this, for reasons like the great examples you provided and some others that I've previously mentioned. There are certainly some tools out there that try to correct for copy numbers, but these are not always recommended; see here for a great review of the topic.
The problem (one of many) is that, even if we have the copy numbers of a species, how do we figure out the correct ratios when some of these variants can be shared across species?
In my opinion, there are simply too many unknown factors with regard to copy numbers in short-read sequencing to allow us to infer the real composition of the community.
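As a rough illustration of why shared variants make the ratios ambiguous, the sketch below builds two different communities that yield identical variant counts. All species names, variants, and copy numbers are invented, and the model assumes no PCR or sequencing bias, which real data will not satisfy.

```python
# Hypothetical 16S copy profiles: variant -> copies per genome.
# Species names, variants, and copy numbers are invented for illustration.
PROFILES = {
    "species_A": {"v1": 2, "v2": 1},
    "species_B": {"v2": 3},
    "species_C": {"v1": 1, "v2": 2},
}


def expected_variant_counts(cells: dict[str, int]) -> dict[str, int]:
    """Sum variant copies over every genome in a community (idealized, bias-free)."""
    totals: dict[str, int] = {}
    for species, n_cells in cells.items():
        for variant, copies in PROFILES[species].items():
            totals[variant] = totals.get(variant, 0) + copies * n_cells
    return totals


# Two different communities, given as cells per species.
community_1 = {"species_A": 10, "species_B": 10}
community_2 = {"species_C": 20}

counts_1 = expected_variant_counts(community_1)
counts_2 = expected_variant_counts(community_2)

print(counts_1)
print(counts_2)
print(counts_1 == counts_2)  # identical variant tables from different communities
```

Under these assumptions, even perfect knowledge of every copy profile cannot tell the two communities apart from the variant table alone.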

Where I think we can find some peace, though, is that the scenarios you described are rarely found in the wild, and that the overall patterns of our comparisons seem to stay true no matter how we account for copy numbers. In other words, artificial inflation of diversity due to these distinct copy numbers tends to cancel itself out across samples/groups. If one species is problematic, it's problematic across all groups. So if we can just acknowledge that we simply cannot infer the absolute composition of a community, and instead focus on overall patterns across communities, we (hopefully) can just walk around the elephant in the room.
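A toy version of the "cancels out" argument, under strong assumptions: the problematic species is present in both groups, no variants are shared between species, and every variant is detected. The species names and variant counts are hypothetical.

```python
# Hypothetical number of distinct 16S variants carried by each species.
# "sp2" is the problematic one: a single species that shows up as three ASVs.
VARIANTS_PER_SPECIES = {"sp1": 1, "sp2": 3, "sp3": 1, "sp4": 1}

# Species present in each group (invented for illustration).
group_a = {"sp1", "sp2"}
group_b = {"sp1", "sp2", "sp3", "sp4"}


def asv_richness(species: set[str]) -> int:
    """Observed ASV count if every variant of every species present is detected."""
    return sum(VARIANTS_PER_SPECIES[s] for s in species)


true_difference = len(group_b) - len(group_a)
asv_difference = asv_richness(group_b) - asv_richness(group_a)

print(asv_richness(group_a), asv_richness(group_b))  # both inflated by sp2
print(true_difference, asv_difference)               # the gap between groups is unchanged
```

If the multi-copy species were present in only one group, the inflation would no longer cancel, which is one reason the argument is about overall patterns rather than a guarantee.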