ASV vs OTU for fungal ITS

rewritten on 7/6/2024


The problems with OTUs have been the theme of the last decade (mid 2010s to mid 2020s), as programmers reinvented the species concept and learned why it sucks.

In 'ITS alchemy' (doi.org/10.1016/j.funeco.2023.101274), Kauserud 2023 outlines these issues in the context of ITS amplicon sequencing. Kauserud clearly understands the limitations of amplicons but does not grasp that they stem from the failure of the species concept itself.

While Kauserud is happy to dabble in the alchemy of taxonomy, I prefer to avoid it.

Let's start at the beginning:

Living things have DNA that we can sequence
:microbe: -> :dna: -> ATCGG

We count up these sequence variants into a feature table. 'Features' means OTUs, ASVs, or whatever else you are counting.
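To make 'feature table' concrete, here is a minimal sketch of the counting step. The sample names and reads are made up for illustration, and `build_feature_table` is a hypothetical helper, not a function from any pipeline.

```python
from collections import Counter

# Hypothetical toy reads per sample; real reads would come from FASTQ files.
reads_by_sample = {
    "soil_A": ["ATCGG", "ATCGG", "ATCGA", "TTCGG"],
    "soil_B": ["ATCGG", "TTCGG", "TTCGG"],
}


def build_feature_table(reads_by_sample):
    """Count each exact sequence per sample: {sequence: {sample: count}}."""
    table = {}
    for sample, reads in reads_by_sample.items():
        for seq, count in Counter(reads).items():
            table.setdefault(seq, {})[sample] = count
    return table


feature_table = build_feature_table(reads_by_sample)
for seq, counts in sorted(feature_table.items()):
    print(seq, counts)
```

Note that the table is keyed by the sequence itself. No names, no taxonomy, just letters and counts.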

Part 1: ASVs are just OTUs

"However, ASVs are conceptually nothing else than OTUs;"

Right, it's just another Operational Taxonomic Unit!

:microbe: -> :dna: -> ATCGG -> ASVs/OTUs

The UNOISE author Robert Edgar proposes calling sequence variants 'zOTUs' for zero-radius OTUs, meaning 100% similar OTUs. This differentiates these new things from the older OTUs commonly clustered at 97%.

Why did we cluster OTUs at 97%? Robert also explains that we were trying to capture the species concept. Thanks @.Robert_Edgar !
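The difference between zero-radius OTUs and 97% OTUs is only the threshold. Here is a toy sketch, assuming equal-length, already-aligned sequences and a simple greedy centroid approach. The sequences are made up, and `identity` and `greedy_cluster` are hypothetical helpers, not the algorithm used by any particular tool.

```python
def identity(a, b):
    """Fraction of matching positions; assumes equal-length, pre-aligned toy sequences."""
    if len(a) != len(b):
        raise ValueError("toy identity needs equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs, threshold):
    """Assign each sequence to the first centroid within threshold, else start a new cluster."""
    clusters = {}  # centroid sequence -> list of member sequences
    for seq in seqs:
        for centroid in clusters:
            if identity(seq, centroid) >= threshold:
                clusters[centroid].append(seq)
                break
        else:
            clusters[seq] = [seq]
    return clusters


# Made-up 40 bp sequences: one_off differs from base at a single position (97.5% identity).
base = "ATCGGATTCCGAGGTCAACCTTAGAAATGGGGTTGTTTTG"
one_off = base[:10] + "A" + base[11:]
distant = "GGCTAAGTTCAGCGGGTATCCCTACCTGATCCGAGGTCAA"

seqs = [base, one_off, distant]
zotus = greedy_cluster(seqs, threshold=1.0)    # zero-radius: exact matches only
otus97 = greedy_cluster(seqs, threshold=0.97)  # classic 97% clustering

print(len(zotus))   # 3
print(len(otus97))  # 2
```

Same input, same code, different number. Which one counts as 'a species' is not something the code can tell you.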

Part 2: Everyone knows what a Species is, right??

Kauserud 2023 describes why amplicons struggle to capture the species concept.

"However, even when having 100% identity and coverage with a certain reference sequence, species hypothesis or taxon, we cannot ultimately be fully confident that our environmental sequence represents this taxon."

"Since numerous ITS alleles often occur within fungal species, it is clearly not suitable to use so-called amplicon sequence variants (ASVs) as units in ITS based fungal community studies."

Other problems are not specific to amplicons or genomics at all:

"Obviously, there is no general sequence-clustering threshold across species and there will always be a trade-off between over-splitting and lumping of species." (emphasis mine; see the wiki article about this)

But notice that 'species' is never defined. Why is that?

Wikipedia lists around seven to eleven definitions of species. Is it one of these?

There's a second page just for microbial species. Are fungi microbes?

What about Cryptic- and Super-species? Now I'm spooked! :vampire: :ghost:

Part 3: A Sequence Variant is just a bunch of letters

The broad adoption of Sequence Variants is misrepresented as an 'upgrade' to OTUs. That misses the point: ASVs are an intentional downgrade, meant to avoid the failure of the species concept.

:microbe: -> :dna: -> ATCGG -> ASVs -> sp. Colin 1209
please stop here ^

"In general, the correct sequences are expected to be more abundant in the final sequence pool compared to the artificial sequences, since the original correct sequences attend more PCR cycles. Further, the parental sequences are expected to be very similar to the mutated sequence(s), most of them differing by only one bp. These programs are able to prune away the most likely PCR mutations (Callahan et al., 2019; Estensmo et al., 2021). When working perfectly, the user will end up with a set of sequences perfectly matching the original haplotypes or alleles in the PCR mix."

Notice how species and taxonomy are never mentioned! This is great!
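The pruning idea in the quoted passage can be sketched in a few lines. This is a deliberately crude toy, not the error model of DADA2, UNOISE, or deblur: it folds a sequence into a parent only when the parent differs by exactly one base and is at least `min_ratio` times more abundant. `toy_denoise`, `min_ratio`, and the counts are all assumptions for illustration.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def toy_denoise(counts, min_ratio=10):
    """Fold likely PCR errors into their abundant one-mismatch parent.

    counts: {sequence: read count}. min_ratio is a hypothetical cutoff:
    a parent must be at least this many times more abundant than the child.
    """
    kept = {}
    # Visit sequences from most to least abundant, so parents are seen first.
    for seq, n in sorted(counts.items(), key=lambda kv: -kv[1]):
        parent = next(
            (
                p
                for p in kept
                if len(p) == len(seq)
                and hamming(p, seq) == 1
                and counts[p] >= min_ratio * n
            ),
            None,
        )
        if parent is None:
            kept[seq] = n      # no plausible parent: report the sequence as-is
        else:
            kept[parent] += n  # likely PCR error: give its reads to the parent
    return kept


# Made-up counts for four 10 bp sequences.
counts = {
    "ATCGGATTCC": 1000,  # abundant sequence
    "ATCGGATTCA": 12,    # one mismatch from the first, and rare
    "ATCGGTTTCC": 400,   # one mismatch from the first, but abundant
    "GGCTAAGTTC": 3,     # rare but unrelated to the others
}
denoised = toy_denoise(counts)
print(denoised)
```

In this toy run the rare one-mismatch sequence is folded into its parent, while the abundant one-mismatch sequence survives as its own variant. Whether that second variant is another allele of the same fungus or a different fungus is exactly the question the denoiser does not try to answer.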

DADA2, UNOISE, and deblur report sequences
The sequences are evidence of genes :dna:
These genes come from microbes :microbe:
Then it gets really messy :fire: :fire_extinguisher: :firefighter:

"For example, due to intraspecific variability in the ITS region, and sometimes intragenomic variability, ITS sequences must be clustered to approach species level resolution in community studies." (emphasis added)

Sure, you can cluster these Sequence Variants into OTUs, but which of those 7 definitions are you approaching exactly?

After the continuous failure of the species concept in microbial ecology, I appreciate practical programs that produce simple sequences.


Sorry for the long reply. Back to your question:

Yes, I agree! And, crucially, sequence variants are just sequences of letters, so you can match them up to taxonomy or cluster them any which way you like! I usually do this:

You don't need alchemy or the species concept.
:no_entry_sign: :alembic: :mage:
