ASV vs OTU for fungal ITS

Saptarathi_Deb · July 5, 2024, 9:36am

Hello Everyone,

I am looking for more clarity on whether to use ASVs or OTUs for fungal ITS analysis. After going through many publications, I see comments against using ASVs for fungal ITS, where the main rationale is that 1) ITS is variable in length and therefore not suitable, and 2) fungal ITS shows intraspecific and intragenomic differences, so ASV ITS sequences must be clustered to approach species-level resolution.

In my opinion, ASVs seem like a good option considering the clarity they provide, but I am not able to understand the logic behind the preference for OTUs over ASVs.

Therefore, I am seeking opinions and suggestions regarding the use of ASVs for fungal ITS analysis.

Thanking you in advance
Saptarathi

salias · July 5, 2024, 10:43am

Hi there!

I'm really curious about that. Could you share the publication reference(s)?

I don't see why that should be a problem when creating ASVs with e.g. DADA2. But maybe I'm too naive to understand the implications.

I understand the point, but I don't think OTUs would be better than ASVs in this case. If a species has several ITS sequences, once you assign taxonomy you can either:

Report ASVs individually like "ASV belonging to species X".
Collapse (cluster) by any taxonomic level (in this case, species), so you can work with grouped ASVs for each species.
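A minimal sketch of the second option, assuming the feature table is a plain dictionary of per-sample ASV counts and that taxonomy has already been assigned. The names `collapse_by_taxon`, `table`, and `taxonomy`, along with the ASV IDs and species labels, are hypothetical and only for illustration:

```python
from collections import defaultdict


def collapse_by_taxon(table, taxonomy, unassigned="Unassigned"):
    """Sum ASV counts per sample for every taxon label.

    table:    {sample_id: {asv_id: count}}
    taxonomy: {asv_id: taxon label at the chosen level, e.g. species}
    ASVs without a label are pooled under `unassigned`.
    """
    collapsed = defaultdict(lambda: defaultdict(int))
    for sample, asv_counts in table.items():
        for asv, count in asv_counts.items():
            taxon = taxonomy.get(asv, unassigned)
            collapsed[sample][taxon] += count
    # Convert nested defaultdicts to plain dicts for easier inspection
    return {sample: dict(taxa) for sample, taxa in collapsed.items()}


# Hypothetical toy data: two ASVs were assigned to the same species
table = {
    "sample1": {"ASV_1": 10, "ASV_2": 5, "ASV_3": 2},
    "sample2": {"ASV_1": 0, "ASV_2": 7, "ASV_4": 1},
}
taxonomy = {"ASV_1": "species X", "ASV_2": "species X", "ASV_3": "species Y"}

collapsed = collapse_by_taxon(table, taxonomy)
print(collapsed)
```

The original ASV table is left untouched, so the first option (reporting ASVs individually) stays available alongside the collapsed view.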

Personally I would stick to ASVs for ITS analyses. In fact, that's what I'm doing right now. But I'm open to changing my mind if the reasons are compelling.

Best,

Sergio

--

Disclaimer: I'm only another forum user, just like you. Please don't take my answer as ground truth. A Forum Moderator would probably provide you with a more accurate answer.

Saptarathi_Deb · July 5, 2024, 11:56am

Hi @salias ,

Thanks for your reply. I am also thinking in the same direction. I understand that identifying fungal diversity using ITS is much more complex than for bacteria, but I feel that ASVs can be much more informative even with the intraspecific and length differences.

Anyway, here are two publications I found for this side of the argument, and like you, I am also open to understanding the logic better.

ITS alchemy: On the use of ITS as a DNA marker in fungal ecology

Best practices in metabarcoding of fungi: From experimental design to results

(https://onlinelibrary.wiley.com/doi/10.1111/mec.16460)

Best regards
Saptarathi

colinbrislawn · July 5, 2024, 2:39pm

rewritten on 7/6/2024

The problems with OTUs have been the theme of the last decade (mid 2010s to mid 2020s), as programmers reinvented the species concept and learned why it sucks.

In 'ITS alchemy,' Kauserud 2023 doi.org/10.1016/j.funeco.2023.101274 outlines these issues in the context of ITS amplicon sequencing. Kauserud clearly understands the limitations of amplicons but does not grasp that these are inherent to the failure of the species concept.

While Kauserud is happy to dabble in the alchemy of taxonomy, I prefer to avoid it.

Let's start at the beginning:

Living things have DNA that we can sequence
-> -> ATCGG

We count up these sequence variants into a feature table. 'Features' means OTUs or ASVs or whatever

Part 1: ASVs are just OTUs

However, ASVs are conceptually nothing else than OTUs;

Right, it's just another Operational ~~Taxonomic~~ Unit!

-> -> ATCGG -> ASVs/OTUs

The UNOISE author Robert Edgar proposes calling sequence variants 'zOTUs' for zero-radius OTUs, meaning 100% similar OTUs. This differentiates these new things from the older OTUs commonly clustered at 97%.

Why did we cluster OTUs at 97%? Robert also explains that we were trying to capture the species concept. Thanks @.Robert_Edgar !

Part 2: Everyone knows what a Species is, right??

Kauserud 2023 describes why amplicons struggle to capture the species concept.

However, even when having 100% identity and coverage with a certain reference sequence, species hypothesis or taxon, we cannot ultimately be fully confident that our environmental sequence represents this taxon.

Since numerous ITS alleles often occur within fungal species, it is clearly not suitable to use so-called amplicon sequence variants (ASVs) as units in ITS based fungal community studies.

Other problems are not specific to amplicons or genomics at all:

Obviously, there is no general sequence-clustering threshold across species and there will always be a trade-off between over-splitting and lumping of species. (emphasis mine, wiki article about this )

But notice that 'species' is never defined. Why is that?

Wikipedia lists around seven to eleven definitions of species. Is it one of these?

There's a second page just for microbial species. Are fungi microbes?

What about Cryptic- and Super-species? Now I'm spooked!

Part 3: A Sequence Variant is just a bunch of letters

The broad adoption of Sequence Variants is misrepresented as an 'upgrade' to OTUs, misunderstanding that ASVs are an intentional downgrade meant to avoid the failure of the species concept.

-> -> ATCGG -> ASVs -> sp. Colin 1209
please stop here ^

In general, the correct sequences are expected to be more abundant in the final sequence pool compared to the artificial sequences, since the original correct sequences attend more PCR cycles. Further, the parental sequences are expected to be very similar to the mutated sequence(s), most of them differing by only one bp. These programs are able to prune away the most likely PCR mutations (Callahan et al., 2019; Estensmo et al., 2021). When working perfectly, the user will end up with a set of sequences perfectly matching the original haplotypes or alleles in the PCR mix.

Notice how species and taxonomy are never mentioned! This is great!
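To make that concrete, here is a toy sketch of the pruning idea in the quote: a rare sequence that differs by one base from a much more abundant sequence is treated as a likely PCR error and folded into its parent. This is not the actual DADA2, UNOISE, or deblur algorithm. The function names and the `min_ratio` threshold are assumptions made up for illustration, and the sketch only compares sequences of equal length:

```python
def differs_by_one_base(a, b):
    """True if a and b have the same length and exactly one mismatch."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def toy_denoise(counts, min_ratio=10.0):
    """Fold likely one-base errors into a much more abundant parent.

    counts:    {sequence: read count}
    min_ratio: hypothetical threshold; a parent must have at least this
               many times the reads of the candidate error sequence.
    """
    # Visit sequences from most to least abundant
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    kept = {}
    for seq, n in ordered:
        parent = next(
            (p for p in kept
             if differs_by_one_base(seq, p) and counts[p] >= min_ratio * n),
            None,
        )
        if parent is None:
            kept[seq] = n          # keep as its own sequence variant
        else:
            kept[parent] += n      # treat as an error copy of the parent
    return kept


# Hypothetical toy counts
counts = {"ATCGG": 1000, "ATCGA": 12, "TTCGG": 400, "GGGGG": 3}
denoised = toy_denoise(counts)
print(denoised)
```

Nothing in the sketch refers to species or taxonomy. The input is sequences and counts, and the output is sequences and counts.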

DADA2, UNOISE, and deblur report sequences
The sequences are evidence of genes
These genes come from microbes
Then it gets really messy

For example, due to intraspecific variability in the ITS region, and sometimes intragenomic variability, ITS sequences must be clustered to approach species level resolution in community studies. (emphasis added)

Sure, we can cluster these Sequence Variants into OTUs, but which of those 7 definitions are we approaching exactly?

After the continuous failure of the species concept in ~~microbial~~ ecology, I appreciate practical programs that produce simple sequences.

Sorry for the long reply. Back to your question:

Yes, I agree! And, crucially, sequence variants are just sequences of letters, so we can match them up to taxonomy or cluster them any which way!

I usually do this:

I don't need alchemy or the species concept.

colinbrislawn · July 7, 2024, 5:55pm

Best practices in metabarcoding of fungi: From experimental design to results

I prefer this to Kauserud 2023, though I suppose Tedersoo 2022 has the benefit of a larger team.

Tedersoo 2022 is a practical guide to fungi amplicon studies, a.k.a. metabarcoding, that includes helpful comparisons between older popular methods and newer approaches. The bioinformatics section explains how the variable length of fungal ITS breaks global alignment algorithms that work fine for 16S amplicons. Even calculations of percent identity are tricky when regions can vary in length!

If a modern denoiser makes ASVs but cannot support variable length reads, it may not be a good fit for ITS1 data, as this article correctly points out.
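A small sketch of why percent identity gets tricky with variable-length regions: for one and the same pairwise alignment, the value depends on what you divide by. The function name, the dictionary keys, and the toy sequences are hypothetical, and the alignment is assumed to be given already (gaps written as `-`):

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity of one pairwise alignment under several denominators."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    pairs = list(zip(aligned_a, aligned_b))
    # Columns where both sequences carry the same base (gaps never match)
    matches = sum(1 for x, y in pairs if x == y and x != "-")
    # Columns where neither sequence has a gap
    gap_free = sum(1 for x, y in pairs if x != "-" and y != "-")
    len_a = len(aligned_a.replace("-", ""))
    len_b = len(aligned_b.replace("-", ""))

    return {
        "over_alignment_columns": 100 * matches / len(aligned_a),
        "over_gap_free_columns": 100 * matches / gap_free,
        "over_shorter_sequence": 100 * matches / min(len_a, len_b),
        "over_longer_sequence": 100 * matches / max(len_a, len_b),
    }


# Hypothetical toy alignment: the second sequence lacks a 4 bp insertion
identity = percent_identity("ATCGGATTACA", "ATCGG----CA")
for name, value in identity.items():
    print(f"{name}: {value:.1f}%")
```

Here the two sequences are 100% identical if gaps are ignored and roughly 64% identical if every alignment column counts, so a fixed clustering threshold such as 97% means different things under different conventions.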

While quite a bit more balanced, this article still criticizes ASVs for being unlike species, while no definition of the species concept is provided.

Remember, sequence variants are just real reads from a sample.
These are probably genes, but no other meaning is included in the definition.

Here's what I agree with:

The ESV approaches are certainly useful for separating as many species/haplotypes as possible based on conserved genes, but their utility for ITS and protein-coding genes is unclear (Antich et al., 2021). They may outperform traditional OTU clustering approaches in distinguishing very closely related species of Ascomycota with haploid genomes.
...
By reanalysing a data set from Furneaux et al. (2021), we show that the DADA2 ITS pipeline and UNOISE ESV approaches reduce phylogenetic richness by disproportionately eliminating rare members of the unicellular fungal groups, Glomeromycota and nonfungal eukaryotes (Figure Box 2).

Fungal-specific settings may help with this, and their recommended standards can help validate that these settings are working as intended.

Here, they equate ASVs with OTUs with Taxa, a common mistake.

However, an ESV approach severely biased species richness estimates of metazoans based on the cytochrome oxidase 1 (CO1) gene (Antich et al., 2021; Brandt et al., 2021), and it is expected to perform poorly for fungal groups with dikaryotic (Basidiomycota ), diploid (most unicellular groups) or polyploid (Glomeromycota ) genomes that commonly exhibit two or multiple different rRNA gene and ITS copies per genome or even within haploid nuclei (Egan et al., 2018; Lindner et al., 2013; Runnel et al., 2022). Estensmo et al. (2021) demonstrated that in polypores, single species contained multiple ESVs.

And I would say:
"Sequence Variants capture multiple alleles from diploid and polyploid taxa like Glomeromycota, leading to higher alpha diversity values compared to taxonomy-based methods."

I should remind the authors that after predicting taxonomy, ASVs can be collapsed by taxonomy if that's most helpful to readers.

Lots of people like named taxonomy. It's got that magic

On a different topic, Tedersoo 2022 missed an opportunity here:

Scripts used for analyses should also be released in, for example, Github or zenodo, to secure reproducibility and potential reuse in other applications.

salias · July 8, 2024, 11:23am

I'm not the OP but I ended up learning a lot. Thank you @colinbrislawn for your extended and really documented answer!

Saptarathi_Deb · July 8, 2024, 3:38pm

Hi @colinbrislawn ,

Thanks a lot for your kind response and providing a broader insight into this discussion.

I am going through the recommended paper to understand things better.

Hope this discussion helps others facing a similar dilemma.

Adrian1 · July 8, 2024, 9:10pm

There is no need to assign a taxonomy or a taxonomic concept in any kind of metabarcoding workflow.
I dereplicate all my reads passing QC at 100%, identify all of them with (insert whatever you use here), and only then do I collapse reads based on the assigned taxonomy.
The idea that anybody would want to have a taxonomic unit, even a theoretical one, before you even assign the taxonomy in a marker which has intragenomic variation is wild.
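A minimal sketch of that order of operations, assuming reads are plain strings that already passed QC. The `assign` function stands in for whatever identification tool you use and is purely hypothetical here, as are the toy reads and labels:

```python
from collections import Counter


def dereplicate(reads):
    """Dereplicate at 100%: count exact duplicates of each read."""
    return Counter(reads)


def collapse_after_assignment(unique_counts, assign):
    """Label every unique sequence first, then sum counts per label."""
    totals = Counter()
    for seq, n in unique_counts.items():
        totals[assign(seq)] += n
    return totals


# Hypothetical toy reads that passed QC
reads = ["ATCGG", "ATCGG", "ATCGA", "TTTTT", "ATCGG"]

# Hypothetical stand-in for a real classifier: a simple lookup table
labels = {"ATCGG": "taxon A", "ATCGA": "taxon A"}


def assign(seq):
    return labels.get(seq, "Unassigned")


unique_counts = dereplicate(reads)
totals = collapse_after_assignment(unique_counts, assign)
print(unique_counts)
print(totals)
```

No taxonomic unit exists in this sketch until `assign` has been called on each unique sequence. Grouping happens only afterwards.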

Saptarathi_Deb · July 9, 2024, 5:37pm

@Adrian1, totally agree. Thanks!

I guess the assignment of taxonomy will stay ambiguous, and we should take it with a pinch of salt when inferring anything from it.