Many species have identical ITS fungal reference sequences in the ITS RefSeq database

How to classify fungal species with identical ITS reference sequences in the ITS RefSeq database?

When using ITS RefSeq for fungal classification, I found that multiple species (e.g., within Cladosporium and Penicillium) share identical ITS reference sequences. This causes ambiguous read assignment and false-positive species calls.

How do you usually handle such cases in practice? Do you recommend reporting results at the genus level, or explicitly annotating species-level ambiguity?

Hi @Thao_Pham,

This is a common issue, and not unique to fungal ITS either. The taxonomic resolution of short marker-gene sequences can be limited.

In practice, the solution is to accept the ambiguous read assignment. For example, if multiple species from the same genus have similar or identical ITS sequences, then the genus-level assignment will be the most reliable one.
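To make the "fall back to the genus" idea concrete, here is a minimal sketch that reports the deepest taxonomy shared by all equally good reference hits. The function name `consensus_taxonomy` and the lineage strings are hypothetical placeholders for illustration; this is not how any particular classifier implements it.

```python
def consensus_taxonomy(lineages, sep=";"):
    """Return the deepest taxonomy prefix shared by every lineage string."""
    split = [[rank.strip() for rank in lineage.split(sep)] for lineage in lineages]
    shared = []
    # Walk the ranks in parallel and stop at the first disagreement.
    for ranks in zip(*split):
        if len(set(ranks)) != 1:
            break
        shared.append(ranks[0])
    return sep.join(shared)


# Hypothetical lineages of three references with identical ITS sequences.
hits = [
    "k__Fungi;g__Cladosporium;s__species_A",
    "k__Fungi;g__Cladosporium;s__species_B",
    "k__Fungi;g__Cladosporium;s__species_C",
]

print(consensus_taxonomy(hits))
```

With these placeholder hits the species rank disagrees, so only the kingdom and genus are kept.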

The classifiers in q2-feature-classifier are designed with this sequence resolution issue in mind. So my advice is to use those classifiers and to interpret any ambiguous classifications accordingly, e.g., accept that you may not have species-level resolution for all sequences.

Of course, there are other tricks for improving resolution, e.g., by careful curation of the reference database to remove ambiguous and low-quality references, or (much more complicated) use of taxonomic class weights to guide classification when you have prior information about which species are more probable in a given system.
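As a toy illustration of why class weights can help, the sketch below applies Bayes' rule to two species whose reference sequences are identical, so a read is equally likely under both. The species names, likelihoods, and weights are all made-up assumptions, and this is only the general principle, not the actual q2-feature-classifier implementation.

```python
def posterior(likelihoods, weights):
    """Combine per-species likelihoods with prior weights and normalize."""
    unnormalized = {sp: likelihoods[sp] * weights[sp] for sp in likelihoods}
    total = sum(unnormalized.values())
    return {sp: value / total for sp, value in unnormalized.items()}


# Identical reference sequences: the read fits both species equally well.
likelihoods = {"species_A": 0.5, "species_B": 0.5}

# Uniform weights: no prior knowledge about the system.
uniform = {"species_A": 0.5, "species_B": 0.5}
# Hypothetical informed weights: species_A is far more common in this habitat.
informed = {"species_A": 0.9, "species_B": 0.1}

print(posterior(likelihoods, uniform))
print(posterior(likelihoods, informed))
```

With uniform weights the tie remains, so no species call is justified. With informed weights the posterior simply follows the prior, which is why the weights need to come from reliable data about the system being studied.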

Good luck!


Hi Nicholas, thank you again for your advice.

I have compared the RefSeq and UNITE databases. RefSeq focuses on high-confidence type strain sequences, whereas UNITE applies clustering approaches. As a result, although UNITE contains more sequences overall, some species with identical ITS reference sequences may be trimmed (e.g., complete-linkage at 0.5% dissimilarity) or clustered together (e.g., single-linkage at 0.5%) and represented by a single reference sequence. This is one of the main reasons I chose RefSeq for classification.

I also agree that ITS is not a perfect barcode. In some fungal groups, variation accumulates in other loci such as RPB1, D1/D2 LSU, or IGS rather than in ITS. Therefore, sequences merged during ITS-based clustering could represent the same species, but in some cases they may come from distinct species that cannot be resolved using ITS alone. Incorporating additional loci or relevant metadata alongside ITS for clustering, similar to the integrative orientation of UNITE, may help improve resolution.

Regarding your suggestions:

  • Curation of the reference database based on UNITE may unintentionally remove biologically meaningful species.
  • Taxonomic class weights are an interesting strategy but likely require system-specific background data.
  • Genus-level assignment may be the most robust solution when ITS lacks resolution. In my dataset, identical ITS sequences form ~119 groups. I am considering retaining these as species-level entries and clearly reporting them as species complexes in the supplementary material.
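For the species-complex reporting, a minimal sketch of how the groups could be found and labelled is shown below. It assumes the references are available as (species name, sequence) pairs with binomial names; the function names, species names, and sequences are hypothetical placeholders. It only catches exact string identity, so references that differ in length or trimming would need to be handled separately.

```python
from collections import defaultdict


def group_identical(records):
    """Group species names that share exactly the same sequence string."""
    by_sequence = defaultdict(set)
    for species, sequence in records:
        by_sequence[sequence.upper()].add(species)
    # Keep only sequences shared by two or more species.
    return sorted(sorted(names) for names in by_sequence.values() if len(names) > 1)


def complex_label(species_names):
    """Build a label for one group (assumes 'Genus epithet' names)."""
    genera = sorted({name.split()[0] for name in species_names})
    if len(genera) == 1:
        epithets = "/".join(name.split(maxsplit=1)[1] for name in species_names)
        return f"{genera[0]} {epithets} (species complex, identical ITS)"
    return " | ".join(species_names) + " (identical ITS, multiple genera)"


# Made-up reference records for illustration only.
records = [
    ("Penicillium alpha", "ACGTTGCA"),
    ("Penicillium beta", "acgttgca"),
    ("Penicillium gamma", "ACGTTGGA"),
    ("Cladosporium delta", "TTGACCGT"),
    ("Cladosporium epsilon", "TTGACCGT"),
]

for group in group_identical(records):
    print(complex_label(group))
```

A table of these labels, one row per group, could then go into the supplementary material alongside the genus-level results.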

This would be particularly important for potentially pathogenic fungi. While overconfident species-level identification could be misleading, completely omitting species-level information may reduce the practical value of the test.

Thank you again for your guidance.


Hi everyone,

I’m just jumping into this conversation (I hope that is ok!) to echo what has been discussed, even though the question is already resolved. We work specifically with fungi, and in our experience, it is often best to accept that ITS is good for a general overview of the mycobiota but frequently hits a resolution ceiling at the genus level (and sometimes at the family level). If the research goal is to look at a specific group, relying solely on ITS can be risky. For example, our lab focuses on Fusarium, so we complement ITS with translation elongation factor 1 (TEF1), which is well known to discriminate between Fusarium species. So, we use one marker for the general picture and another marker for the specific fungi we are interested in.

Best,

Sergio
