Guidance on using quality-control evaluate-taxonomy

sibilant · December 13, 2025, 2:45pm

Greetings Qiime2 community,

I'm new to Qiime2 and am using qiime2-amplicon-2025.7 to test my ITS amplicon analysis pipeline on some mock libraries.

What I want to do is use evaluate-taxonomy to compare the reference mock library taxonomy with the measured taxonomy (e.g., true positives, false positives, and false negatives at each classification level). I don't care about relative abundances, and I don't have frequency table information for the mock library, so evaluate-composition is not an option.

When I look at one of the reference mockrobiota taxonomies recommended for use with quality-control evaluate-composition in the ITS tutorial, I see that it does not include feature IDs.

However, I understand from this forum link that the taxonomy TSV files must have a Feature ID column and a Taxon column, and also that the feature IDs must match between the reference (mock library) and measured taxonomy files. The link uses evaluate-composition, but if my small test runs (not shown) are any indication, evaluate-taxonomy also requires matching feature IDs between the taxonomy files (otherwise all F-measures come out as zero).

I am hoping the Qiime2 forum can shed light on a few things:

1. How can the mockrobiota reference taxonomy files be used with quality-control tools if they do not include feature IDs?
2. How could the feature IDs match between a mock library reference taxonomy published by lab A in the past and an analysis of the published sequencing runs done by lab B in the present? Feature IDs are unique numbers created during the sequence analysis pipeline. Even if mock library publications included their frequency table (rare in my experience), their feature IDs would still differ from those I created in a later analysis. I feel like I'm missing something obvious about how evaluate-taxonomy is meant to let users conveniently automate comparing reference vs. measured taxonomy files, if the feature IDs must match between the two files.
3. How is evaluate-taxonomy using the Feature ID column? Related to question 2, I've resigned myself to writing some kind of Python script to format taxonomy files for use with evaluate-taxonomy, but I don't understand how evaluate-taxonomy uses the Feature ID columns, or even why they're necessary.
For example, suppose my taxonomy files had only a single column containing the taxon data:

reference-taxonomy.tsv
Species A
Species B
Species C

measured-taxonomy.tsv
Species A
Species B
Species X

Then I can count two true positives (Species A and B), one false positive (Species X), and one false negative (Species C). No Feature ID column is needed for this. (I understand why feature IDs would be needed if relative abundance were also being evaluated, as in evaluate-composition.)
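The presence/absence counting described above takes only a few lines of Python. This is a minimal sketch assuming one taxon string per entry and that an exact string match counts as a true positive; it is not how evaluate-taxonomy computes its scores, and `compare_taxa` and `at_level` are hypothetical helper names.

```python
def compare_taxa(reference, measured):
    """Presence/absence comparison of two collections of taxon strings."""
    ref, obs = set(reference), set(measured)
    tp = ref & obs  # expected and observed
    fp = obs - ref  # observed but not expected
    fn = ref - obs  # expected but not observed
    precision = len(tp) / len(obs) if obs else 0.0
    recall = len(tp) / len(ref) if ref else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"TP": sorted(tp), "FP": sorted(fp), "FN": sorted(fn),
            "precision": precision, "recall": recall, "F": f_measure}


def at_level(lineages, level, sep=";"):
    """Truncate delimited lineages (e.g. k__...;p__...;...) to their first `level` ranks."""
    return {sep.join(t.split(sep)[:level]) for t in lineages}


reference = ["Species A", "Species B", "Species C"]
measured = ["Species A", "Species B", "Species X"]
result = compare_taxa(reference, measured)
print(result["TP"], result["FP"], result["FN"])
# ['Species A', 'Species B'] ['Species X'] ['Species C']
```

With full lineages, `compare_taxa(at_level(ref, 6), at_level(obs, 6))` would score one rank at a time. Exact matching means spacing after the semicolons must be consistent between the two files.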

I'd love to use evaluate-taxonomy or a similar Qiime2 tool for this type of analysis, but I'm not sure how to move forward. If it helps, here is my reference taxonomy file (MG Bakker 2018, Mol Ecol Resour):
Bakker_ref_mockITS_tax.tsv (2.7 KB)
and my measured taxonomy file:
mockITS_custom-classifier_taxonomy_both_20251201.qza (202.2 KB)

Thanks for any info or suggestions!

colinbrislawn · December 14, 2025, 1:39am

But they are not random!

Well, if the ASV sequence is EXACTLY the same, then its md5 hash will be the same, and so will its feature ID under this naming scheme.

This is important to note because small changes, like an ASV being 1 bp shorter, would lead to a different md5 hash. But that is a different sequence, after all!
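To illustrate the hashing point above: a minimal sketch, assuming (as with the hashed feature IDs that DADA2 produces by default in QIIME 2) that a feature ID is the hexadecimal MD5 digest of the ASV sequence string. The sequence below is made up, and `feature_id` is a hypothetical helper.

```python
import hashlib


def feature_id(sequence: str) -> str:
    """Hex MD5 digest of an ASV sequence (assumed hashed feature-ID scheme)."""
    return hashlib.md5(sequence.encode("utf-8")).hexdigest()


asv = "TACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAG"  # made-up example sequence
same_asv_other_lab = "TACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAG"
one_bp_shorter = asv[:-1]

print(feature_id(asv) == feature_id(same_asv_other_lab))  # True: identical sequence, identical ID
print(feature_id(asv) == feature_id(one_bp_shorter))      # False: 1 bp difference, new ID
```

The hash is also case-sensitive, so the same sequence written in lowercase would produce a different ID.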

I'm glad you are working with mock-communities! I've used mockrobiota before, too.

I'm also interested in how evaluate-taxonomy works, so I'll wait to see what the devs say.

Nicholas_Bokulich · December 14, 2025, 1:05pm

Hi @sibilant ,
I am not sure that evaluate-taxonomy is the action that you want to use. This action is used when you know the exact taxonomic composition of a set of sequences. So it is most useful when you have simulated sequences, or sequences from a set of species in a mock community for which the correct taxonomic lineage is known.

evaluate-composition is better to use when you want to measure classification accuracy from a mock community, in which you know which species are present (and their relative proportions) but do not know a priori which sequences are from which species (as will be typical with most mock community experiments).

Note that the tutorial is using evaluate-composition, not evaluate-taxonomy. The reference file is not given the most intuitive name. It contains the expected taxonomic composition, i.e., the relative proportions of the taxa contained in the mock community. So in QIIME 2 type terms it is a FeatureTable[RelativeFrequency] (giving composition information), not a FeatureData (giving information about a set of features). You can think of it as a feature table that has been collapsed on taxonomic annotations (similar to the output that you would get from qiime feature-table group).

Is it that you don't know the proportion at which each species was added to the mock community? If so (or if you don't care about quantitation of individual taxa), you can just make up the proportions and use pseudo-quantitative metrics (like the TAR/TDR metrics used by evaluate-composition, which essentially report the presence/absence of expected and unexpected taxonomic groups but do not use the proportion information).
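A minimal sketch of that presence/absence idea, assuming TAR (taxon accuracy rate) is the fraction of observed taxa that were expected and TDR (taxon detection rate) is the fraction of expected taxa that were observed; check the q2-quality-control documentation for the exact definitions it implements. The made-up equal proportions only mark which taxa are expected. `equal_proportions` and `tar_tdr` are hypothetical names.

```python
def equal_proportions(taxa):
    """Assign made-up, equal relative frequencies to each expected taxon."""
    taxa = sorted(set(taxa))
    return {t: 1.0 / len(taxa) for t in taxa}


def tar_tdr(expected, observed, min_abundance=0.0):
    """Presence/absence rates; abundances only decide presence (> min_abundance)."""
    exp = {t for t, f in expected.items() if f > min_abundance}
    obs = {t for t, f in observed.items() if f > min_abundance}
    shared = exp & obs
    tar = len(shared) / len(obs) if obs else 0.0  # observed taxa that were expected
    tdr = len(shared) / len(exp) if exp else 0.0  # expected taxa that were detected
    return tar, tdr


expected = equal_proportions(["Species A", "Species B", "Species C"])
observed = {"Species A": 0.5, "Species B": 0.3, "Species X": 0.2}
tar, tdr = tar_tdr(expected, observed)  # both rates are 2/3 here
```

Changing the made-up proportions (as long as each stays above zero) would not change either rate, which is why they can be invented.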

Would that work for your use case?

sibilant · December 16, 2025, 2:03pm

Thanks for the helpful reply @colinbrislawn; good to be on #teammocklibraries with you, and I appreciate the great support for these analyses in Qiime2.

sibilant · December 16, 2025, 2:03pm

Thanks for your helpful reply!

I do have this info:
Bakker 2018 Suppl Mat 4.txt (23.9 KB)

In the meantime, following your suggestion to try evaluate-composition instead of evaluate-taxonomy, I can see how it removes the problem/question I had about Feature IDs (a Feature ID column is required when importing FeatureData[Taxonomy] as TSVTaxonomyFormat, for example, but not for the FeatureTable[RelativeFrequency] semantic type).

I am writing a Python program to format my reference taxonomy files for use with evaluate-composition. Now, instead of the Feature IDs, the Sample IDs must match between the reference and measured taxonomy files.

I tried to work out the exact Sample ID column name and format expected for my measured taxonomy file. For example, in the mockrobiota reference taxonomy file linked above, the Sample ID is "Mock.1". Since the reference taxonomy file I'm using is a .tsv, I can easily change its Sample ID to match whatever Sample ID my measured taxonomy uses.
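One way to line a single reference composition up with many measured samples is to repeat it under each measured sample ID. A minimal sketch, assuming a taxa-as-rows, samples-as-columns TSV whose first header cell is `#OTU ID` (the classic BIOM TSV layout), which would still need converting to BIOM and importing as FeatureTable[RelativeFrequency]. The taxon strings and sample IDs are placeholders, `expected_table_tsv` is a hypothetical helper, and the taxon strings would have to match the measured collapsed table character for character.

```python
import csv
import io


def expected_table_tsv(composition, sample_ids):
    """Write one expected composition, repeated for every sample ID, as TSV text.

    composition: dict of taxon string -> relative frequency (should sum to 1)
    sample_ids:  sample IDs copied from the measured table / sample metadata
    """
    if abs(sum(composition.values()) - 1.0) > 1e-6:
        raise ValueError("relative frequencies should sum to 1")
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["#OTU ID", *sample_ids])  # samples are the column headers
    for taxon, freq in sorted(composition.items()):
        writer.writerow([taxon, *[freq] * len(sample_ids)])  # same value per sample
    return out.getvalue()


composition = {"k__Fungi;s__Species_A": 0.5, "k__Fungi;s__Species_B": 0.5}
tsv = expected_table_tsv(composition, ["sample-01", "sample-02"])
print(tsv)
```

With 31 measured samples, the same 19 taxa would simply be repeated across 31 columns.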

However, I haven't been able to figure out what Qiime2 uses for Sample ID formatting in the measured taxonomy file. I did look at the documentation for the qiime feature-table group action you suggested above, and while I think I get what you're saying, I still have questions about how my evaluate-composition input files should be formatted.

I followed the ITS tutorial linked previously, first using qiime taxa collapse and then qiime feature-table relative-frequency to process my feature table file (below). The output is a .qza file, which cannot be viewed (a message indicated this was because it is in BIOM format). I tried converting it to a .qzv using qiime feature-table summarize, and learned that this only works for FeatureTable[Frequency | PresenceAbsence], not RelativeFrequency.
I also tried things like biom summarize-table and, as expected, got a Unicode error, since it's a .qza file and not a .biom file.
feature-table_mockITS_20251129_biom_min2_collapse7_rel-freq.qza (238.9 KB)
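Since .qza artifacts are zip archives, the stored data file can be located with Python's standard `zipfile` module, without the newer QIIME 2 viewing features. A minimal sketch, assuming the archive layout `<uuid>/data/...` and that a relative-frequency table's payload is `feature-table.biom`; that file is HDF5 BIOM, so reading its contents still needs the biom package (or `qiime tools export`). The helper names are hypothetical.

```python
import zipfile


def _is_data_member(name):
    """True for files stored as <uuid>/data/... (the artifact's payload)."""
    parts = name.split("/")
    return len(parts) >= 3 and parts[1] == "data" and not name.endswith("/")


def list_data_files(qza_path):
    """List payload files inside a .qza (a .qza is a zip archive)."""
    with zipfile.ZipFile(qza_path) as archive:
        return [name for name in archive.namelist() if _is_data_member(name)]


def extract_data_file(qza_path, filename, dest_dir="."):
    """Extract the payload file with basename `filename`; return its path on disk."""
    with zipfile.ZipFile(qza_path) as archive:
        for name in archive.namelist():
            if _is_data_member(name) and name.rsplit("/", 1)[-1] == filename:
                return archive.extract(name, path=dest_dir)
    raise FileNotFoundError(f"{filename} not found in {qza_path}")


# Usage with the file from this thread (run in the directory holding it):
# print(list_data_files("feature-table_mockITS_20251129_biom_min2_collapse7_rel-freq.qza"))
# extract_data_file("feature-table_mockITS_20251129_biom_min2_collapse7_rel-freq.qza",
#                   "feature-table.biom", dest_dir="rel-freq-export")
```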

If I back up a bit and look at a previous file, such as
feature-table_mockITS_20251129_biom_min2.qzv (453.0 KB)
I can see the expected Sample IDs that I entered in my sample metadata file.

My next move would be to try the new, more informative viewing of .qza files in the latest release of Qiime2 (2025.10), but unfortunately my university's HPC OS needs to be updated first (qiime2 2025.10 requires more recent versions of some gcc libraries than our OS has).

I can use trial and error with the incomplete information I have to write my Python program, but I'm hoping for one or more of the following:

1) advice on viewing my own formatted measured taxonomy file (feature-table_mockITS_20251129_biom_min2_collapse7_rel-freq.qza);
2) a way to view someone else's example file, such as dada2-single-end-table-relative.qza from the ITS tutorial (code context below);
3) a description of the file format expected as input to evaluate-composition. Is there a Sample ID column header, or are the sample IDs themselves the column headers? I have 31 samples in my reference taxonomy, essentially all run on the same mock library of 19 fungal species. Would my reference file then just list the same reference taxonomy 31 times, once for each sample name? And so on.
4) If there's an easier or better way to do what I'm trying to do, please let me know.

Thanks for any additional info or suggestions!

# from the ITS tutorial linked previously:
qiime taxa collapse \
  --i-table dada2-single-end-table.qza \
  --i-taxonomy taxonomy-single-end.qza \
  --p-level 7 \
  --o-collapsed-table dada2-single-end-table-collapsed.qza

qiime feature-table relative-frequency \
  --i-table dada2-single-end-table-collapsed.qza \
  --o-relative-frequency-table dada2-single-end-table-relative.qza
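Conceptually, the two commands above sum the counts of features that share a taxonomy string (truncated to the chosen level) within each sample, then divide each sample by its total. A simplified pure-Python sketch of that logic, not QIIME 2's implementation; the feature IDs, counts, and three-rank lineages are made up, and real lineages may need handling for missing ranks.

```python
from collections import defaultdict


def collapse(table, taxonomy, level, sep=";"):
    """table: {sample: {feature_id: count}}; taxonomy: {feature_id: lineage}."""
    collapsed = {}
    for sample, counts in table.items():
        sums = defaultdict(float)
        for fid, count in counts.items():
            taxon = sep.join(taxonomy[fid].split(sep)[:level])  # truncate lineage
            sums[taxon] += count  # features with the same truncated lineage merge
        collapsed[sample] = dict(sums)
    return collapsed


def relative_frequency(table):
    """Divide each sample's counts by that sample's total."""
    rel = {}
    for sample, counts in table.items():
        total = sum(counts.values())
        rel[sample] = {t: c / total for t, c in counts.items()} if total else dict(counts)
    return rel


table = {"sample-01": {"f1": 30, "f2": 10, "f3": 60}}
taxonomy = {"f1": "k__Fungi;g__Penicillium;s__A",
            "f2": "k__Fungi;g__Penicillium;s__A",
            "f3": "k__Fungi;g__Fusarium;s__B"}
print(relative_frequency(collapse(table, taxonomy, level=2)))
# {'sample-01': {'k__Fungi;g__Penicillium': 0.4, 'k__Fungi;g__Fusarium': 0.6}}
```

In this layout the features of the collapsed table are taxonomy strings and the samples keep their original IDs, which is why both must line up with the reference table.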

sibilant · December 16, 2025, 6:50pm

Quick update if it's helpful:

Here are the (formatted) reference and measured taxonomy files that I'm using as direct input to evaluate-composition:

ref:
mockITS-reference-taxonomy-formatted_strains.qza (18.1 KB)

measured:
feature-table_mockITS_20251129_biom_min2_collapse7_rel-freq.qza (238.9 KB)

Viewable version of ref file:
mockITS-reference-taxonomy-formatted_strains.tsv (3.0 KB)

Viewable version of the frequency table again, used to generate the measured taxonomy:
feature-table_mockITS_20251129_biom_min2.qzv (453.0 KB)

As expected, I'm getting errors such as ValueError: min() arg is an empty sequence, which this forum post describes as occurring when the formatting of the reference and measured files disagrees.

The debug log was similar to that of the user in the linked forum post. Here's an excerpt (tail):

  File "/ddnlus/r3751/.conda/envs/qiime2-amplicon-2025.7/lib/python3.10/site-packages/q2_quality_control/_utilities.py", line 281, in _pointplot_multiple_y
    sns.pointplot(data=results, x=xval, y=score, ax=axes, color=color)
  File "/ddnlus/r3751/.conda/envs/qiime2-amplicon-2025.7/lib/python3.10/site-packages/seaborn/categorical.py", line 2839, in pointplot
    plotter = _PointPlotter(x, y, hue, data, order, hue_order,
  File "/ddnlus/r3751/.conda/envs/qiime2-amplicon-2025.7/lib/python3.10/site-packages/seaborn/categorical.py", line 1603, in __init__
    self.establish_colors(color, palette, 1)
  File "/ddnlus/r3751/.conda/envs/qiime2-amplicon-2025.7/lib/python3.10/site-packages/seaborn/categorical.py", line 707, in establish_colors
    lum = min(light_vals) * .6
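Since this error is reported when the reference and measured files disagree, a quick overlap check before running evaluate-composition can narrow down where. A minimal sketch, assuming both tables are available as taxa-as-rows, samples-as-columns TSV text with a `#OTU ID` header row (any preceding comment lines are skipped); the example contents and helper names are hypothetical. Empty shared lists would be consistent with, though not proof of, the empty-sequence error.

```python
import csv
import io


def read_table(text):
    """Return (sample IDs, taxon strings) from a taxa-by-sample TSV."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE))
    start = next((i for i, r in enumerate(rows) if r and r[0] == "#OTU ID"), None)
    if start is None:
        raise ValueError("no '#OTU ID' header row found")
    samples = set(rows[start][1:])
    taxa = {r[0] for r in rows[start + 1:] if r}
    return samples, taxa


def report_mismatches(ref_text, obs_text):
    """Compare sample IDs and taxon strings between reference and measured tables."""
    ref_samples, ref_taxa = read_table(ref_text)
    obs_samples, obs_taxa = read_table(obs_text)
    return {
        "shared_samples": sorted(ref_samples & obs_samples),
        "samples_only_in_ref": sorted(ref_samples - obs_samples),
        "samples_only_in_obs": sorted(obs_samples - ref_samples),
        "shared_taxa": sorted(ref_taxa & obs_taxa),
        "taxa_only_in_ref": sorted(ref_taxa - obs_taxa),
        "taxa_only_in_obs": sorted(obs_taxa - ref_taxa),
    }


ref = "#OTU ID\tMock.1\nk__Fungi;s__A\t0.5\nk__Fungi;s__B\t0.5\n"
obs = ("# Constructed from biom file\n#OTU ID\tsample-01\n"
       "k__Fungi; s__A\t0.7\nk__Fungi;s__B\t0.3\n")
for key, values in report_mismatches(ref, obs).items():
    print(key, values)
```

Here the sample IDs share nothing, and "k__Fungi; s__A" differs from the reference only by a space, which is the kind of mismatch this check is meant to surface.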

Thanks again for any info you might have!