Greetings Qiime2 community,
I'm new to Qiime2, using qiime2-amplicon-2025.7 to test my ITS amplicon analysis pipeline with some mock libraries.
What I want to do is use evaluate-taxonomy to compare the reference mock library with the measured taxonomy (e.g. false positives, false negatives, true positives at each classification level). I do not care about relative abundances, and do not have frequency table information for the mock library (so, using evaluate-composition is not an option).
When I look at one of the reference mockrobiota taxonomies recommended for use with quality-control evaluate-composition in the ITS tutorial, it does not include feature IDs.
However, I understand from this forum link that the taxonomy TSV files must have a Feature ID column and a Taxon column, and that the feature IDs must match between the reference (mock library) and measured taxonomy files. The link concerns evaluate-composition, but if my small test runs (not shown) are any indication, evaluate-taxonomy also appears to require matching feature IDs between the two taxonomy files (otherwise the F-measures are zero across the board).
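For reference, this is roughly how one could check whether two taxonomy TSV files share any feature IDs. It is only a sketch using the Python standard library; the column header `Feature ID`, the helper name `read_feature_ids`, and the in-memory example rows are my assumptions for illustration, not taken from the actual files.

```python
import csv
import io


def read_feature_ids(handle, id_column="Feature ID"):
    """Return the set of feature IDs found in a tab-separated taxonomy file."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[id_column].strip() for row in reader if row.get(id_column)}


# Tiny in-memory stand-ins for the two files (made-up IDs and taxa).
reference_tsv = (
    "Feature ID\tTaxon\n"
    "ref1\tk__Fungi;p__Ascomycota\n"
    "ref2\tk__Fungi;p__Basidiomycota\n"
)
measured_tsv = (
    "Feature ID\tTaxon\n"
    "asv1\tk__Fungi;p__Ascomycota\n"
    "ref2\tk__Fungi;p__Basidiomycota\n"
)

ref_ids = read_feature_ids(io.StringIO(reference_tsv))
obs_ids = read_feature_ids(io.StringIO(measured_tsv))
shared = ref_ids & obs_ids

print(f"{len(shared)} shared of {len(ref_ids)} reference and {len(obs_ids)} measured IDs")
```

With real files, the `io.StringIO(...)` stand-ins would be replaced by `open(path, newline="")` on the exported TSVs. An empty `shared` set would be consistent with the all-zero F-measures I saw.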
I am hoping the Qiime2 forum can shed light on a few things:
1. How is it that the mockrobiota reference taxonomy files can be used with `quality-control` tools, if they do not include feature IDs?
2. How could the feature IDs match between a reference taxonomy of a mock library published by lab A in the past, and a sequence analysis of the published sequencing runs done by lab B in the present? Feature IDs are unique identifiers created during the sequence analysis pipeline. Even if mock library publications included their frequency table (rare in my experience), their feature IDs would still be different from those I created during a subsequent analysis. I feel like I'm missing something obvious here in terms of how `evaluate-taxonomy` intends users to conveniently automate this process of comparing reference vs. measured taxonomy files, if the feature IDs must match between the two files.
3. How is `evaluate-taxonomy` using the feature ID column? Related to 2, I've resigned myself to writing some kind of Python script to format taxonomy files for use with `evaluate-taxonomy`, but I don't understand how `evaluate-taxonomy` is using the feature ID columns, or even why they're necessary.
For example, if my taxonomy files only had a single column, for the taxon data:
reference-taxonomy.tsv
Species A
Species B
Species C
measured-taxonomy.tsv
Species A
Species B
Species X
Then I can calculate that there are two true positives (Species A and B), one false positive (Species X), and one false negative (Species C). No feature ID columns are needed for this. (I understand why feature IDs would be needed if relative abundance were also being evaluated, as in evaluate-composition.)
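To make the calculation I have in mind concrete, here is a minimal Python sketch. It is a plain presence/absence set comparison of my own, not a description of how `evaluate-taxonomy` works internally. The function names `truncate` and `compare_at_level` are hypothetical, and I am assuming taxonomy strings use semicolon-separated ranks and that taxa not classified down to the requested level are simply dropped.

```python
def truncate(taxon, level):
    """Keep the first `level` semicolon-separated ranks, or None if too shallow."""
    ranks = [r.strip() for r in taxon.split(";") if r.strip()]
    return ";".join(ranks[:level]) if len(ranks) >= level else None


def compare_at_level(reference, measured, level):
    """Count true/false positives and false negatives at one taxonomic level."""
    ref = {t for t in (truncate(x, level) for x in reference) if t}
    obs = {t for t in (truncate(x, level) for x in measured) if t}

    tp = len(ref & obs)   # expected and observed
    fp = len(obs - ref)   # observed but not expected
    fn = len(ref - obs)   # expected but not observed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0

    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f_measure": f_measure}


# The single-column example from above, treated as one-rank taxonomies.
reference = ["Species A", "Species B", "Species C"]
measured = ["Species A", "Species B", "Species X"]

result = compare_at_level(reference, measured, level=1)
print(result)
```

For full lineage strings, calling `compare_at_level` in a loop over `level=1` up to the deepest rank would give the per-level counts I am after.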
I'd love to use evaluate-taxonomy or a similar Qiime2 tool for this type of analysis, but I'm not sure how to move forward. If it helps, here is my reference taxonomy file (MG Bakker 2018 Mol Ecol Resour)
Bakker_ref_mockITS_tax.tsv (2.7 KB)
and my measured taxonomy file.
mockITS_custom-classifier_taxonomy_both_20251201.qza (202.2 KB)
Thanks for any info or suggestions!