The differences in taxonomy output by the feature-classifier trainer and the full-length trainer

CWZ · January 15, 2026, 6:14am

Hi guys,

When I used the UNITE database to annotate my ITS results, why did the taxonomy table produced by the classifier trained on primer-trimmed references (ITS1F 5'-CTTGGTCATTTAGAGGAAGTAA-3', ITS2R 5'-GCTGCGTTCTTCATCGATGC-3') differ from the one produced by the classifier trained directly on the dynamic data? (sh_qiime_release_19.02.2025.tgz, QIIME 2 version 2025.7)

For example:

unite-ver10-99-classifier-dynamic.qza: d__Fungi; p__Ascomycota; c__Saccharomycetes; o__Saccharomycetales; f__Saccharomycetales_fam_Incertae_sedis; g__Candida; s__Candida_albicans

unite-ver10-99-1F2R-classifier-dynamic.qza: does not contain d__Fungi; p__Ascomycota; c__Saccharomycetes; o__Saccharomycetales; f__Saccharomycetales_fam_Incertae_sedis; g__Candida; s__Candida_albicans

I would appreciate any guidance you can give me.

gregcaporaso · January 16, 2026, 2:58pm

Hi @CWZ, Welcome to the QIIME 2 Forum!

This question has come up before - take a look at this conversation, and feel free to post back if you still have questions.

colinbrislawn · January 16, 2026, 6:49pm

Hello @CWZ

Greg already mentioned the UNITE 99% vs. UNITE 97% vs. UNITE dynamic question.

I wanted to comment that the primers you use during curation will also change what's included in the database.

The idea is that if we filter the database to include only the region amplified by the PCR primers we use, we will get better results. The PCR step only amplifies specific parts of specific genes, and now we can use RESCRIPt to do the same thing in the database!
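
To make the idea concrete, here is a minimal Python sketch of primer-based trimming of a reference sequence. It is not how RESCRIPt or q2-feature-classifier implement it: it assumes exact primer matches with no mismatches or degenerate bases, and `extract_amplicon` and the toy reference are hypothetical names made up for illustration. Only the two primer sequences come from this thread.

```python
from typing import Optional

# Primer sequences quoted in the original question (5' -> 3').
ITS1F = "CTTGGTCATTTAGAGGAAGTAA"
ITS2R = "GCTGCGTTCTTCATCGATGC"


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def extract_amplicon(ref: str, fwd: str, rev: str) -> Optional[str]:
    """Return the region between the primer sites, or None if a site is missing.

    Assumption: exact matching only. Real tools tolerate mismatches.
    """
    ref = ref.upper()
    start = ref.find(fwd)
    if start == -1:
        return None  # forward primer site absent: reference is dropped
    # On the forward strand, the reverse primer appears as its reverse complement.
    end = ref.find(reverse_complement(rev), start + len(fwd))
    if end == -1:
        return None  # reverse primer site absent: reference is dropped
    return ref[start + len(fwd):end]  # primers themselves are excluded


# Toy reference (made up): flank + ITS1F + insert + reverse primer site + flank.
toy_ref = "AAAA" + ITS1F + "ACGTACGT" + reverse_complement(ITS2R) + "TTTT"
print(extract_amplicon(toy_ref, ITS1F, ITS2R))     # ACGTACGT
print(extract_amplicon("ACGTACGT", ITS1F, ITS2R))  # None
```

Note that a reference lacking either primer site returns `None` and would be left out of the trimmed database. That is one way a taxon can be present in a full-length classifier but absent from a primer-trimmed one.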

Does this work? They try this on the COI gene in the RESCRIPt paper, "RESCRIPt: Reproducible sequence taxonomy reference database management" (PMC).

Do you have positive controls you can use for testing?
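
If you do have a positive control such as a mock community, one simple test is to check which expected taxa each classifier recovers. Below is a rough Python sketch under stated assumptions: the taxonomy labels are already loaded into plain Python sets, the function name `recovered_taxa` is hypothetical, and the sample data is made up for illustration rather than taken from real results.

```python
def recovered_taxa(expected: set[str], observed: set[str]) -> dict[str, set[str]]:
    """Split the expected control taxa into those found and those missing."""
    return {"found": expected & observed, "missing": expected - observed}


# Toy data (not real results): species expected in a hypothetical control,
# and the species labels assigned by each classifier.
expected = {"s__Candida_albicans", "s__Saccharomyces_cerevisiae"}
full_length_calls = {"s__Candida_albicans", "s__Saccharomyces_cerevisiae"}
trimmed_calls = {"s__Saccharomyces_cerevisiae"}

for name, calls in [("full-length", full_length_calls), ("primer-trimmed", trimmed_calls)]:
    result = recovered_taxa(expected, calls)
    print(name, "missing:", sorted(result["missing"]))
```

Whichever classifier leaves fewer expected taxa in the `missing` set is doing better on that control, at least at the taxonomic level you compared.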

CWZ · January 17, 2026, 8:16am

Hi Greg,

Thank you so much for your kind help!

This conversation has been very helpful to me, but I still have some confusion: For the ITS sequence, is it better to trim the read to primer sites or to use the full reference sequence instead?

CWZ · January 17, 2026, 8:16am

Hi!

Thank you very much! I will read this paper and try using RESCRIPt.

gregcaporaso · January 17, 2026, 4:18pm

For the ITS sequence, is it better to trim the read to primer sites or to use the full reference sequence instead?

I would use the full-length reference sequences. We tend to not see much improvement, if any, when trimming the reference based on the primers used (though trimming can be helpful if you are running into memory limitations on your computer when using the full-length reference sequences).

CWZ · January 18, 2026, 9:13am

Hi Mr. Greg,

I am extremely grateful for your patience in clearing up a confusion that has troubled me for quite some time!