When I used the UNITE database to annotate my ITS results, why did the taxonomy table from the classifier trained on primer-extracted reads (ITS1F 5'-CTTGGTCATTTAGAGGAAGTAA-3', ITS2R 5'-GCTGCGTTCTTCATCGATGC-3') differ from the one from the classifier trained directly on the dynamic data? (sh_qiime_release_19.02.2025.tgz, QIIME 2 version 2025.7)
Greg already mentioned the differences between UNITE 99%, UNITE 97%, and UNITE dynamic.
I wanted to comment that the primers you use during curation will also change what's included in the database.
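The idea is that if we filter the database to include only the region amplified by the PCR primers we use, we should get better results. The PCR step only amplifies specific parts of specific genes, and now we can use RESCRIPt to do the same thing in the database! Note that reference sequences in which the primer sites cannot be found are dropped during this step, so the trimmed database can contain fewer taxa than the full one.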
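To make that concrete, here is a rough Python sketch of what "doing PCR on the database" means. It is only an illustration, not how RESCRIPt or QIIME 2 implement it: the function names are made up for this post, and it uses exact string matching, whereas the real tools tolerate some mismatches to the primers. The primers are the ones quoted in the question.

```python
from typing import Optional

# Primers quoted in the original question (written 5'->3').
FWD_PRIMER = "CTTGGTCATTTAGAGGAAGTAA"  # ITS1F
REV_PRIMER = "GCTGCGTTCTTCATCGATGC"    # ITS2R

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse-complement an unambiguous DNA string (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_amplicon(
    reference: str, fwd: str = FWD_PRIMER, rev: str = REV_PRIMER
) -> Optional[str]:
    """Hypothetical helper: return the region between the two primer sites.

    Returns None when either primer site is missing, which mimics a
    reference sequence being dropped from the trimmed database.
    Assumption: exact matches only; real tools allow mismatches.
    """
    start = reference.find(fwd)
    if start == -1:
        return None
    insert_start = start + len(fwd)
    # The reverse primer anneals to the opposite strand, so on the
    # reference strand we search for its reverse complement.
    end = reference.find(reverse_complement(rev), insert_start)
    if end == -1:
        return None
    return reference[insert_start:end]
```

A reference that lacks one of the primer sites (for example, a UNITE entry that starts inside ITS1 and never covers the ITS1F site) returns `None` here. That kind of dropout is one plausible reason a primer-trained classifier and a full-length classifier give different taxonomy tables.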
This conversation has been very helpful to me, but I am still confused about one thing: for ITS sequences, is it better to trim the reference sequences to the primer sites or to use the full-length reference sequences instead?
I would use the full-length reference sequences. We tend to not see much improvement, if any, when trimming the reference based on the primers used (though trimming can be helpful if you are running into memory limitations on your computer when using the full-length reference sequences).