UNITE taxonomic annotation

Hi there, I have run into a problem that has left me quite confused. I have a sequence file (rep-seqs.qza) with 43,242 features. I also extracted the first 10 features from this file into a second file (rep-seqs.top10.qza). I then ran the same taxonomic annotation command on both files:

qiime feature-classifier classify-sklearn --i-classifier unite_classifier.qza --p-reads-per-batch 5000 --p-n-jobs 5 --i-reads rep-seqs.qza --o-classification rep-seqs.tax.qza

qiime feature-classifier classify-sklearn --i-classifier unite_classifier.qza --p-reads-per-batch 5000 --p-n-jobs 5 --i-reads rep-seqs.top10.qza --o-classification rep-seqs.top10.tax.qza

What confuses me is that the annotations for the same 10 features differ substantially between the two runs. As shown below, most of these sequences in rep-seqs.tax.qza are annotated only at the kingdom or phylum level, while in rep-seqs.top10.tax.qza many of them are annotated to lower taxonomic levels. Based on direct BLASTn alignments, I believe the annotations in rep-seqs.top10.tax.qza are the correct ones. Does anyone know the reason for this discrepancy?

version of unite database: qiime_ver9_dynamic_25.07.2023
version of qiime2: q2cli version 2021.2.0

#rep-seqs.tax.qza

Feature ID Taxon Confidence
e25bb45f33000148f4784fb8a0192a94 k__Fungi 1.0000000000000029
9983e55793993437b55cab322d63447b k__Fungi;p__Ascomycota 0.9039648869636827
25c564d5e55f98e58789360819ee3499 k__Fungi;p__Ascomycota 0.8300074670265142
4054ed254d179ef9f1b5d311c5c1455a k__Fungi 0.999999999999989
288fbe83f5b067bb3d5fffc8db99c79a k__Fungi;p__Ascomycota 0.7138320608603023
6f3587160fa53e777951702a8dc866cf k__Fungi;p__Ascomycota 0.8357375350414811
0af4b6805825d48f524ee15da946b231 k__Fungi 1.0000000000000095
18ff526252f1d3b77eb7dfc74543291c k__Fungi;p__Ascomycota;c__Orbiliomycetes;o__Orbiliales;f__Orbiliaceae;g__Lilapila;s__Lilapila_jurana;sh__SH1106504.09FU 0.7961239372885142
d17b7b1ce908f2a5915ffed9c086de33 k__Fungi 0.9999999999999869
946526232fa0fb80f2a88be6172f9812 k__Fungi 0.9999999999999876


#rep-seqs.top10.tax.qza

Feature ID Taxon Confidence
e25bb45f33000148f4784fb8a0192a94 k__Fungi;p__Ascomycota;c__Saccharomycetes;o__Saccharomycetales;f__Saccharomycetaceae;g__Saccharomyces 0.9999999999782813
9983e55793993437b55cab322d63447b k__Fungi;p__Ascomycota;c__Sordariomycetes;o__Hypocreales;f__Cordycipitaceae;g__Leptobacillium;s__Leptobacillium_chinense;sh__SH2489190.09FU 0.994308897728974
25c564d5e55f98e58789360819ee3499 k__Fungi 1.000000000000003
4054ed254d179ef9f1b5d311c5c1455a k__Fungi;p__Ascomycota;c__Saccharomycetes;o__Saccharomycetales;f__Saccharomycetales_fam_Incertae_sedis;g__Candida;s__Candida_albicans;sh__SH0987662.09FU 0.9997210015592886
288fbe83f5b067bb3d5fffc8db99c79a k__Fungi 1.0000000000000095
6f3587160fa53e777951702a8dc866cf k__Fungi;p__Ascomycota 0.9071876227203739
0af4b6805825d48f524ee15da946b231 k__Fungi;p__Ascomycota;c__Saccharomycetes;o__Saccharomycetales;f__Saccharomycetales_fam_Incertae_sedis;g__Candida;s__Candida_albicans;sh__SH0987662.09FU 0.9997176727654474
18ff526252f1d3b77eb7dfc74543291c k__Fungi;p__Ascomycota 0.7045709941313546
d17b7b1ce908f2a5915ffed9c086de33 k__Fungi;p__Ascomycota 0.700394726372004
946526232fa0fb80f2a88be6172f9812 k__Fungi;p__Ascomycota;c__Saccharomycetes;o__Saccharomycetales;f__Saccharomycetaceae;g__Saccharomyces 0.9999999999733369


Hi @RChGO,
Most likely your query sequences are in mixed orientations, i.e., a mixture of forward and reverse-complemented sequences of your target. The classify-sklearn action currently assumes that all reads are in the same orientation, and autodetects that orientation based on the first few sequences. If you change the set of sequences, the orientation that the classifier uses may change too, which is why you get different results when using all query sequences vs. a subset. The classify-consensus-blast action, on the other hand, allows alignment in both orientations, so it is immune to this issue.

The rescript plugin has an action to reorient the sequences in the same direction as your reference database. See the qiime rescript orient-seqs action for more details.
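If you want to confirm that the reads really are in mixed orientation before reorienting them, a rough check is to compare each read and its reverse complement against a few reference sequences by shared k-mers. The snippet below is only a minimal sketch of that idea in plain Python: the function names and the choice of k = 7 are illustrative assumptions, it works on sequence strings you have already exported yourself, and it is not how classify-sklearn or rescript detect orientation internally.

```python
from collections import Counter

# Translation table for complementing DNA (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq, k=7):
    """Return the set of k-mers in seq (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def guess_orientation(read, reference_kmers, k=7):
    """Label a read 'forward', 'reverse', or 'unknown' by k-mer overlap."""
    fwd = len(kmers(read, k) & reference_kmers)
    rev = len(kmers(reverse_complement(read), k) & reference_kmers)
    if fwd == rev:
        return "unknown"
    return "forward" if fwd > rev else "reverse"


def orientation_counts(reads, references, k=7):
    """Count how many reads fall into each orientation label."""
    reference_kmers = set()
    for ref in references:
        reference_kmers |= kmers(ref, k)
    return Counter(guess_orientation(r, reference_kmers, k) for r in reads)
```

If the resulting counts contain substantial numbers of both "forward" and "reverse" reads, the input is mixed, which is consistent with the classifier giving different answers for the full file and for the first 10 features.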

Good luck!
