feature-classifier error while using UNITE

I'm struggling to assign taxonomy to ITS amplicons. I downloaded the UNITE v8 database and imported the reference sequences and taxonomy using the following:

qiime tools import \
   --type FeatureData[Sequence] \
   --input-path sh_refs_qiime_ver8_97_10.05.2021.fasta \
   --output-path unite-v8-97-seqs.qza

qiime tools import \
   --type FeatureData[Taxonomy] \
   --input-format HeaderlessTSVTaxonomyFormat \
   --input-path sh_taxonomy_qiime_ver8_97_10.05.2021.txt \
   --output-path unite-v8-97-tax.qza

I then used VSEARCH to assign taxonomy:

qiime feature-classifier classify-consensus-vsearch \
   --i-query dada2_output/dada2_rep_seqs.qza \
   --i-reference-reads ../ReferenceSequences/unite-v8-97-seqs.qza \
   --i-reference-taxonomy ../ReferenceSequences/unite-v8-97-tax.qza \
   --o-classification taxa/asv_classification.qza

And received this error:

Plugin error from feature-classifier: 'Identifier 227 was reported in taxonomic search results, but was not present in the reference taxonomy.'

I compared the original FASTA and taxonomy files, but didn't find any discrepancy between the 227th sequence and taxonomic ID. Is this an error in the original database, or something with the way I imported these files as q2 artifacts? I also attempted this with the dynamic clustered database and received the same error.
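One way to test the import question is to compare identifiers rather than record positions, since the error names an identifier, which is not necessarily the 227th entry in the file. The sketch below assumes the usual layout of these files: FASTA headers whose identifier runs up to the first whitespace, and a headerless, tab-separated taxonomy file with the identifier in column 1. The function name `missing_ids` is hypothetical, not part of QIIME 2.

```shell
# Print identifiers that occur in the FASTA headers but not in
# column 1 of the taxonomy file.
# Usage: missing_ids <reference.fasta> <taxonomy.tsv>
missing_ids() {
    fasta_ids=$(mktemp)
    tax_ids=$(mktemp)
    # Header lines start with '>'; keep the text up to the first whitespace.
    grep '^>' "$1" | sed 's/^>//' | awk '{print $1}' | sort -u > "$fasta_ids"
    # Assumed layout: identifier <TAB> lineage, with no header row.
    cut -f1 "$2" | sort -u > "$tax_ids"
    # comm -23 keeps lines that appear only in the first sorted file.
    comm -23 "$fasta_ids" "$tax_ids"
    rm -f "$fasta_ids" "$tax_ids"
}
```

For example, `missing_ids sh_refs_qiime_ver8_97_10.05.2021.fasta sh_taxonomy_qiime_ver8_97_10.05.2021.txt` prints nothing if every reference sequence has a taxonomy entry. The same check can be run on the files produced by `qiime tools export` from the two .qza artifacts, to see whether the import changed any identifiers.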

Hello April,

This tutorial about importing the UNITE database into QIIME 2 includes a section about reformatting the FASTA file (uppercasing the sequences and stripping spaces):

awk '/^>/ {print($0)}; /^[^>]/ {print(toupper($0))}' sh_refs_qiime_ver8_99_04.02.2020_dev.fasta | tr -d ' ' > sh_refs_qiime_ver8_99_04.02.2020_dev_uppercase.fasta

Could you try that out and let me know if it helps?

Thanks for the link, Colin! I reformatted the FASTA file as the tutorial describes, but unfortunately I am getting the same error as before. I also used the developer files, as the tutorial recommends.


Well I'm stumped! :thinking:

Do you get this same error with the other clustering thresholds (like 99% or dynamic, which should both be better anyway)? What version of QIIME 2 are you using?

I am using QIIME 2 version 2021.11, the latest release. Yes, I tried both the 99% and dynamic sets and got the same error.

Strangely enough, I was able to train a classifier with the reference sequences and taxonomy, and then used that classifier to assign taxonomy. I cannot fathom why VSEARCH fails when training a classifier works.

Luckily this reference dataset is small compared to SILVA, so I can easily use classify-sklearn, where normally I would opt for VSEARCH because of my hardware limitations.
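For anyone who lands here with the same error, the workaround described above can be sketched roughly as follows. This assumes the classifier was trained with `fit-classifier-naive-bayes` (the post does not say which training method was used), and the wrapper name `train_and_classify` is hypothetical. The positional arguments are the reference sequences, the reference taxonomy, the query sequences, and the two output paths.

```shell
# Train a classifier on the reference data, then classify the query sequences.
# Usage: train_and_classify <ref-seqs.qza> <ref-tax.qza> <query-seqs.qza> \
#                           <classifier-out.qza> <classification-out.qza>
train_and_classify() {
    # Step 1: fit a naive Bayes classifier (assumed training method).
    qiime feature-classifier fit-classifier-naive-bayes \
        --i-reference-reads "$1" \
        --i-reference-taxonomy "$2" \
        --o-classifier "$4" || return 1
    # Step 2: assign taxonomy to the query sequences with the trained classifier.
    qiime feature-classifier classify-sklearn \
        --i-classifier "$4" \
        --i-reads "$3" \
        --o-classification "$5"
}
```

With the artifacts from this thread, a call might look like `train_and_classify unite-v8-97-seqs.qza unite-v8-97-tax.qza dada2_output/dada2_rep_seqs.qza unite-classifier.qza taxa/asv_classification.qza`, where `unite-classifier.qza` is a made-up output name.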

That's great to hear! :+1:

I'll look into this more!
