How to parse UNITE references for taxonomy assignment with BLAST?

hello people,

I am stuck with my ITS sequences. I know I can use naive Bayes, but I also want to perform the taxonomy assignment using BLAST.

So I am trying to parse the reference files available in the UNITE database, to use them with the classify-consensus-blast function.

The problem seems quite straightforward for the SILVA database (see the RESCRIPt pipeline), but it looks trickier for the UNITE database. Or I am missing something trivial.

The thing is, I am unable to find a proper taxonomy file (like the one provided for SILVA). I need it to run the following, i.e. import the references and the taxonomy file, then run BLAST:

conda activate qiime2-2023.2

# import the just created fasta file
qiime tools import \
	--input-path /mnt/tables/sequences.fasta \
	--output-path /mnt/tables/sequences.qza \
	--type 'FeatureData[Sequence]'

# create the qza reference file for the database
qiime tools import \
	--input-path /mnt/database_references/unite/developer/sh_refs_qiime_ver9_dynamic_29.11.2022_dev.fasta \
	--output-path /mnt/database_references/unite/developer/sh_refs_qiime_ver9_dynamic_29.11.2022_dev.qza \
	--type 'FeatureData[Sequence]'

# create the qza taxonomy file for the database
qiime tools import \
	--input-path /mnt/database_references/unite/developer/sh_taxonomy_qiime_ver9_dynamic_29.11.2022_dev.txt \
	--output-path /mnt/database_references/unite/developer/sh_taxonomy_qiime_ver9_dynamic_29.11.2022_dev.qza \
	--type 'FeatureData[Sequence]'

# run blast
qiime feature-classifier classify-consensus-blast \
	--i-query /mnt/tables/sequences.qza \
	--i-reference-reads /mnt/database_references/unite/developer/sh_refs_qiime_ver9_dynamic_29.11.2022_dev.qza \
	--i-reference-taxonomy /mnt/database_references/unite/developer/sh_taxonomy_qiime_ver9_dynamic_29.11.2022_dev.qza \
	--p-evalue 0.0001 \
	--o-classification /mnt/tables/taxonomy_qiime.qza \
	--o-search-results /mnt/tables/top_hits.qza

# export taxonomy
qiime tools export \
	--input-path /mnt/tables/taxonomy_qiime.qza \
	--output-path /mnt/tables/

The problem occurs when I try to import the file sh_taxonomy_qiime_ver9_dynamic_29.11.2022_dev.txt, since it is not a FASTA file:

qiime tools import \
	--input-path /mnt/database_references/unite/developer/sh_taxonomy_qiime_ver9_dynamic_29.11.2022_dev.txt \
	--output-path /mnt/database_references/unite/developer/sh_taxonomy_qiime_ver9_dynamic_29.11.2022_dev.qza \
	--type 'FeatureData[Sequence]'

So, where or how do I get the information I need?

@gabt,
I have reached out to a few of our mods/RESCRIPt authors about the issue you are having. Hopefully one of them will reply to you here soon!

@gabt,

It looks like "pre-trained UNITE 9.0 classifiers for QIIME 2023.2 (and older!)" should get you going!

Hi @gabt ,
Your workflow should work, except that you are importing the taxonomy file with the wrong semantic type (--type 'FeatureData[Sequence]').

It should be FeatureData[Taxonomy].

See the q2-feature-classifier tutorials on docs.qiime2.org for more details on this process. Otherwise, everything else looks okay.
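
For reference, here is a minimal sketch of the corrected taxonomy import, using the paths from the original post. The variable names (unite_dir, taxonomy_txt, taxonomy_qza) are only illustrative. The --input-format HeaderlessTSVTaxonomyFormat option assumes the UNITE taxonomy file is a tab-separated table without a header line; if your file does have a header, that option would likely need to be dropped. The import is skipped with a message when qiime or the input file is not found, so adapt the guard as needed.

```shell
# illustrative variable names; paths are the ones from the original post
unite_dir=/mnt/database_references/unite/developer
taxonomy_txt="$unite_dir/sh_taxonomy_qiime_ver9_dynamic_29.11.2022_dev.txt"
# same file name, with .txt swapped for .qza
taxonomy_qza="${taxonomy_txt%.txt}.qza"

# import the taxonomy as FeatureData[Taxonomy] instead of FeatureData[Sequence]
# (assumption: the file is a headerless, tab-separated table of ID and taxon)
if command -v qiime >/dev/null 2>&1 && [ -f "$taxonomy_txt" ]; then
	qiime tools import \
		--type 'FeatureData[Taxonomy]' \
		--input-format HeaderlessTSVTaxonomyFormat \
		--input-path "$taxonomy_txt" \
		--output-path "$taxonomy_qza"
else
	echo "qiime or $taxonomy_txt not found; nothing imported" >&2
fi
```

The resulting .qza is what the classify-consensus-blast step in the original post passes to --i-reference-taxonomy.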

Good luck!

@Nicholas_Bokulich yes, this is what I need. I overlooked this FeatureData[Taxonomy] thing, so thank you for pointing it out!

@Keegan-Evans thanks for the resource, but I like the DIY solution best, since it does not require other people doing the job for me :)

@Keegan-Evans thanks for doing this. Let's see if they get back to you, because I would be very happy to have a solution using RESCRIPt, too.
