Associated UNITE+INSD taxonomy file

Hello everybody,
I have been using the UNITE database for my ITS analysis, but I recently read that the UNITE+INSD database is more complete.
Looking for the dataset in the UNITE website I have been able to find the database here: https://plutof.ut.ee/#/doi/10.15156/BIO/786372

My idea was to use these commands to create the database for qiime2-2020.8:
qiime tools import --type FeatureData[Taxonomy] --input-path taxonomy_qiime.txt --input-format HeaderlessTSVTaxonomyFormat --output-path tax.qza

qiime tools import --type FeatureData[Sequence] --input-path unite_insd.fasta --output-path refs.qza

qiime feature-classifier fit-classifier-naive-bayes --i-reference-reads refs.qza --i-reference-taxonomy tax.qza --o-classifier unite_insd.qza

The problem is that the UNITE+INSD download only contains the .fasta file, and I cannot find the taxonomy file in .txt format.

Could I use the .txt I have from the UNITE database alone? Or is there some other way to create the classifier.qza file without that .txt file?

Thank you in advance!

Marisa

Hi Marisa,

You definitely need both the fasta and the taxonomy file! You will probably need a bit of scripting to get those files from the fasta you downloaded. If I am looking at the correct file, the taxonomy information is included in the header of each fasta sequence, and you should be able to extract it using something like:

grep ">" seq.fasta | sed "s/>//" | sed "s/|/\t/" > taxonomy.txt

(This is supposed to take only the fasta headers, replace the ">" with nothing, and replace the first "|" with a TAB.)
Then, you have to process the fasta with something like:

cat seq.fasta | sed "s/|/ /" > new.seq.fasta

This should replace the first "|" on each line with a space character, so that qiime2 reads the correct header.

Now this is untested and may need a bit of further tweaking …
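One way to try the idea safely is to run it on a tiny made-up file first. The sketch below assumes headers of the form ">ID|taxonomy", which may not match the real download. The function name `split_headers` and the demo records are hypothetical, and it uses awk for the TAB so it does not depend on GNU sed.

```shell
# Hypothetical helper: split ">ID|taxonomy" headers into a taxonomy table
# and a FASTA file whose headers start with the bare ID.
# Arguments: input FASTA, output taxonomy file, output FASTA.
split_headers() {
  # Header lines only: drop the leading ">" and turn the first "|" into a TAB.
  awk '/^>/ { sub(/^>/, ""); sub(/\|/, "\t"); print }' "$1" > "$2"
  # Header lines only: turn the first "|" into a space, leave sequences untouched.
  sed '/^>/ s/|/ /' "$1" > "$3"
}

# Made-up two-record example (not real UNITE+INSD data).
demo_dir=$(mktemp -d)
printf '>seq1|k__Fungi;p__Ascomycota\nACGT\n>seq2|k__Fungi;p__Basidiomycota\nTTGA\n' > "$demo_dir/demo.fasta"
split_headers "$demo_dir/demo.fasta" "$demo_dir/taxonomy.txt" "$demo_dir/clean.fasta"
cat "$demo_dir/taxonomy.txt" "$demo_dir/clean.fasta"
```

If the output looks right on the demo, the same function can be pointed at the real fasta. If the real headers contain more than one "|", the patterns will need adjusting.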

Please note, it is probably worth double-checking that all the taxonomy strings contain the same number of levels (and possibly the same levels too …)
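A rough way to do that check, assuming a two-column, TAB-separated taxonomy file with ranks separated by ";" (the function name `count_levels` and the sample records are made up for illustration):

```shell
# Hypothetical helper: for each distinct number of ";"-separated ranks in
# column 2, print "<number of ranks> <number of records>".
count_levels() {
  awk -F'\t' '{ n = split($2, ranks, ";"); seen[n]++ }
              END { for (k in seen) print k, seen[k] }' "$1" | sort -n
}

# Made-up sample: two records with 2 ranks, one record with 1 rank.
levels_demo=$(mktemp)
printf 'seq1\tk__Fungi;p__Ascomycota\nseq2\tk__Fungi\nseq3\tk__Fungi;p__Basidiomycota\n' > "$levels_demo"
count_levels "$levels_demo"
```

A single output line means every record has the same depth. More than one line means some records are shorter or longer than others and may need padding or trimming. Note that a trailing ";" would count as an extra, empty rank.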
Good luck
