Associated UNITE+INSD taxonomy file

Marisa_Tello_Martin · October 26, 2020, 11:14am

Hello everybody,
I have been using the UNITE database for my ITS analysis, but I recently read that the UNITE+INSD database is more complete.
Looking for the dataset on the UNITE website, I was able to find the database here: https://plutof.ut.ee/#/doi/10.15156/BIO/786372

My idea was to use these commands to create the database for qiime2-2020.8:
qiime tools import --type FeatureData[Taxonomy] --input-path taxonomy_qiime.txt --input-format HeaderlessTSVTaxonomyFormat --output-path tax.qza

qiime tools import --type FeatureData[Sequence] --input-path unite_insd.fasta --output-path refs.qza

qiime feature-classifier fit-classifier-naive-bayes --i-reference-reads refs.qza --i-reference-taxonomy tax.qza --o-classifier unite_insd.qza

The problem is that the UNITE+INSD download only contains the .fasta file, and I cannot find the taxonomy file in .txt format.

Could I use the .txt file I have from the UNITE database alone? Or is there some other way to create the classifier.qza file without that .txt file?

Thank you in advance!

Marisa

llenzi · October 26, 2020, 1:26pm

Hi Marisa,

You definitely need both the fasta and the taxonomy file! You will probably need a bit of scripting to get those files from the fasta you downloaded. If I am looking at the correct file, the taxonomy information is included in the header of each fasta sequence, and you should be able to extract it using something like:

grep ">" seq.fasta | sed "s/>//" | sed "s/|/\t/" > taxonomy.txt

(This is supposed to take only the fasta headers, remove the ">", and replace the first "|" on each line with a TAB.)
Then, you have to process the fasta with something like:

cat seq.fasta | sed "s/|/ /" > new.seq.fasta

This should replace the first "|" on each line with a space character, so that qiime2 processes the correct header.

Now this is untested and may need a bit of further tweaking …
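As an alternative, here is a minimal sketch of the same two steps written with awk, which avoids relying on sed understanding "\t". It assumes that every header looks like ">ID|taxonomy" (an assumption, so please check it against your file). The function names make_taxonomy and strip_headers are just hypothetical helpers for illustration.

```shell
# Hypothetical helpers; both assume headers of the form ">ID|taxonomy".

# Print "ID<TAB>taxonomy" for every header line, splitting at the first "|".
make_taxonomy() {
    awk '/^>/ {
        line = substr($0, 2)      # drop the leading ">"
        i = index(line, "|")      # position of the first "|"
        if (i > 0) print substr(line, 1, i - 1) "\t" substr(line, i + 1)
    }' "$1"
}

# Replace the first "|" with a space on header lines only;
# sequence lines are printed unchanged.
strip_headers() {
    awk '/^>/ { sub(/[|]/, " ") } { print }' "$1"
}
```

You would then run something like "make_taxonomy seq.fasta > taxonomy.txt" and "strip_headers seq.fasta > new.seq.fasta". Headers without a "|" are skipped by make_taxonomy, so it is worth comparing the number of lines in taxonomy.txt with the number of sequences.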

Please note that it is probably worth double-checking that all the taxonomy entries contain the same number of levels (and possibly the same levels too …).
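One way to do that check is sketched below. It assumes the taxonomy file has the form "ID<TAB>level1;level2;..." with ";" between levels (again an assumption about the file), and count_levels is a hypothetical helper name. It prints each distinct number of levels followed by how many entries have it.

```shell
# Hypothetical helper; assumes "ID<TAB>taxonomy" with ";" separating the levels.
# Output: one line per distinct level count, as "levels<TAB>number_of_entries".
count_levels() {
    awk -F '\t' '{
        n = split($2, parts, ";")   # number of ";"-separated levels in column 2
        counts[n]++
    }
    END { for (n in counts) print n "\t" counts[n] }' "$1" | sort -n
}
```

Running "count_levels taxonomy.txt" should print a single line if every entry has the same depth. More than one line means some entries are shorter or longer than the others and may need fixing before training the classifier.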
Good luck
