Hello everybody,
I have been using the UNITE database for my ITS analysis, but I recently read that the UNITE+INSD database is more complete.
Looking for the dataset on the UNITE website, I found the database here: https://plutof.ut.ee/#/doi/10.15156/BIO/786372
My idea was to use this command to create the database for qiime2-2020.8:
qiime tools import --type 'FeatureData[Taxonomy]' --input-path taxonomy_qiime.txt --input-format HeaderlessTSVTaxonomyFormat --output-path tax.qza
You definitely need both the fasta and the taxonomy file! You will probably need a bit of scripting to get those files from the fasta you downloaded. If I am looking at the correct file, the taxonomy information is included in the header of each fasta sequence, and you should be able to extract it using something like:
grep ">" seq.fasta | sed "s/>//" | sed "s/|/\t/" > taxonomy.txt
(it is supposed to take only the fasta headers, remove the ">", and replace the first "|" with a TAB)
Then, you have to process the fasta with something like:
sed "s/|/ /" seq.fasta > new.seq.fasta
This should replace the “|” with a space character so the correct header is processed by qiime2.
Now this is untested and may need a bit of further tweaking …
Please note, it is probably worth double-checking that all the taxonomy strings contain the same number of levels (and possibly the same levels too …)
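For that check, something like the following might help. It is only a rough sketch and just as untested against the real file: the function name count_tax_levels is made up for illustration, and it assumes that taxonomy.txt is tab-separated (ID, then taxonomy) and that the ranks inside the taxonomy string are separated by ";".

```shell
# Hypothetical helper (not part of QIIME 2 or UNITE): tally how many
# ";"-separated ranks each taxonomy string contains.
# Assumes a two-column, tab-separated file: ID <TAB> taxonomy.
count_tax_levels() {
    awk -F'\t' '
        { n = split($2, ranks, ";"); counts[n]++ }   # ranks on this line
        END { for (n in counts) print n, counts[n] } # "levels  lines"
    ' "$1" | sort -n
}
```

Run it as count_tax_levels taxonomy.txt. Each output line is a number of levels followed by how many entries have that depth, so a single output line means every entry has the same number of levels.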
Good luck