Hello everybody,
I have been using the UNITE database for my ITS analysis, but I recently read that the UNITE+INSD database is more complete.
Looking for the dataset on the UNITE website, I was able to find the database here: https://plutof.ut.ee/#/doi/10.15156/BIO/786372
My idea was to use this command to create the database for qiime2-2020.8:
qiime tools import --type FeatureData[Taxonomy] --input-path taxonomy_qiime.txt --input-format HeaderlessTSVTaxonomyFormat --output-path tax.qza
You definitely need both the FASTA file and the taxonomy file! You will probably need a bit of scripting to get those files from the FASTA you downloaded. If I am looking at the correct file, the taxonomy information is included in the header of each FASTA sequence, and you should be able to extract it using something like:
grep '>' seq.fasta | sed 's/>//' | sed 's/|/\t/' > taxonomy.txt
(this is supposed to take only the FASTA headers, replace the ">" with nothing, and replace the first "|" with a tab)
Then, you have to process the fasta with something like:
cat seq.fasta | sed 's/|/ /' > new.seq.fasta
This should replace the first "|" on each line with a space character, so that the correct header is processed by qiime2.
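The two steps above can be tried end to end on a tiny made-up file before touching the real download. This is only a minimal sketch, assuming headers of the form ">ID|taxonomy" with a single "|"; the two records and the file names (sample.fasta, sample_taxonomy.txt, sample_new.fasta) are invented for illustration, and the real UNITE+INSD headers may carry more fields, so it is worth looking at them first with grep '>' seq.fasta | head. awk is used for the tab here because, as far as I know, "\t" in a sed replacement is a GNU extension and may not work with other sed versions.

```shell
#!/bin/sh
# Toy FASTA with hypothetical headers of the form ">ID|taxonomy" (made-up records).
cat > sample.fasta <<'EOF'
>SEQ1|k__Fungi;p__Ascomycota
ACGTACGT
>SEQ2|k__Fungi;p__Basidiomycota
TTGACCAA
EOF

# Taxonomy table: keep only header lines, drop the leading ">",
# and turn the first "|" into a tab.
awk '/^>/ { sub(/^>/, ""); sub(/[|]/, "\t"); print }' sample.fasta > sample_taxonomy.txt

# FASTA for import: on header lines only, turn the first "|" into a space,
# so the ID is separated from the rest of the header.
sed '/^>/ s/|/ /' sample.fasta > sample_new.fasta

cat sample_taxonomy.txt
grep '^>' sample_new.fasta
```

If the toy output looks right, the same two commands can be pointed at seq.fasta instead of sample.fasta.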
Now, this is untested and may need a bit of further tweaking...
Please note that it is probably worth double-checking that all the taxonomy strings contain the same number of levels (and possibly the same levels too...)
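One possible way to do that check is to count the ";"-separated levels in each taxonomy string and tabulate how often each count occurs. Again, this is just a sketch: it assumes a two-column, tab-separated file with semicolon-separated levels in the second column, and the file names and rows below are made up (the second row is deliberately one level short).

```shell
#!/bin/sh
# Made-up taxonomy table: ID <TAB> semicolon-separated levels.
printf 'SEQ1\tk__Fungi;p__Ascomycota;c__Sordariomycetes\n' >  toy_taxonomy.txt
printf 'SEQ2\tk__Fungi;p__Basidiomycota\n'                 >> toy_taxonomy.txt
printf 'SEQ3\tk__Fungi;p__Ascomycota;c__Leotiomycetes\n'   >> toy_taxonomy.txt

# Split column 2 on ";" to count the levels in each row, then print
# "<number of levels> <number of rows with that many levels>".
awk -F'\t' '{ n = split($2, levels, ";"); count[n]++ }
     END { for (n in count) print n, count[n] }' toy_taxonomy.txt | sort -n > level_counts.txt

cat level_counts.txt
```

More than one line in the output means that the rows do not all have the same number of levels. To run it on the real data, replace toy_taxonomy.txt with taxonomy.txt.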
Good luck