Hello everyone,
I am a beginner using QIIME 2 for metabarcoding data analysis, and I have run into a difficulty with taxonomic assignment. I saw another discussion post about converting a vsearch-format database into a QIIME 2 database for taxonomic assignment: "Converting a USEARCH UNITE ITS database to QIIME 2 format".
However, I am trying to do exactly the opposite: convert a QIIME 2 database (two files, one taxonomy file and one FASTA file) into a database that is compatible with vsearch or other programs (a single FASTA file in which each header contains all the taxonomic information, in a format such as SINTAX, and the following line is the sequence). Are there any options for doing this?
For example, the first 5 rows in the taxonomy file are:
U02002 k__Metazoa;p__Arthropoda;c__Malacostraca;o__Decapoda;f__Alpheidae;g__Alpheus;s__Alpheus_canalis
U02003 k__Metazoa;p__Arthropoda;c__Malacostraca;o__Decapoda;f__Alpheidae;g__Alpheus;s__Alpheus_canalis
U02004 k__Metazoa;p__Arthropoda;c__Malacostraca;o__Decapoda;f__Alpheidae;g__Alpheus;s__Alpheus_cristulifrons
U02005 k__Metazoa;p__Arthropoda;c__Malacostraca;o__Decapoda;f__Alpheidae;g__Alpheus;s__Alpheus_cristulifrons
U02006 k__Metazoa;p__Arthropoda;c__Malacostraca;o__Decapoda;f__Alpheidae;g__Alpheus;s__Alpheus_cristulifrons
The first 5 records in the sequence file are (each header line begins with a > in the actual file, but it is missing here because of the website's formatting):
U02002
CCAATCCTTTACCAACATCTATTCTGATTCTTCGGACACCCAGAAGTTTACATTTTAATTCTCCCAGCTTTCGGTATAATCTCCCATATTATTAATCAAGAATCAGGGAAAAAAGAAGCATTCGGAACACTAGGTATAATCTATGCAATAGCAGCAATTGGAATCCTTGGGTTTGTAGTATGAGCACACCACATATTCACTGTTGGTATAGACGTAGACACACGAGCCTACTTCACATCAGCAACTATAATTATTGCAGTTCCCACAGGAATTAAAATTTTCAGGTGACTAGGAACCCTACACGGAAGACAATTCACCTACAGACCATCACTACTTTGAGCCCTAGGATTCGTATTCCTATTCACAATAGGAGGACTAACCGGTGTAGTACTAGCGAACTCATCTATTGACATTATCCTTCACGACACTTATTATGTAGTAGCACATTTCCACTACGTACTGTCCATAGGAGCTGTATTTGGAATCTTTGCAGGTATTGCTCATTGATTCCCCCTATTCACAGGACTATCACTAAACCCACAATGACTAAAAATACACTTTTTCACTATATTCATTGGAGTAAATATTACATTCTTCCCTCAACACTTCCTTGGATTAAACGGTATGCCA
U02003
TTTTGATTCNTCGGTCACCCCGAAGTCTACATTCTTATTCTACCAGCTTTCGGTATAATCTCCCATATTATTAACCAAGAATCAGGGAAAAAGGAAGCGTTCGGGACACTAGGTATAATCTACGCAATAGCAGCAATTGGAATCCTTGGATTTGTAGTATGGGCACATCACATGTTCACAGTTGGTATAGATGTAGACACACGAGCCTACTTTACATCAGCAACTATAATTATTGCAGTTCCCACTGGAATTAAAATTTTCAGGTGGCTAGGAACCCTGCACGGGAGACAATTCACCTACAGACCATCACTACTTTGAGCCCTAGGGTTCGTATTCCTATTCACAATAGGTGGACTAACCGGTGTGGTACTAGCAAACTCATCTATAGATATTATCCTCCACGACACTTATTATGTAGTAGCACACTTCCACTATGTCCTGTCAATAGGAGCCGTATTCGGGATCTTTGCAGGTATTGCTCATTGATTCCCCCTATTCACAGGGTTATCTCTAAACCCCCAGTGACTTAAAATGCACTTTTTCACTATATTCATTGGAGTAAACATTACATTCTTCCCACAACACTTCCTTGGGCTAAACGGAATGCCTCGACGGTACTCTGACTACCCAGACGCTTATACT
U02004
ATTCTATATCAACATCTATTCTGATTCTTTGGGCACCCTGAAGTGTATATTTTAATCCTACCCGCTTTTGGAATAATCTCCCACATTATCAACCAAGAATCCGGTAAAAAAGAAGCATTTGGAACACTAGGTATGATCTACGCCATAGCAGCCATTGGTATCCTTGGTTTCGTAGTGTGGGCCCATCATATATTTACAGTAGGCATGGACGTTGACACTCGAGCCTACTTTACATCTGCAACTATAATTATTGCAGTTCCCACTGGAATTAAAATTTTCAGATGATTAGGAACATTACATGGAAGTCAGTTCACTTACAGACCATCCCTACTCTGAGCCCTGGGATTTGTATTCCTATTCACTATAGGAGGTCTCACGGGAGTAGTCCTAGCTAACTCTTCCATCGATATCATCCTTCACGACACCTATTATGTTGTAGCCCATTTCCACTACGTCCTATCAATAGGAGCCGTCTTTGGAATCTTTGCAGGAATCGCCCACTGATTCCCACTATTTACCGGTCTATCTCTGAATCCTCAATGACTTAAAATACACTTCTTTACTATATTTATCGGAGTTAATATCACATTCTTCCCACAACACTTCTTAGGCCTGAATGGAATACCTCGACGATAC
U02005
CTTTACCAACACCTATTNNNNNNNNTNNNTCACCCAGAGGTTTACATTTTAATTCTACCGCCTCTTGGTATAATNTCCCACATTATAAATCAAGAGTNCGGCAAAAAAGAAGCNTTCGGAACATTAGGTATAATTTACGCAATAGCAGCAATCGGTATCCTAGGCTTNGTAGTATCAGCCCATCATATGTTTACTGTTNNNNNNNNNNNNGACACACGAGCCTATTTCACCTCAGCAACTATAATTATTGCAGTCCCTACAGGAATTAAAATCTTCAGATGACTGAGAACTCTACATGGTACACAATTCACATATAGACCCTCTCTTTTATGGGCCTTAGGATTTGTATTCCTATTCACTATAGGAGGTTTAACAGGAGTAATTTTAGCTAACTCCTCTATTGATATNATCTTACATGACACTAACTNTGTTGTAGCACACTTCCACTATGTCCTATCAATAGGAGCTGTATTTGGAANNTTCGCCGGAATCGCCCATTGATTCCCCCTCTTTACAGGACTATCATTAAACCCAAAATTACTTAAAATACNCTTCTTTACTATATTNNNCGGAGTTAACATTACATTCTTCCCCCAACACTTCTTGGGG
U02006
CACCCAGAGGTTTACATTTTAATTCTACCAGCCTTTGGTATAATCTCCCACATTATAAATCAAGAGTCCGGCAAAAAAGAAGCATTCGGAACATTAGGTATAATTTACGCAATAGCAGCAATCGGTATCCTAGGCTTTGTAGTATGAGCCCATCATATGTTTACTGTTNNNNNNNNNNNNGACACACGAGCCTATTTCACCTCAGCAACTATAATTATTGCAGTACCTACAGGAATTAAAATCTTCAGATGACTGAGAACTCTACATGGTACACAATTCACATATAGACCCTCTCTTTTATGGGCCTTAGGATTCGTGTTCCTATTTACTATAGGAGGTTTAACAGGAGTAATTTTAGCTAACTCCTCTATTGATATTATCTTACACGACACCTATTATGTTGTAGCACACTTCCAATATGTCCTATCAATAGGAGCTGTATTTGGAATTTTCGCGGGGATCGCCCATTGATTCCCACTCTTTACAGGGCTATCATTAAACCCTAAATGACTTAAAATACACTTCTTTACTATATTCATCGGGGTTAACATTACATTCTTCCCC
And below is the kind of output I want:
U02002;tax=d:Eukaryota;p:Arthropoda;c:Malacostraca;o:Decapoda;f:Alpheidae;g:Alpheus;s:Alpheus_canalis
CCAATCCTTTACCAACATCTATTCTGATTCTTCGGACACCCAGAAGTTTACATTTTAATTCTCCCAGCTTTCGGTATAATCTCCCATATTATTAATCAAGAATCAGGGAAAAAAGAAGCATTCGGAACACTAGGTATAATCTATGCAATAGCAGCAATTGGAATCCTTGGGTTTGTAGTATGAGCACACCACATATTCACTGTTGGTATAGACGTAGACACACGAGCCTACTTCACATCAGCAACTATAATTATTGCAGTTCCCACAGGAATTAAAATTTTCAGGTGACTAGGAACCCTACACGGAAGACAATTCACCTACAGACCATCACTACTTTGAGCCCTAGGATTCGTATTCCTATTCACAATAGGAGGACTAACCGGTGTAGTACTAGCGAACTCATCTATTGACATTATCCTTCACGACACTTATTATGTAGTAGCACATTTCCACTACGTACTGTCCATAGGAGCTGTATTTGGAATCTTTGCAGGTATTGCTCATTGATTCCCCCTATTCACAGGACTATCACTAAACCCACAATGACTAAAAATACACTTTTTCACTATATTCATTGGAGTAAATATTACATTCTTCCCTCAACACTTCCTTGGATTAAACGGTATGCCA
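To make the conversion I have in mind concrete, here is a rough Python sketch of it, not a tested tool. It assumes the taxonomy file is tab-separated (ID, then lineage, optionally with a "Feature ID" header line) and that each FASTA header starts with > followed by the same ID. The function names (`to_sintax`, `read_taxonomy`, `convert`) are made up for illustration. Two caveats: as far as I understand the SINTAX annotation documented for USEARCH/VSEARCH, ranks are separated by commas and the annotation ends with a semicolon (`tax=d:...,p:...;`), so the sketch writes commas rather than the semicolons in my example above; and it keeps `k__Metazoa` as `k:Metazoa` instead of rewriting it to `d:Eukaryota`, which would need a manual mapping.

```python
# Map QIIME 2-style rank prefixes to single-letter SINTAX rank codes.
# Assumption: kingdom stays "k"; no kingdom-to-domain renaming is attempted.
RANK_PREFIXES = {
    "d__": "d", "k__": "k", "p__": "p", "c__": "c",
    "o__": "o", "f__": "f", "g__": "g", "s__": "s",
}


def to_sintax(lineage):
    """Turn 'k__A;p__B;...' into a SINTAX-style 'tax=k:A,p:B,...;' string."""
    parts = []
    for field in lineage.split(";"):
        field = field.strip()
        prefix, name = field[:3], field[3:]
        # Skip unknown prefixes and empty ranks such as a bare 's__'.
        if prefix in RANK_PREFIXES and name:
            parts.append(f"{RANK_PREFIXES[prefix]}:{name}")
    return "tax=" + ",".join(parts) + ";"


def read_taxonomy(handle):
    """Read a tab-separated taxonomy table into a dict of ID -> lineage."""
    taxonomy = {}
    for line in handle:
        line = line.rstrip("\r\n")
        # Ignore blank lines and the optional 'Feature ID<TAB>Taxon' header.
        if not line or line.startswith("Feature ID"):
            continue
        seq_id, lineage = line.split("\t")[:2]
        taxonomy[seq_id] = lineage
    return taxonomy


def convert(taxonomy_handle, fasta_handle, out_handle):
    """Write a FASTA file whose headers carry the SINTAX annotation."""
    taxonomy = read_taxonomy(taxonomy_handle)
    for line in fasta_handle:
        if line.startswith(">"):
            # The ID is the first whitespace-delimited token after '>'.
            seq_id = line[1:].split()[0]
            # A KeyError here means the FASTA ID has no taxonomy entry.
            out_handle.write(f">{seq_id};{to_sintax(taxonomy[seq_id])}\n")
        else:
            # Sequence lines are copied through unchanged.
            out_handle.write(line)
```

With the two files opened as text handles (for example a hypothetical `taxonomy.tsv` and `sequences.fasta`) and an output file opened for writing, `convert(tax_in, fasta_in, fasta_out)` would produce headers like `>U02002;tax=k:Metazoa,p:Arthropoda,...,s:Alpheus_canalis;`.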
Thank you!