I’m using QIIME2 for a dietary study of a herbivorous marine gastropod, targeting the 23S gene because it is broadly applicable across many macroalgal species. Unfortunately, unlike for 16S, there are no reference databases (that I know of) specific to the gene region I am targeting. I have built a preliminary custom database semi-manually: I aligned a selection of Sanger-sequenced 23S PCR products, amplified from DNA extracted from local macroalgal species, and manually entered their taxonomic classifications in the format needed for import into QIIME2, i.e. a unique ID followed by the classification:
| Unique ID | Classification |
|---|---|
| B12_E02 | d_Eukaryota;k_Chromista;p_Ochrophyta;c_Phaeophyceae;o_Fucales;f_Hormosiraceae;g_Hormosira;s_Hormosira_banksii |
| B5_p23SrV_A08 | d_Eukaryota;k_Chromista;p_Ochrophyta;c_Phaeophyceae;o_Fucales;f_Durvillaeaceae;g_Durvillaea;s_Durvillaea_potatorum |
| B4_p23SrV_A07 | d_Eukaryota;k_Chromista;p_Ochrophyta;c_Phaeophyceae;o_Fucales;f_Seirococcaceae;g_Phyllospora;s_Phyllospora_comosa |
| B13_F02 | d_Eukaryota;k_Chromista;p_Ochrophyta;c_Phaeophyceae;o_Laminariales;f_Alariaceae;g_Undaria;s_Undaria_pinnatifida |
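Because every entry is typed by hand, a quick format check can catch typos before import. Below is a minimal Python sketch, assuming the file on disk is tab-separated (ID, then taxonomy string, as QIIME2's headerless taxonomy format expects) and uses the eight single-underscore rank prefixes shown above; the function names are hypothetical, not part of QIIME2.

```python
# Hypothetical checker for a hand-built "ID<TAB>taxonomy" file (not a QIIME2 tool).
# Assumes the eight single-underscore rank prefixes used in the example rows above.
EXPECTED_PREFIXES = ["d_", "k_", "p_", "c_", "o_", "f_", "g_", "s_"]


def check_taxonomy_line(line: str) -> list[str]:
    """Return the problems found in one line (an empty list if it looks fine)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 2:
        return [f"expected 2 tab-separated fields, found {len(fields)}"]
    feature_id, taxon = fields
    problems = []
    if not feature_id:
        problems.append("empty ID")
    ranks = taxon.split(";")
    if len(ranks) != len(EXPECTED_PREFIXES):
        problems.append(f"expected {len(EXPECTED_PREFIXES)} ranks, found {len(ranks)}")
    # Compare each rank with its expected prefix, in order.
    for prefix, rank in zip(EXPECTED_PREFIXES, ranks):
        if not rank.startswith(prefix):
            problems.append(f"rank {rank!r} should start with {prefix!r}")
    return problems


def check_taxonomy_file(path: str) -> dict[int, list[str]]:
    """Map 1-based line numbers to their problems, including duplicate IDs."""
    report = {}
    seen_ids = set()
    with open(path) as handle:
        for number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # ignore blank lines
            problems = check_taxonomy_line(line)
            feature_id = line.split("\t", 1)[0]
            if feature_id in seen_ids:
                problems.append(f"duplicate ID {feature_id!r}")
            seen_ids.add(feature_id)
            if problems:
                report[number] = problems
    return report
```

Running `check_taxonomy_file("taxonomy.txt")` (hypothetical file name) returns an empty dict when every line passes, otherwise the offending line numbers with a short description of each problem.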
My database currently works well and gives the expected results, but I would like to expand it to fill the many gaps it probably has. I created the current taxonomy txt file in the format above entirely by hand (too slow!) and would like to generate matching entries for many more sequences from a list of BLAST results.
Is there a way to convert an aligned FASTA file of multiple sequences from a BLAST search into a matching taxonomy txt file in the format above, for use in building a custom database?
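One possible approach, sketched in Python under several assumptions: each FASTA header starts with an accession followed by the genus and species (e.g. `>ACCESSION Genus species ...`, which is typical of NCBI downloads but should be checked against your file); the higher ranks (d_ to f_) for a genus are borrowed from the existing hand-built taxonomy file; and alignment gaps (`-` and `.`) are removed because unaligned reference sequences are assumed to be wanted. All function names are hypothetical. Headers whose genus is not yet in the hand-built file are returned for manual completion rather than guessed.

```python
# Hypothetical converter: aligned BLAST FASTA -> degapped FASTA + "ID<TAB>taxonomy" file.
# Assumes headers look like "ACCESSION Genus species ..." and that the existing
# hand-built taxonomy file has eight ranks (d_ ... s_) per line.
GAP_CHARS = "-."
REMOVE_GAPS = str.maketrans("", "", GAP_CHARS)


def read_fasta(path):
    """Yield (header, sequence) pairs; the header excludes the leading '>'."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def lineages_by_genus(taxonomy_path):
    """Map genus name -> its d_..f_ ranks, taken from the hand-built taxonomy file."""
    lookup = {}
    with open(taxonomy_path) as handle:
        for line in handle:
            if not line.strip():
                continue
            _, taxon = line.rstrip("\n").split("\t")  # fails loudly on malformed lines
            ranks = taxon.split(";")
            genus = ranks[6][len("g_"):]  # 7th rank, e.g. 'g_Hormosira' -> 'Hormosira'
            lookup[genus] = ranks[:6]
    return lookup


def blast_fasta_to_qiime(fasta_path, taxonomy_path, out_fasta, out_taxonomy):
    """Write matching FASTA and taxonomy files; return headers that need manual work."""
    lookup = lineages_by_genus(taxonomy_path)
    unresolved = []
    written = set()
    with open(out_fasta, "w") as fasta_out, open(out_taxonomy, "w") as tax_out:
        for header, sequence in read_fasta(fasta_path):
            words = header.split()
            if len(words) < 3 or words[1] not in lookup:
                unresolved.append(header)  # genus unknown: complete by hand
                continue
            seq_id, genus, species = words[0], words[1], words[2]
            if seq_id in written:
                continue  # skip repeated accessions so IDs stay unique
            written.add(seq_id)
            ranks = lookup[genus] + [f"g_{genus}", f"s_{genus}_{species}"]
            fasta_out.write(f">{seq_id}\n{sequence.translate(REMOVE_GAPS)}\n")
            tax_out.write(f"{seq_id}\t{';'.join(ranks)}\n")
    return unresolved
```

The IDs in the two output files match, which QIIME2 needs when pairing reference sequences with taxonomy. The outputs could then be imported with `qiime tools import`, e.g. `--type 'FeatureData[Taxonomy]' --input-format HeaderlessTSVTaxonomyFormat` for the taxonomy and `--type 'FeatureData[Sequence]'` for the sequences. Species names taken from BLAST headers (e.g. "sp." or "uncultured") still deserve a manual look.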
I’m still pretty green when it comes to bioinformatic tools, so please be patient with my ignorance!
Thanks in advance!