Hello, everyone! Some of the species in my study area do not have reference sequences on NCBI, so I want to amplify those missing reference sequences myself. Once I have them, how should I organize these sequences into a database that can be used for species annotation?
Consider this tutorial: Using RESCRIPt to compile sequence databases and taxonomy classifiers from NCBI Genbank
This tutorial will work for sequences already on NCBI, but it sounds like your microbes do not have sequenced genomes on NCBI. Have you already isolated, sequenced, and assembled their genomes? What data do you have?
Building databases is hard! There is no 'correct' way to do it, and everyone will argue with you over your methods. I try to avoid this if possible.
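That said, if you do end up with your own sequences, one common layout is a FASTA file plus a tab-separated taxonomy file that share the same IDs. Here is a minimal Python sketch of that idea, not a definitive method. The record IDs, taxon names, sequences, output file names, and the `k__; p__; ...` rank-prefix style are all made-up placeholders; use whatever taxonomy format the rest of your database uses.

```python
from pathlib import Path

# Hypothetical records: (feature ID, taxonomy string, DNA sequence).
# IDs, taxa, and sequences are placeholders, not real data.
RECORDS = [
    ("custom_0001",
     "k__Eukaryota; p__Chordata; c__Actinopteri; o__Exampleformes; "
     "f__Examplidae; g__Examplus; s__Examplus_fictus",
     "ACGTACGTTAGC"),
    ("custom_0002",
     "k__Eukaryota; p__Chordata; c__Actinopteri; o__Exampleformes; "
     "f__Examplidae; g__Examplus; s__Examplus_alter",
     "ttgaccannacgt"),
]

# IUPAC nucleotide codes accepted in a sequence.
IUPAC_DNA = set("ACGTRYSWKMBDHVN")


def clean_records(records):
    """Upper-case sequences; check IDs are unique and sequences are valid DNA."""
    seen = set()
    cleaned = []
    for seq_id, taxon, seq in records:
        seq = seq.strip().upper()
        if seq_id in seen:
            raise ValueError(f"duplicate ID: {seq_id}")
        if not seq or set(seq) - IUPAC_DNA:
            raise ValueError(f"invalid sequence for {seq_id}")
        seen.add(seq_id)
        cleaned.append((seq_id, taxon.strip(), seq))
    return cleaned


def format_fasta(records):
    """Return FASTA text with one header line and one sequence line per record."""
    return "".join(f">{seq_id}\n{seq}\n" for seq_id, _, seq in clean_records(records))


def format_taxonomy(records):
    """Return a TSV with a 'Feature ID<TAB>Taxon' header and one row per record."""
    lines = ["Feature ID\tTaxon"]
    lines += [f"{seq_id}\t{taxon}" for seq_id, taxon, _ in clean_records(records)]
    return "\n".join(lines) + "\n"


def write_reference_files(records, out_dir):
    """Write both files into out_dir (file names are arbitrary choices)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "custom-seqs.fasta").write_text(format_fasta(records))
    (out / "custom-taxonomy.tsv").write_text(format_taxonomy(records))
    return out
```

Calling `write_reference_files(RECORDS, "custom_refs")` would produce the two files. As far as I know, files in this shape can be brought into QIIME 2 with `qiime tools import` as `FeatureData[Sequence]` and `FeatureData[Taxonomy]`, but please check the current import docs before relying on that.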