I need to run taxonomic assignment on some 12S data for birds, but I do not have a good reference database that includes my species of interest. I do have some reference genomes, but I am not sure whether using them is good practice, since it would essentially force assignment against reference sequences from a single species. I didn't see anything on the forum about using unannotated reference genomes for target species that lack reference sequences, and 12S sequences for birds are not as well represented as MiFish. What is the best approach for assigning taxonomy in this case?
Thanks for the quick response and the helpful links!
I will try out the suggested methods for NCBI extraction and RESCRIPt, and hopefully I will have some luck with them. The one issue I see with these methods is that there are no annotated gene sequences for the target or related species I am trying to identify, only a few recent whole reference genomes that are not yet on NCBI. Is there an alternative way to use reference genomes to determine whether a species is present in a sample? Sorry if this is just a basic BLAST query; I am fairly new to this field.
The best thing to do would be to extract the 12S sequences from the full genomes. That part would need to be done outside of QIIME 2, as the current sequence trimmer is not designed for trimming multi-copy genes from whole genome sequences.
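As a rough illustration of that extraction step, here is a minimal Python sketch of an in-silico PCR: it scans both strands of each genome record for a forward primer followed by the reverse complement of a reverse primer, and writes the region in between to a FASTA file. This is only one possible approach, not something QIIME 2 provides. The function names, file names, length limits, and primer strings are all hypothetical placeholders, so substitute the primers used for your own 12S amplicon. It only handles exact matches (allowing IUPAC degenerate bases in the primers), so a genome with a mismatch in a primer site would be missed.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq):
    """Reverse complement of a DNA string (IUPAC codes supported)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Turn a (possibly degenerate) primer into a regex fragment."""
    return "".join(IUPAC[base] for base in primer.upper())


def extract_amplicons(seq, fwd, rev, min_len=50, max_len=500):
    """Return the regions lying between fwd and revcomp(rev), on either strand.

    Primers themselves are not included in the returned sequences.
    min_len and max_len are assumed bounds on the insert length.
    """
    pattern = re.compile(
        primer_regex(fwd)
        + "([ACGTN]{%d,%d}?)" % (min_len, max_len)
        + primer_regex(revcomp(rev))
    )
    hits = []
    for strand in (seq.upper(), revcomp(seq)):
        hits.extend(match.group(1) for match in pattern.finditer(strand))
    return hits


def read_fasta(path):
    """Yield (record_id, sequence) pairs from a FASTA file."""
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                name = (line[1:].split() or ["unnamed"])[0]
                chunks = []
            elif line:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def write_amplicons(genome_fasta, out_fasta, fwd, rev, **limits):
    """Write every amplicon found in genome_fasta to out_fasta; return the count."""
    count = 0
    with open(out_fasta, "w") as out:
        for name, seq in read_fasta(genome_fasta):
            for amplicon in extract_amplicons(seq, fwd, rev, **limits):
                count += 1
                out.write(f">{name}_12S_{count}\n{amplicon}\n")
    return count


# Hypothetical usage (placeholder file names and primers):
# write_amplicons("my_genome.fasta", "12S_refs.fasta",
#                 fwd="FORWARD_PRIMER_HERE", rev="REVERSE_PRIMER_HERE")
```

Every hit is written out rather than just the first, so if a genome yields more than one candidate region you can inspect them before deciding which to keep. You would also want to give each record an ID that you can map to a taxonomy entry for that species.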
Once you have a FASTA of 12S sequences, these could be imported into QIIME 2 as a FeatureData[Sequence] artifact and used as your reference database.