Yes, based on your description, it sounds like that information is only missing if it was not annotated in the first place!
Ok, thanks for verifying.
I’m going to run a local BLAST with all my Unassigned sequences now and see how many pass both a 97% identity and a 94% coverage threshold. I’m concerned about how many ASVs are Unassigned (9710 of 23932, about 41%), but maybe this is common with ASVs? You’d think they’d hit something…
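For anyone following along, here is a minimal sketch of that identity-and-coverage filter in Python. It assumes the local BLAST was run with a custom tabular format such as `-outfmt "6 qseqid sseqid pident qcovs"`, so the column order and the example rows are assumptions for illustration and not output from this dataset. The function name `passing_queries` is hypothetical too.

```python
import csv
import io


def passing_queries(blast_tsv, min_identity=97.0, min_coverage=94.0):
    """Return the query IDs with at least one hit meeting both thresholds.

    Assumes four tab-separated columns per row, in this order:
    qseqid, sseqid, pident, qcovs (an assumed BLAST tabular layout).
    """
    passed = set()
    reader = csv.reader(io.StringIO(blast_tsv), delimiter="\t")
    for row in reader:
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        qseqid, _sseqid, pident, qcovs = row[:4]
        # Both conditions must hold for the same hit.
        if float(pident) >= min_identity and float(qcovs) >= min_coverage:
            passed.add(qseqid)
    return passed


# Made-up rows, only to show the behaviour of the filter.
example = "\n".join([
    "asv1\tref9\t98.5\t100",
    "asv1\tref3\t91.0\t100",
    "asv2\tref4\t99.0\t80",
    "asv3\tref7\t97.0\t94",
])

hits = passing_queries(example)
print(sorted(hits))
```

Counting `len(hits)` against the total number of Unassigned ASVs would then give the share that a secondary BLAST search could rescue at those thresholds.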
It also makes me wonder if I need to go down yet another exploratory hole and build some sort of naive Bayes classifier. Perhaps if I could train the classifier with my more curated database, the number of Unassigned ASVs would drop substantially (even if the completeness of the classification for a particular ASV may not be terrific).
I’m wondering what your thoughts are on the naive Bayes classifier route in my circumstance. It seems like this would resolve the “use multiple databases” question, because I could just start with one and only one. It also means I wouldn’t have to selectively include or exclude an Unassigned ASV based on a secondary search with something like NCBI BLAST.
Give it a try. Others have reported using this classifier for COI on this forum, so it should work. Until you benchmark on your mock communities we will not know how well… but these classifiers generally work well.
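As a rough idea of what benchmarking on a mock community could look like, here is a small Python sketch that compares the taxa a classifier reports against the taxa known to be in the mock. This is only one simple way to score it (set-based precision, recall and F-measure), not necessarily the approach anyone in this thread used, and the species names below are made up for illustration.

```python
def benchmark(expected, observed):
    """Score observed taxa against the known mock community composition.

    Returns (precision, recall, f_measure) based on presence/absence only;
    abundances are ignored in this simplified sketch.
    """
    expected, observed = set(expected), set(observed)
    true_positives = len(expected & observed)
    precision = true_positives / len(observed) if observed else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical mock community and hypothetical classifier output.
mock_expected = {
    "Aedes aegypti",
    "Apis mellifera",
    "Bombus terrestris",
    "Musca domestica",
}
classifier_observed = {
    "Aedes aegypti",
    "Apis mellifera",
    "Drosophila melanogaster",
}

precision, recall, f_measure = benchmark(mock_expected, classifier_observed)
print(f"precision={precision:.2f} recall={recall:.2f} F={f_measure:.2f}")
```

Running the same comparison for each classifier or database you try makes it easier to tell whether fewer Unassigned ASVs also means more correct assignments.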