I wanted to share that I've put together a database of MiFish sequences scraped from NCBI.
These are sequences matching the query "(((12S OR MiFish) OR mitochondrion) NOT chromosome) NOT shotgun". Those sequences were then aligned against a curated selection of MiFish representative sequences that we have found in our samples, and the MiFish region was extracted using those alignments. Off-target sequences and alignments shorter than 140 bp were discarded.
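To make the trimming and filtering step concrete, here is a rough Python sketch. It is not the actual pipeline: the function names, the input dictionaries, and the 0-based half-open coordinate convention are all assumptions made for illustration. Only the 140 bp cutoff and the idea of dropping sequences with no alignment come from the description above.

```python
MIN_ALIGNMENT_LENGTH = 140  # bp cutoff from the post; shorter alignments are discarded


def extract_region(sequence, aln_start, aln_end, min_len=MIN_ALIGNMENT_LENGTH):
    """Return the aligned slice of `sequence`, or None if the alignment is too short.

    Coordinates are assumed to be 0-based and half-open (hypothetical convention).
    """
    if aln_start < 0 or aln_end > len(sequence) or aln_start >= aln_end:
        raise ValueError("alignment coordinates fall outside the sequence")
    if aln_end - aln_start < min_len:
        return None
    return sequence[aln_start:aln_end]


def build_database(records, hits):
    """Trim every sequence to its aligned region.

    records: dict mapping sequence ID -> full sequence (hypothetical input)
    hits:    dict mapping sequence ID -> (start, end) of its best alignment
    Sequences without a hit are treated as off target and dropped.
    """
    trimmed = {}
    for seq_id, sequence in records.items():
        hit = hits.get(seq_id)
        if hit is None:
            continue  # no alignment to any representative sequence: off target
        region = extract_region(sequence, *hit)
        if region is not None:
            trimmed[seq_id] = region
    return trimmed
```

In this sketch an alignment of exactly 140 bp is kept, since only alignments under 140 bp are discarded.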
This creates a very efficient database: with vsearch, I was able to assign taxonomy to my test samples (1,400 rep seqs) in about 2 seconds.
The taxonomy format is "lineage; scientific name; common name".
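If you want to split those labels back into their parts, something like the following Python sketch could work. It assumes the last two semicolon-separated fields are always the scientific name and common name, and that everything before them is the lineage (which may itself contain semicolons). The `Taxonomy` type and `parse_taxonomy` function are hypothetical names, and the example label is made up rather than taken from the database.

```python
from typing import NamedTuple


class Taxonomy(NamedTuple):
    lineage: str
    scientific_name: str
    common_name: str


def parse_taxonomy(label: str) -> Taxonomy:
    """Split a 'lineage; scientific name; common name' label into three fields.

    Splitting from the right keeps any semicolons inside the lineage intact
    (assumption: the last two fields never contain a semicolon).
    """
    parts = [part.strip() for part in label.rsplit(";", 2)]
    if len(parts) != 3:
        raise ValueError(f"expected three fields, got {len(parts)}: {label!r}")
    return Taxonomy(*parts)


# Made-up label for illustration only
example = parse_taxonomy("Eukaryota;Chordata;Salmonidae; Salmo salar; Atlantic salmon")
print(example.scientific_name)
```

Splitting from the right means the sketch does not need to know how many ranks the lineage contains.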
I highly suggest manually curating MiFish taxonomy, but this provides a good starting point, especially for filtering out off-target sequences.
Happy to hear thoughts/suggestions here or on GitHub.