MiFish database from nucleotide.

Hi Folks,
I wanted to put it out there that I've put together a database of MiFish sequences scraped from NCBI.
These are sequences matching the query "(((12S OR MiFish) OR mitochondrion) NOT chromosome) NOT shotgun". Those sequences were then aligned against a curated selection of MiFish representative sequences that we have found in our samples, and the MiFish region was extracted using those alignments. Off-target sequences and alignments shorter than 140 bp were discarded.
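For anyone who wants to see the length filter in concrete terms, here is a minimal Python sketch. It is not the actual pipeline behind the database: the `hits` layout (ID, full sequence, 0-based half-open alignment coordinates) and the function name are assumptions made for illustration, and only the 140 bp cutoff comes from the description above.

```python
MIN_ALIGNMENT_LENGTH = 140  # cutoff stated in the post


def extract_regions(hits, min_length=MIN_ALIGNMENT_LENGTH):
    """Return {seq_id: aligned subsequence} for hits that are long enough.

    hits: iterable of (seq_id, sequence, aln_start, aln_end), where the
    coordinates are 0-based and half-open on `sequence` (assumed layout).
    """
    kept = {}
    for seq_id, sequence, start, end in hits:
        if end - start < min_length:
            continue  # alignment too short: discard
        kept[seq_id] = sequence[start:end]
    return kept


# Toy input: placeholder sequences, not real MiFish data.
hits = [
    ("seq_a", "ACGT" * 50, 10, 180),  # 170 bp aligned: kept
    ("seq_b", "ACGT" * 50, 0, 120),   # 120 bp aligned: discarded
]
regions = extract_regions(hits)
```

Sequences with no alignment at all (the off-target ones) simply never appear in `hits`, so they drop out without any extra handling.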
This creates a very efficient database: with vsearch, I was able to assign taxonomy to my test samples (1,400 rep seqs) in about 2 seconds.
The taxonomy format is "lineage; scientific name; common name"
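If you want to pull the three fields apart for curation, a small parser along these lines may help. This is a sketch under one assumption: the lineage part can itself contain semicolons, so the string is split from the right. The example label is made up for illustration and is not copied from the database.

```python
def parse_taxonomy(label):
    """Split 'lineage; scientific name; common name' into its three fields.

    Splitting from the right keeps any semicolons inside the lineage intact
    (assumption: the last two fields never contain a semicolon).
    """
    parts = [part.strip() for part in label.rsplit(";", 2)]
    if len(parts) != 3:
        raise ValueError(f"expected three ';'-separated fields, got: {label!r}")
    lineage, scientific_name, common_name = parts
    return {
        "lineage": lineage,
        "scientific_name": scientific_name,
        "common_name": common_name,
    }


# Illustrative label only; the real lineage strings may look different.
example = (
    "Eukaryota; Chordata; Actinopteri; Centrarchidae; Lepomis; "
    "Lepomis macrochirus; bluegill"
)
parsed = parse_taxonomy(example)
```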
I highly suggest manually curating MiFish taxonomy, but this provides a good starting point, especially for filtering out off-target sequences.

Happy to hear thoughts/suggestions here or on GitHub.
Devin


Hi Devin,

I am fairly new to QIIME2, but I am trying to identify as many as 19 locally occurring fish species in my eDNA samples. I amplified the DNA using the MiFish-U primer set, as well as another primer set for the CYTB gene. When I was validating these primer sets for my study, I downloaded all of the available sequences for my species of interest from NCBI into Geneious Prime. I even have the amplicon region for each species isolated.

In QIIME2, I have my sequences dereplicated and now I want to cluster my features (and eventually identify what species they belong to) with vsearch using the closed-reference method (vsearch-cluster-features-closed-reference). I imagine I could use your reference database file for MiFish, but I would like to create my own database that includes the amplicons from both of my primer sets, and only for the 19 species that I am looking for in my study area. You mention that you recommend manually curating MiFish taxonomy, and I think that's what I'm trying to do. I looked at your README file, but I'm afraid I'm not familiar enough with QIIME2 to fully digest what you have laid out there. I am hoping that you, or anyone else, could help me create a custom database to use with vsearch.

Andrew

Hi Andrew,
Have you gotten the reference sequences imported into QIIME2? If not, you can go off the example at the beginning of this tutorial: "Training feature classifiers with q2-feature-classifier" (QIIME 2 2022.2.0 documentation), where they import the reference taxonomy. You'd need to format your reference sequences like the two 85_otu files in that example.
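As a rough illustration of that layout, the sketch below writes a FASTA file plus a headerless two-column TSV (feature ID, tab, taxonomy string), which is how the two 85_otu files in the tutorial are structured. The function name, file names, IDs, sequences, and taxonomy strings are all placeholders, so swap in your own amplicons. The import commands in the trailing comments follow the tutorial, but check them against your QIIME2 version.

```python
import tempfile
from pathlib import Path


def write_reference_files(records, fasta_path, taxonomy_path):
    """Write matching FASTA and headerless TSV taxonomy files.

    records: iterable of (feature_id, sequence, taxonomy). Feature IDs must
    be unique and identical across the two files.
    """
    fasta_lines, taxonomy_lines, seen = [], [], set()
    for feature_id, sequence, taxonomy in records:
        if feature_id in seen:
            raise ValueError(f"duplicate feature ID: {feature_id}")
        seen.add(feature_id)
        fasta_lines.append(f">{feature_id}\n{sequence.upper()}")
        taxonomy_lines.append(f"{feature_id}\t{taxonomy}")
    Path(fasta_path).write_text("\n".join(fasta_lines) + "\n")
    Path(taxonomy_path).write_text("\n".join(taxonomy_lines) + "\n")


# Placeholder records; real entries would hold your extracted amplicons.
records = [
    ("ref_001", "acgtacgtacgt", "Centrarchidae; Lepomis; Lepomis macrochirus"),
    ("ref_002", "acgtacgaacgt", "Centrarchidae; Lepomis; Lepomis gibbosus"),
]
out_dir = Path(tempfile.mkdtemp())
fasta_file = out_dir / "custom_refs.fasta"
taxonomy_file = out_dir / "custom_taxonomy.tsv"
write_reference_files(records, fasta_file, taxonomy_file)

# Import step as shown in the tutorial (run in a shell, not in Python):
#   qiime tools import --type 'FeatureData[Sequence]' \
#     --input-path custom_refs.fasta --output-path custom_refs.qza
#   qiime tools import --type 'FeatureData[Taxonomy]' \
#     --input-format HeaderlessTSVTaxonomyFormat \
#     --input-path custom_taxonomy.tsv --output-path custom_taxonomy.qza
```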
I'm not familiar with how the closed-reference clustering works, but with such a limited database, I'd want to be sure you aren't clustering close relatives. For example, in our samples we get lots of different sunfish, which can sometimes have almost no difference between their sequences.
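One way to check for that before clustering is to compare every pair of reference amplicons and flag the ones that are nearly identical. The sketch below assumes the sequences are already aligned to equal length, and the 0.97 threshold is only an illustrative value. The species names and sequences are toy placeholders.

```python
from itertools import combinations


def identity(seq_a, seq_b):
    """Fraction of matching positions between two equal-length aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return matches / len(seq_a)


def near_identical_pairs(aligned, threshold=0.97):
    """Return (name_1, name_2, identity) for every pair at or above threshold."""
    flagged = []
    for (name_1, seq_1), (name_2, seq_2) in combinations(aligned.items(), 2):
        pid = identity(seq_1, seq_2)
        if pid >= threshold:
            flagged.append((name_1, name_2, pid))
    return flagged


# Toy 100 bp "alignment": species_b differs from species_a at one position,
# species_c at several.
base = "ACGT" * 25
aligned = {
    "species_a": base,
    "species_b": "T" + base[1:],
    "species_c": "T" * 10 + base[10:],
}
flagged = near_identical_pairs(aligned)
```

Any pair that shows up in `flagged` is a candidate for being merged by closed-reference clustering at that identity level, so it is worth reviewing by hand.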

My workflow is generally to assign taxonomy with the database from my initial post, then manually check the assigned taxonomy by running the sequences through web BLAST and synthesizing the BLAST results with some knowledge of local species. In that scheme, a region-specific database would be useful in that second step.
Devin

Thanks Devin,

I appreciate your prompt reply and I will look into that tutorial. Thanks!

Andrew

Hi Devin,
Can this database be directly imported into QIIME2 for species annotation?
Miao Li