How to build a classifier from a Fungene database

The Fungene repository has databases for most of the functional genes in my study. I want to use these to build a classifier for each amplicon. However, downloading these repositories results in a .fa file (which contains the NCBI accession numbers).

How can I use the databases made available through Fungene to build a classifier? To complicate matters further, some of these databases are translated to amino acid sequences.

Hi @A_Bennett,
I have not used fungene to build a classifier, but this piece of information gives me hope:

So I assume the issue is that you just have the FASTA but no taxonomy/annotation in a separate file? Normally you would need to do the formatting yourself, but since you have NCBI accession numbers in the FASTA, you can use them to download sequences and taxonomy directly from GenBank using RESCRIPt... see this tutorial for details:

That tutorial may not cover all the specifics... in your case, you will want to get the NCBI accession numbers from the FASTA file (either extract these manually or import the file to QIIME 2 if there are no formatting issues to prevent this) and pass that list as a metadata file:

qiime rescript get-ncbi-data \
    --m-accession-ids-file list-of-sequence-ids.txt \
    --o-sequences fungene-seqs.qza \
    --o-taxonomy fungene-taxonomy.qza
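
If you go the manual route, the accession list could be pulled out of the FASTA headers with something like the sketch below. I have not checked this against a real Fungene download, so treat it as a starting point: it assumes the accession is the first whitespace-delimited token after the `>` on each header line, and the function name `extract_accessions` and the input filename `fungene.fa` are just placeholders. The `id` header line is there because QIIME 2 metadata files need an ID column name on the first row.

```shell
# Hypothetical helper: build a one-column QIIME 2 metadata file of accession IDs.
# Assumes each FASTA header looks like ">ACCESSION optional description...".
extract_accessions() {
    # $1: input FASTA file, $2: output metadata file
    printf 'id\n' > "$2"
    # Keep header lines only, strip the leading ">", take the first token,
    # and drop duplicates (metadata IDs must be unique).
    awk '/^>/ {sub(/^>/, "", $1); print $1}' "$1" | sort -u >> "$2"
}

# Example call (filenames are placeholders):
#   extract_accessions fungene.fa list-of-sequence-ids.txt
```

Have a look at the resulting file before passing it to `get-ncbi-data`; if the headers carry anything other than a plain accession in the first position, the `awk` step will need adjusting.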

The amino acid sequences are an issue... you will need to translate them back to DNA before importing, because q2-feature-classifier (which I am assuming you plan to use) cannot handle amino acid sequences. Maybe the NCBI accession numbers link to DNA sequences already?
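
One quick way to tell which of your downloaded files are amino acid and which are DNA is to scan the sequence lines for characters outside the nucleotide alphabet. This is only a rough sketch: the function name `looks_like_dna` is made up, it treats the IUPAC nucleotide codes plus `.` and `-` as valid, and it would misjudge a file with Windows line endings or a protein that happens to use only those letters.

```shell
# Hypothetical check: exit status 0 if every sequence line uses only
# IUPAC nucleotide codes (plus "." and "-" gaps), non-zero otherwise.
looks_like_dna() {
    # $1: FASTA file to inspect
    # Drop header lines, then look for any character outside the allowed set.
    ! grep -v '^>' "$1" | grep -qi '[^ACGTUNRYSWKMBDHV.-]'
}

# Example call (filename is a placeholder):
#   if looks_like_dna fungene.fa; then echo "DNA"; else echo "probably protein"; fi
```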

Give that a spin and let me know what you find!
