How to build a classifier from a Fungene database

Nicholas_Bokulich · December 9, 2020, 7:42am

Hi @A_Bennett,
I have not used fungene to build a classifier, but this piece of information gives me hope:

So I assume the issue is that you just have the fasta but no taxonomy/annotation in a separate file? Normally you would need to do that formatting on your own, but since you have NCBI accession numbers in the fasta, you can use those to download sequences and taxonomy directly from GenBank, using RESCRIPt... see this tutorial for details:

That tutorial may not cover all the specifics... in your case, you will want to get the NCBI accession numbers from the fasta file (either extract these manually or import the file to QIIME 2 if there are no formatting issues to prevent this) and pass the list of accession numbers as a metadata file:

qiime rescript get-ncbi-data \
    --m-accession-ids-file list-of-sequence-ids.txt \
    --o-sequences fungene-seqs.qza \
    --o-taxonomy fungene-taxonomy.qza
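
If you go the manual route, here is a rough Python sketch for pulling the accession numbers out of the fasta headers and writing them to list-of-sequence-ids.txt. I am assuming that the accession number is the first whitespace-delimited token after the ">" on each header line (I have not checked how FunGene formats its headers, so adjust the parsing if yours differ). The function names and the input filename fungene.fasta are made up for illustration. The output has a single column with an "id" header, which QIIME 2 accepts as an ID-only metadata file.

```python
import io


def extract_accession_ids(fasta_lines):
    """Return unique accession IDs from FASTA header lines, in file order.

    ASSUMPTION: the accession is the first whitespace-delimited token
    after '>' (e.g. '>AB123456 some description'). Adjust if your
    headers are laid out differently.
    """
    ids = []
    seen = set()
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        fields = line[1:].split()
        if not fields:
            continue  # bare '>' with no ID
        accession = fields[0]
        if accession not in seen:
            seen.add(accession)
            ids.append(accession)
    return ids


def write_metadata(ids, handle):
    """Write a one-column metadata file with an 'id' header row."""
    handle.write("id\n")
    for accession in ids:
        handle.write(f"{accession}\n")


if __name__ == "__main__":
    # Tiny in-memory demo; for a real file use something like:
    #   with open("fungene.fasta") as fh:   # hypothetical filename
    #       ids = extract_accession_ids(fh)
    #   with open("list-of-sequence-ids.txt", "w") as out:
    #       write_metadata(ids, out)
    demo = io.StringIO(">AB123456 example\nATGC\n>XY987654.1\nGGCC\n")
    out = io.StringIO()
    write_metadata(extract_accession_ids(demo), out)
    print(out.getvalue(), end="")
```

Duplicated IDs are dropped because metadata IDs need to be unique.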

That's an issue... you will need to translate back to DNA before importing, because q2-feature-classifier (which I am assuming you plan to use) cannot handle amino acid sequences. Maybe the NCBI accession numbers link to DNA sequences already?
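
To get a quick idea of whether a given sequence is DNA or amino acids, something like the following rough heuristic could help. It is only a sketch: the function name and the 0.9 threshold are my own arbitrary choices, and it simply checks what fraction of the letters are A, C, G, T, U or N. Short or unusual sequences can fool it, so treat the answer as a hint, not a guarantee.

```python
def looks_like_dna(sequence, threshold=0.9):
    """Heuristic: True if most letters are core nucleotide codes.

    ASSUMPTION: a sequence counts as nucleotide when at least
    `threshold` of its letters are in ACGTUN. The threshold is an
    arbitrary illustrative value, not an established cutoff.
    """
    residues = [c for c in sequence.upper() if c.isalpha()]
    if not residues:
        return False  # nothing to judge
    core = sum(1 for c in residues if c in "ACGTUN")
    return core / len(residues) >= threshold


if __name__ == "__main__":
    print(looks_like_dna("ATGCGTACGTTAGC"))       # nucleotide-like input
    print(looks_like_dna("MKVLAAGIVGLLLAQPSFA"))  # protein-like input
```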

Give that a spin and let me know what you find!