I'm working with a fungus that has no sequence data in GenBank. I used Sanger sequencing to generate sequences from a number of cultures, then built a custom BLAST database from those sequences.
To check if the BLAST database could assign taxonomy, I imported the same sequences used to create my BLAST database into QIIME (qiime2-2019.4) and assigned taxonomy:
To check that my reference taxonomy file was correct, I unzipped Pachy_and_off-targets_taxonomy.qza and checked the taxonomy.tsv file within the data folder. It looks as expected:
I also ran the same process in Geneious: I created a BLAST database, then searched the sequences used to create the database against it, and it worked just fine.
Hi @laura.d,
Thanks for posting examples of your reference data! They let me diagnose the issue very quickly: consensus taxonomic assignment is failing because your reference taxonomy consists of only one level, so when a query sequence hits two distinct reference taxa the result is "unassigned". This classifier is not simply using BLAST; it follows the BLAST alignment with a consensus assignment step in q2-feature-classifier. See this topic for more details on this diagnosis and some potential fixes:
So you have two fixes:
1. Fix the reference taxonomy to contain multiple levels, as described in that topic
2. Maybe you don't want to have multiple levels? Maybe you cannot? You could always set --p-maxaccepts 1 so that BLAST just returns the top hit... not what I recommend but just clarifying that's how to do that if that's what you are after.
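To make the diagnosis and the two fixes concrete, here is a minimal Python sketch of a per-level consensus step. It is a simplified illustration, not the actual q2-feature-classifier code: the function name `consensus_taxonomy`, the 0.51 agreement threshold, the labels `taxon_A` and `taxon_B`, and the `k__Fungi` parent rank are all hypothetical and chosen only for the example.

```python
from collections import Counter


def consensus_taxonomy(hit_taxa, min_consensus=0.51, unassigned="Unassigned"):
    """Simplified per-level consensus over the taxonomy strings of a query's hits.

    Illustration only; the threshold and label handling are assumptions.
    """
    if not hit_taxa:
        return unassigned
    # Split each lineage into levels, e.g. "k__Fungi; taxon_A" -> ["k__Fungi", "taxon_A"].
    lineages = [[level.strip() for level in taxon.split(";")] for taxon in hit_taxa]
    depth = min(len(lineage) for lineage in lineages)
    consensus = ()
    for i in range(depth):
        # Count how many hits share each lineage prefix down to level i.
        prefixes = Counter(tuple(lineage[: i + 1]) for lineage in lineages)
        prefix, count = prefixes.most_common(1)[0]
        if count / len(lineages) < min_consensus:
            break  # not enough agreement at this level, stop descending
        consensus = prefix
    return "; ".join(consensus) if consensus else unassigned


# Single-level reference taxonomy: two hits disagree at the only level.
single_level_hits = ["taxon_A", "taxon_B"]
print(consensus_taxonomy(single_level_hits))      # Unassigned

# Fix 1: multi-level taxonomy, so the hits still agree at a higher level.
multi_level_hits = ["k__Fungi; taxon_A", "k__Fungi; taxon_B"]
print(consensus_taxonomy(multi_level_hits))       # k__Fungi

# Fix 2: keep only the top hit (the effect of accepting a single hit).
print(consensus_taxonomy(single_level_hits[:1]))  # taxon_A
```

With one level, a disagreement leaves nothing for the hits to agree on, so the query falls through to unassigned. Adding shared higher ranks gives the consensus step something to fall back on, while keeping only one hit sidesteps the consensus step entirely.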
I personally recommend classify-consensus-vsearch in q2-feature-classifier instead of BLAST. Among other reasons, it has a top-hits-only option that could be put to good use with your custom database.
Geneious probably worked because it performs a top-hit BLAST search rather than the consensus step that q2-feature-classifier implements.