Custom BLAST database fails to assign taxonomy

Hi,

I'm working with a fungus that has no sequence data in GenBank. I used Sanger sequencing to generate sequences from a number of cultures, then used those sequences to build a custom BLAST database.

To check if the BLAST database could assign taxonomy, I imported the same sequences used to create my BLAST database into QIIME (qiime2-2019.4) and assigned taxonomy:

qiime feature-classifier classify-consensus-blast \
  --i-query /storage4/DAVIEL20/pachy-test/MDC_pachy_seqs.qza \
  --i-reference-reads /home/DAVIEL20/Data/References/Misc/Pachy_kelly_seqs/Pachy_and_off-targets_seqs.qza \
  --i-reference-taxonomy /home/DAVIEL20/Data/References/Misc/Pachy_kelly_seqs/Pachy_and_off-targets_taxonomy.qza \
  --o-classification /storage4/DAVIEL20/pachy-test/Pachy_and_off-targets_blast-results.qza

When I exported a taxonomy.tsv table from the BLAST results, most of the sequences were unassigned:

To check my reference taxonomy file was correct I unzipped Pachy_and_off-targets_taxonomy.qza and checked the taxonomy.tsv file within the data folder. It looks as expected:

I also performed the same process in Geneious - created a BLAST database then performed a BLAST search of the sequences used to create the database against that database and it worked just fine.

Any suggestions would be appreciated!

Cheers
Laura

Hi @laura.d,
Thanks for posting examples of your reference data! That lets me diagnose the issue very quickly: consensus taxonomic assignment is failing because your reference taxonomy consists of only one level, so when a query sequence hits two distinct reference taxa there is no consensus and the result is "unassigned". This classifier is not simply running BLAST; it follows the BLAST alignment with a consensus assignment step in q2-feature-classifier. See this topic for more details on this diagnosis and some potential fixes:

So you have two fixes:

  1. Fix the reference taxonomy to contain multiple levels, as described in that topic
  2. If you don't want multiple levels, or cannot have them, you could set --p-maxaccepts 1 so that BLAST returns only the top hit. That's not what I recommend, but it is how to do it if that's what you are after.
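
To make the consensus step concrete, here is a toy Python sketch of the idea. It is not the q2-feature-classifier code, only an illustration under assumptions: the taxon labels are hypothetical, the `consensus` helper is made up for this example, and the 0.51 threshold is assumed to mirror the default of --p-min-consensus.

```python
from collections import Counter


def consensus(hit_taxa, min_consensus=0.51, unassigned="Unassigned"):
    """Toy level-by-level consensus over the taxonomy strings of one query's hits."""
    split = [taxon.split(";") for taxon in hit_taxa]
    agreed = []
    for level in zip(*split):  # walk ranks from highest to lowest
        label, count = Counter(level).most_common(1)[0]
        if count / len(level) < min_consensus:
            break  # hits disagree at this rank, so stop here
        agreed.append(label)
    return ";".join(agreed) if agreed else unassigned


# Hypothetical labels, for illustration only.
single_level = ["taxon_A", "taxon_B"]
multi_level = ["k__Fungi;g__GenusX;s__A", "k__Fungi;g__GenusX;s__B"]

print(consensus(single_level))      # Unassigned: hits split 50/50 at the only level
print(consensus(multi_level))       # k__Fungi;g__GenusX: agreement down to genus
print(consensus(single_level[:1]))  # taxon_A: top hit only, as with maxaccepts 1
```

With a single-level taxonomy, any disagreement between hits leaves nothing to fall back on, which is why both fixes above work: extra levels give the consensus something to agree on, and a single accepted hit cannot disagree with itself.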

I personally recommend classify-consensus-vsearch in q2-feature-classifier instead of BLAST. Among other reasons, it has a top-hits-only option that could be put to good use with your custom database.

As for why it worked in Geneious: probably because Geneious is performing a top-hit BLAST rather than the consensus step that q2-feature-classifier implements.

Give that a spin and let me know how it goes!

Perfect! Thanks @Nicholas_Bokulich, you are the font of all QIIME knowledge :exploding_head:
