incomplete taxonomic assignment

SoilRotifer · July 15, 2021, 2:08pm

I am assuming you are running with default settings against either the SILVA or the Greengenes reference database?

A few things to note:

1. You could be observing a limitation of the reference database being used (i.e. SILVA / Greengenes).
2. Be wary of how BLAST hits are displayed on NCBI. Equally good BLAST hits are sorted arbitrarily, so if you scroll down far enough you may find an identical "hit" to a very different organism.
3. Given point 2, this is why we have classify-consensus-blast and classify-consensus-vsearch: any hits that cannot be taxonomically resolved have their taxonomy truncated to the last common ancestor. classify-sklearn likewise truncates assignments it cannot resolve.
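The consensus idea in point 3 can be sketched in a few lines of Python. This is a simplified illustration, not the q2-feature-classifier implementation: the function name `consensus_taxonomy`, its `min_consensus` argument (loosely modeled on the `--p-min-consensus` option, whose default is 0.51) and the example lineages are all hypothetical. At each rank it keeps the most common lineage prefix only if at least `min_consensus` of the hits share it, and otherwise stops, so equally good hits to different organisms truncate the assignment.

```python
from collections import Counter
from typing import List


def consensus_taxonomy(hits: List[str], min_consensus: float = 0.51) -> str:
    """Return the deepest lineage shared by at least `min_consensus` of the hits.

    Hypothetical helper, not the QIIME 2 code: each hit is a semicolon-delimited
    lineage such as "k__Bacteria; p__Firmicutes".
    """
    if not hits:
        return "Unassigned"
    lineages = [[level.strip() for level in hit.split(";")] for hit in hits]
    consensus: List[str] = []
    for rank in range(max(len(lin) for lin in lineages)):
        # Count whole lineage prefixes down to this rank, so the consensus is
        # always a real lineage rather than labels mixed from different parents.
        prefixes = [tuple(lin[: rank + 1]) for lin in lineages if len(lin) > rank]
        prefix, count = Counter(prefixes).most_common(1)[0]
        # Hits with shorter lineages still count in the denominator.
        if count / len(lineages) < min_consensus:
            break  # not enough agreement: truncate here
        consensus = list(prefix)
    return "; ".join(consensus) if consensus else "Unassigned"


# Hypothetical hits: identical down to genus, different species.
genus = ("k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; "
         "f__Streptococcaceae; g__Streptococcus")
hits = [genus + "; s__pneumoniae", genus + "; s__mitis", genus + "; s__oralis"]
print(consensus_taxonomy(hits))
```

With three equally good hits to three different species, no species label reaches 51 % support, so the assignment stops at the genus; if two of the three hits agreed, the species would be kept.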

More information can be found here:

You can also try your hand at using RESCRIPt to make your own reference database for classifying your sequences:

This is a limitation of assigning taxonomy using short reads. However, you can use tools like q2-clawback, which assembles habitat-specific taxonomic weights for classify-sklearn, to help improve things:
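For intuition on why habitat information can help, here is a toy sketch of Bayes' rule, which underlies the naive Bayes classifier used by classify-sklearn. It is not q2-clawback's code: the function `posterior`, the genera, the likelihoods and the weights are all hypothetical numbers. When a short read matches two genera equally well, a uniform prior leaves the posterior split 50/50, which falls below a confidence cutoff such as 0.7 (classify-sklearn's default `--p-confidence`) and so triggers truncation; weights favoring the genus actually expected in that habitat break the tie.

```python
from typing import Dict


def posterior(likelihoods: Dict[str, float], prior: Dict[str, float]) -> Dict[str, float]:
    """Bayes' rule: P(taxon | read) is proportional to P(read | taxon) * P(taxon)."""
    unnormalized = {t: likelihoods[t] * prior[t] for t in likelihoods}
    total = sum(unnormalized.values())
    return {t: v / total for t, v in unnormalized.items()}


# Hypothetical read that matches two genera equally well.
likelihoods = {"g__A": 0.02, "g__B": 0.02}
uniform = {"g__A": 0.5, "g__B": 0.5}
habitat = {"g__A": 0.9, "g__B": 0.1}  # hypothetical habitat-specific weights
threshold = 0.7  # confidence cutoff below which an assignment would be truncated

for name, prior in [("uniform", uniform), ("habitat", habitat)]:
    best, p = max(posterior(likelihoods, prior).items(), key=lambda kv: kv[1])
    print(name, best, round(p, 2), "kept" if p >= threshold else "truncated")
```

The likelihoods are identical in both runs; only the prior changes, which is the lever habitat-specific weights pull on.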

-Mike