Taxonomic classification error

Nicholas_Bokulich · May 21, 2019, 7:08pm

Hi @Asha1,
I just want to build on @colinbrislawn's excellent advice.

It looks like you are using ITS, which is more variable than 18S, but the idea still holds: you may not be able to differentiate species from a short marker-gene read if those species are too similar in that region.

That is the short answer for why QIIME 2 feature-classifier (and other taxonomy classifiers) often report incomplete taxonomic assignments: the sequence cannot be confidently classified to a deeper level (e.g., species).

This stands in stark contrast to what NCBI BLAST is doing:

Of course. Unlike feature-classifier, NCBI BLAST does not use any kind of confidence measure to determine whether other related species may be equally good (or nearly as good) hits. It just reports the hits and their similarity values.

You may want to try adjusting the confidence parameter, or other parameters when training/classifying; see this article for guidelines on setting parameters for ITS sequence classification.
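To give a rough feel for what the confidence parameter does, here is a minimal sketch. It is not feature-classifier's actual code: the function name, the rank labels, and the per-rank confidence values are all made up for illustration, and the 0.7 default is only meant to mirror a typical setting. The idea is simply that the assignment is cut off at the first rank whose confidence falls below the threshold.

```python
def truncate_by_confidence(ranks, confidences, threshold=0.7):
    """Keep ranks from the top down until confidence drops below threshold.

    ranks and confidences are parallel lists ordered from the shallowest
    rank (kingdom) to the deepest (species). Hypothetical helper, not part
    of QIIME 2.
    """
    kept = []
    for rank, confidence in zip(ranks, confidences):
        if confidence < threshold:
            break  # everything deeper than this rank is left unassigned
        kept.append(rank)
    return kept


# Made-up example: the classifier is sure about the genus but not the species.
ranks = ["k__KingdomX", "g__GenusY", "s__SpeciesZ"]
confidences = [0.99, 0.85, 0.40]

strict = truncate_by_confidence(ranks, confidences, threshold=0.7)
lenient = truncate_by_confidence(ranks, confidences, threshold=0.3)
print(strict)   # stops at genus
print(lenient)  # reaches species
```

Lowering the threshold gives deeper assignments, but at the cost of accepting less certain ones.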

You may also want to try a different classifier; the blast- and vsearch-based classifiers may present a more familiar interface, with which you can choose how many hits to keep, minimum percent identity thresholds, minimum coverage, etc. See the article above for more details. These classifiers use blast or vsearch for the database search, and QIIME 2 then performs a native LCA (lowest common ancestor) classification to find the consensus taxonomy among your top hits. In other words, this is a similar process to your NCBI BLAST search, but QIIME 2 does the hard work of figuring out whether your top hit is the right hit, or whether the species cannot truly be distinguished from among several top hits.
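The consensus step described above can be sketched roughly as follows. This is only an illustration of the idea, not QIIME 2's implementation: the function name, the `min_consensus` argument and its 0.51 value, and the taxonomy strings are assumptions for the example. At each rank, the most common label is kept only if a large enough fraction of all top hits agree on it.

```python
from collections import Counter


def consensus_taxonomy(hit_taxonomies, min_consensus=0.51):
    """Return the deepest lineage that at least min_consensus of hits share.

    hit_taxonomies is a list of semicolon-delimited lineage strings, one per
    top hit. Hypothetical helper for illustration only.
    """
    hits = [[label.strip() for label in t.split(";")] for t in hit_taxonomies]
    if not hits:
        return "Unassigned"

    consensus = []
    candidates = hits
    depth = 0
    while True:
        labels = [h[depth] for h in candidates if len(h) > depth]
        if not labels:
            break
        label, count = Counter(labels).most_common(1)[0]
        # Agreement is measured against all hits, not just remaining ones.
        if count / len(hits) < min_consensus:
            break
        consensus.append(label)
        candidates = [h for h in candidates if len(h) > depth and h[depth] == label]
        depth += 1
    return "; ".join(consensus) if consensus else "Unassigned"


# Made-up top hits: same genus, three different species.
ambiguous_hits = [
    "k__KingdomX; g__GenusY; s__species_a",
    "k__KingdomX; g__GenusY; s__species_b",
    "k__KingdomX; g__GenusY; s__species_c",
]
# Made-up top hits: two of three agree on the species.
clearer_hits = [
    "k__KingdomX; g__GenusY; s__species_a",
    "k__KingdomX; g__GenusY; s__species_a",
    "k__KingdomX; g__GenusY; s__species_b",
]
print(consensus_taxonomy(ambiguous_hits))
print(consensus_taxonomy(clearer_hits))
```

In the first case the hits only agree down to genus, so the species is left off; that is the same kind of incomplete assignment you are seeing, whereas a plain BLAST search would just show you the top hit.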

That is a problem (if you want to classify Pythium species), and one reason why NCBI BLAST may be doing better.

Any classifier can only perform as well as the reference data you give it... if you are missing an important species, that's a problem.

Absolutely. You could make a custom database and use it stand-alone, or add it to the UNITE database. Use the UNITE database as your guide for formatting your database.
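As a starting point for the formatting, here is a small sketch that builds one taxonomy line in the style of the UNITE QIIME release files (sequence ID, a tab, then a semicolon-delimited lineage with `k__` through `s__` prefixes). The helper name, sequence ID, and lineage names are placeholders I made up; please check the result against the actual UNITE files you download before relying on it.

```python
RANK_PREFIXES = ["k", "p", "c", "o", "f", "g", "s"]


def format_taxonomy_line(seq_id, lineage):
    """Build one tab-separated taxonomy line: ID, then prefixed lineage.

    lineage must list seven names, kingdom through species. Hypothetical
    helper, not part of QIIME 2 or UNITE.
    """
    if len(lineage) != len(RANK_PREFIXES):
        raise ValueError("expected 7 ranks, kingdom through species")
    labels = [f"{prefix}__{name}" for prefix, name in zip(RANK_PREFIXES, lineage)]
    return f"{seq_id}\t{';'.join(labels)}"


# Placeholder record; replace with your own sequence ID and lineage.
line = format_taxonomy_line(
    "custom_seq_1",
    ["KingdomX", "PhylumX", "ClassX", "OrderX", "FamilyX", "GenusY", "GenusY_species_a"],
)
print(line)
```

The sequence IDs in this taxonomy file need to match the IDs in the FASTA file holding your reference sequences.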

See also this discussion; you may want to reach out to that forum user to see if he created a useful database, and/or team up to find a solution:

Good luck!