Taxonomic classification error

Asha1 · May 21, 2019, 2:20pm

Hello, All,

I have a question about taxonomic classification: I couldn't get species-level classification for many OTUs.

When I BLAST the same sequences against the NCBI database, I can see the fungus names I expect, with identities of 99%, 97% and so on. I tried several UNITE reference databases, such as sh_refs_qiime_ver8_97_02.02.2019.fasta, sh_refs_qiime_ver8_99_02.02.2019.fasta and sh_refs_qiime_ver8_dynamic_s_02.02.2019.fasta, but none of them worked.

After searching on Google, I came across the RDP pipeline for taxonomic classification. It did give species-level classifications, but the results were not good enough. Another major problem is that none of the UNITE reference databases contain reference sequences for organisms such as Pythium species. If there is no reference sequence for the organism I'm interested in, could I make my own reference database for it?

My last question is: what can I do to get species-level classification in QIIME? I have attached a snapshot of my results for your perusal; kindly have a look.

Could anyone suggest how to fix this problem?

colinbrislawn · May 21, 2019, 5:24pm

Hello Asha1

Welcome to the QIIME 2 forums!

You ask a really good question about getting species-level classification.

Unfortunately, getting a species-level classification for every amplicon is not possible.

One of the issues we face is that some species have distinctive 18S gene sequences, so you can classify them to the species level, while other species have exactly the same DNA in the region of the 18S gene you sequenced, so they all look identical based on this amplicon. In this second case, a species-level assignment is impossible using the 18S gene alone.

If two different species have the exact same 18S gene, changing the database won't help you either.
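To make this concrete, here is a toy sketch. The sequences and taxon names are hypothetical placeholders, not real 18S or ITS data, and the read is assumed to be already aligned to each reference at the same length. Two references share an identical amplicon region, so the read scores 100% against both and nothing in the read can break the tie.

```python
# Toy illustration only: hypothetical, pre-aligned sequences of equal length.
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


# Species A and B are identical in this amplicon region; species C differs
# at two of the 20 positions.
references = {
    "g__Genus1;s__speciesA": "ACGTACGTACGTACGTACGT",
    "g__Genus1;s__speciesB": "ACGTACGTACGTACGTACGT",
    "g__Genus1;s__speciesC": "ACGTACGAACGTACGTTCGT",
}

read = "ACGTACGTACGTACGTACGT"

scores = {taxon: percent_identity(read, seq) for taxon, seq in references.items()}
best = max(scores.values())
# Every reference that ties for the best score is an equally good match.
top_hits = sorted(taxon for taxon, score in scores.items() if score == best)
print(top_hits)  # → ['g__Genus1;s__speciesA', 'g__Genus1;s__speciesB']
```

Adding more copies of species A or B to the database would not change this result: the tie comes from the sequences themselves.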

I'm sorry if that's disappointing. Does that help answer your question?

Colin

Nicholas_Bokulich · May 21, 2019, 7:08pm

Hi @Asha1,
I just want to build on @colinbrislawn's excellent advice.

It looks like you are using ITS, which is more variable than 18S, but the idea still holds: you may not be able to differentiate species based on a short marker-gene read if those species are too similar in that region.

That is the short answer to why QIIME 2's feature-classifier (and other taxonomy classifiers) often report incomplete taxonomic assignments: the sequence cannot be confidently classified to a deeper level (e.g., species).

This stands in stark contrast to what NCBI BLAST is doing:

So of course BLAST shows you a species name for your sequences. Unlike feature-classifier, NCBI BLAST does not use any kind of confidence measure to determine whether other related species may be equally good (or nearly as good) hits. It just reports the hits and their similarity values.

You may want to try adjusting the confidence parameter, or other parameters when training/classifying; see this article for guidelines on setting parameters for ITS sequence classification.
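As a rough illustration of what a confidence threshold does, the sketch below assumes the classifier has already given a probability to each full reference lineage (the numbers are made up). It sums those probabilities for each partial lineage, rank by rank, and stops at the deepest rank whose best-supported taxon still meets the threshold. This is a simplified sketch of the idea, not the q2-feature-classifier source; `truncate_by_confidence` is a hypothetical helper.

```python
from collections import defaultdict


def truncate_by_confidence(class_probs: dict[str, float], confidence: float = 0.7) -> str:
    """Return the deepest lineage whose summed probability is >= `confidence`.

    `class_probs` maps full ';'-separated reference lineages to the
    probability assigned to each (hypothetical input).
    """
    lineages = {tuple(lineage.split(";")): p for lineage, p in class_probs.items()}
    assigned: tuple[str, ...] = ()
    depth = max(len(lineage) for lineage in lineages)
    for level in range(1, depth + 1):
        # Sum the probabilities of every lineage that extends the current
        # assignment, grouped by its prefix at this rank.
        totals: dict[tuple[str, ...], float] = defaultdict(float)
        for lineage, p in lineages.items():
            if len(lineage) >= level and lineage[: len(assigned)] == assigned:
                totals[lineage[:level]] += p
        if not totals:
            break
        best_prefix, best_prob = max(totals.items(), key=lambda item: item[1])
        if best_prob < confidence:
            break  # not confident enough to go one rank deeper
        assigned = best_prefix
    return ";".join(assigned) if assigned else "Unassigned"


# Made-up probabilities for a single query sequence.
probs = {
    "k__Fungi;g__GenusA;s__GenusA_sp1": 0.45,
    "k__Fungi;g__GenusA;s__GenusA_sp2": 0.40,
    "k__Fungi;g__GenusB;s__GenusB_sp1": 0.15,
}
print(truncate_by_confidence(probs, confidence=0.7))  # → k__Fungi;g__GenusA
print(truncate_by_confidence(probs, confidence=0.4))  # → k__Fungi;g__GenusA;s__GenusA_sp1
```

With a threshold of 0.7 the two species split the genus's probability, so the assignment stops at genus; lowering the threshold lets it reach species, at the cost of more potential misclassifications.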

You may also want to try a different classifier. The BLAST- and VSEARCH-based classifiers may present a more familiar interface, in which you can choose how many hits to keep, minimum percent identity thresholds, minimum coverage, etc. (see the article above for more details). These classifiers use BLAST or VSEARCH for the database search, but then QIIME 2 performs a native LCA (lowest common ancestor) classification to find the consensus taxonomy among your top hits. In other words, this is similar to your NCBI BLAST search, but QIIME 2 does the hard work of figuring out whether your top hit is the right hit, or whether the species cannot truly be distinguished among several top hits.
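The consensus step can be sketched roughly as follows. This is a simplified illustration of LCA-style consensus, not QIIME 2's actual code: `consensus_taxonomy` is a hypothetical helper, the lineages are made-up placeholders, and `min_consensus` plays a role similar to the `--p-min-consensus` parameter of `classify-consensus-blast` and `classify-consensus-vsearch`. At each rank it keeps the most common partial lineage among the top hits and stops as soon as that lineage is shared by less than the required fraction of hits.

```python
from collections import Counter


def consensus_taxonomy(hits: list[str], min_consensus: float = 0.51) -> str:
    """Deepest lineage shared by at least `min_consensus` of the top hits."""
    if not hits:
        return "Unassigned"
    lineages = [hit.split(";") for hit in hits]
    consensus: tuple[str, ...] = ()
    for level in range(max(len(lineage) for lineage in lineages)):
        # Count full prefixes (not single labels) so the result is always
        # a consistent lineage from the top rank down.
        prefixes = Counter(
            tuple(lineage[: level + 1]) for lineage in lineages if len(lineage) > level
        )
        prefix, count = prefixes.most_common(1)[0]
        if count / len(lineages) < min_consensus:
            break
        consensus = prefix
    return ";".join(consensus) if consensus else "Unassigned"


# Hypothetical top hits for one query: two species tie at the species rank.
top_hits = [
    "k__Fungi;g__GenusA;s__GenusA_sp1",
    "k__Fungi;g__GenusA;s__GenusA_sp2",
]
print(consensus_taxonomy(top_hits))  # → k__Fungi;g__GenusA
```

A BLAST report would list both hits at the same identity; the consensus step turns that tie into an honest genus-level assignment instead of picking one species.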

The lack of Pythium reference sequences is a problem if you want to classify Pythium species, and it is one reason why NCBI BLAST may be doing better.

Any classifier can only perform as well as the reference data you give it... if you are missing an important species, that's a problem.

Absolutely, you can make your own reference database. You could build a custom database and use it on its own, or add your sequences to the UNITE database. Use the UNITE files as your guide for formatting your database.
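A minimal sketch of the formatting side, assuming the UNITE QIIME release layout: a FASTA file of reference sequences plus a headerless, tab-separated taxonomy file that maps each sequence ID to a `k__...;p__...;...;s__...` lineage. The helper names, the record ID and the placeholder sequence are hypothetical, and the Pythium lineage is only illustrative, so check it against the conventions of the database you are extending. Work on copies of the UNITE files rather than the originals.

```python
from collections.abc import Iterable
from pathlib import Path

# One reference record: (sequence ID, ';'-separated lineage, sequence).
Record = tuple[str, str, str]


def format_references(
    records: Iterable[Record], existing_ids: Iterable[str] = ()
) -> tuple[str, str]:
    """Build FASTA text and headerless TSV taxonomy text for new records."""
    seen = set(existing_ids)
    fasta_lines: list[str] = []
    taxonomy_lines: list[str] = []
    for seq_id, lineage, seq in records:
        if not seq_id or any(ch.isspace() for ch in seq_id):
            raise ValueError(f"invalid sequence ID: {seq_id!r}")
        if seq_id in seen:
            raise ValueError(f"duplicate sequence ID: {seq_id}")
        seen.add(seq_id)
        # Uppercase the bases; some QIIME 2 versions reject lowercase on import.
        fasta_lines.append(f">{seq_id}\n{seq.strip().upper()}\n")
        taxonomy_lines.append(f"{seq_id}\t{lineage}\n")
    return "".join(fasta_lines), "".join(taxonomy_lines)


def read_taxonomy_ids(taxonomy_path: Path) -> set[str]:
    """IDs already listed in a headerless TSV taxonomy file (empty if absent)."""
    if not taxonomy_path.exists():
        return set()
    with taxonomy_path.open() as handle:
        return {line.split("\t", 1)[0].strip() for line in handle if line.strip()}


def append_references(
    records: Iterable[Record], fasta_path: Path, taxonomy_path: Path
) -> None:
    """Append new records to a (copied) FASTA + taxonomy file pair.

    Assumes both existing files already end with a newline.
    """
    fasta_text, taxonomy_text = format_references(
        records, read_taxonomy_ids(taxonomy_path)
    )
    with fasta_path.open("a") as fasta, taxonomy_path.open("a") as taxonomy:
        fasta.write(fasta_text)
        taxonomy.write(taxonomy_text)


# Hypothetical record: placeholder ID and sequence (not real Pythium DNA);
# the lineage is illustrative and should follow your reference database.
new_records: list[Record] = [
    (
        "custom_Pythium_0001",
        "k__Stramenopila;p__Oomycota;c__Oomycetes;o__Pythiales;"
        "f__Pythiaceae;g__Pythium;s__Pythium_ultimum",
        "acgtacgtacgt",
    ),
]
```

Something like `append_references(new_records, Path("my_refs.fasta"), Path("my_taxonomy.txt"))` would then extend your copies. The resulting files can be imported with `qiime tools import --type 'FeatureData[Sequence]'` and `qiime tools import --type 'FeatureData[Taxonomy]' --input-format HeaderlessTSVTaxonomyFormat`, and then used to train a classifier as usual.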

See also this discussion; you may want to reach out to that forum user to see if he created a useful database, and/or team up to find a solution:

Good luck!