How is consensus calculated in qiime feature-classifier classify-consensus-blast?

I am using QIIME 2 2022.8.
How is the "consensus" calculated in qiime feature-classifier classify-consensus-blast? Understanding it is crucial for knowing how to parametrize this tool.

I agree!

Let's start here, with the documentation for classify-consensus-blast.

Performs BLAST+ local alignment between query and reference_reads, then assigns consensus taxonomy to each query sequence from among maxaccepts hits, min_consensus of which share that taxonomic assignment. Note that maxaccepts selects the first N hits with > perc_identity similarity to query.

--p-min-consensus NUMBER Range(0.5, 1.0, inclusive_start=False,
inclusive_end=True) Minimum fraction of assignments must match top hit
to be accepted as consensus assignment.
[default: 0.51]
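A minimal sketch of what the help text above describes, written for illustration only (it is not the q2-feature-classifier source). It assumes hits are kept in the order BLAST+ reports them and that percent identity is on a 0-100 scale, as in the question below. `Hit`, `select_hits` and `labels_meeting_threshold` are hypothetical names. The second helper shows one plausible reason the min_consensus range excludes 0.5: above one half, at most one label can reach the threshold at a given rank, because two such labels would together cover more than all the hits; at exactly 0.5 a tie can pass twice.

```python
from collections import Counter
from typing import NamedTuple


class Hit(NamedTuple):
    """One BLAST hit for a query (hypothetical record type)."""
    subject_id: str
    perc_identity: float  # percent identity on a 0-100 scale


def select_hits(hits, perc_identity, maxaccepts):
    """Keep hits above the identity threshold, in reported order,
    then keep only the first `maxaccepts` of them."""
    passing = [h for h in hits if h.perc_identity > perc_identity]
    return passing[:maxaccepts]


def labels_meeting_threshold(labels, min_consensus):
    """Return every label whose share of the votes is at least min_consensus."""
    counts = Counter(labels)
    total = len(labels)
    return [label for label, n in counts.items() if n / total >= min_consensus]


hits = [Hit("ref1", 99.1), Hit("ref2", 97.5), Hit("ref3", 98.4), Hit("ref4", 98.2)]
print([h.subject_id for h in select_hits(hits, perc_identity=98.0, maxaccepts=2)])
# ['ref1', 'ref3']: ref2 is below 98, and ref4 is dropped by maxaccepts=2

votes = ["g__Lactobacillus", "g__Lactobacillus", "g__Bacillus", "g__Bacillus"]
print(labels_meeting_threshold(votes, 0.5))   # ['g__Lactobacillus', 'g__Bacillus']: a tie
print(labels_meeting_threshold(votes, 0.51))  # []: no consensus at this rank
```

Note that "first N" here means the first N in reported order, which is not necessarily the N best-scoring hits.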

For even more details and technical notes on the implementation, here's the code.

Thanks for the answer. Apart from the code, the consensus process is still not clear to me. For each ESV there is a taxonomic assignment and a value. What exactly is this value?
Moreover, suppose I set a percent identity of 98 and get these hits:
ID A at 98.3%, ID A at 98.4%, ID B at 99.9%.
Will I get ID B because it has the highest percent identity, or ID A because it is the majority?
I also looked at the citation given by --citations, but there is nothing about consensus there (wrong citation?).

The citation for the consensus method is listed under the feature-classifier plugin as a whole:
qiime feature-classifier --citations

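All three of those hits will be accepted because they are over your threshold (provided maxaccepts is at least 3), and the taxonomy that at least 51% of those three hits agree on (the default min_consensus of 0.51) will be selected. In your example that is A, shared by 2 of 3 hits, even though B has the highest percent identity.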
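A simplified, single-rank sketch of the vote for the example in the question, assuming maxaccepts is at least 3 so all three hits are accepted. In this sketch, as in the help text's description, percent identity only decides whether a hit is accepted; it plays no part in the vote itself. The real tool compares full lineages rank by rank, which this sketch collapses into one label per hit.

```python
from collections import Counter

# The three accepted hits from the question: (taxonomy label, percent identity).
hits = [("A", 98.3), ("A", 98.4), ("B", 99.9)]

# Each accepted hit gets one equal vote, regardless of its identity.
labels = [taxon for taxon, _ in hits]
top_label, top_count = Counter(labels).most_common(1)[0]
fraction = top_count / len(labels)

min_consensus = 0.51
if fraction >= min_consensus:
    print(top_label, round(fraction, 2))  # A 0.67
else:
    print("no consensus")
```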


Thanks. And how is the consensus value reported by the function calculated?

Thank you for your patience.

It's the fraction of the accepted hits that share the assigned taxonomy.

Paraphrased from this section of the paper:

A consensus taxonomy is then assigned by determining the taxonomic lineage on which at least min_consensus of the aligned sequences agree. This consensus taxonomy is truncated at the taxonomic level at which less than min_consensus of taxonomies agree.

For example, if a query sequence is classified with maxaccepts = 3 and returns these hits:
f__Lactobacillaceae; g__Lactobacillus; s__brevis.
f__Lactobacillaceae; g__Lactobacillus; s__brevis.
f__Lactobacillaceae; g__Lactobacillus; s__delbrueckii.
The taxonomy label assigned with min_consensus = 0.51 would be
f__Lactobacillaceae; g__Lactobacillus; s__brevis.
However, if min_consensus = 0.99, the taxonomy would be
f__Lactobacillaceae; g__Lactobacillus.
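The paper's description and the Lactobacillaceae example can be sketched as a rank-by-rank vote. This is a minimal illustration, not the q2-feature-classifier source: `consensus_taxonomy` is a hypothetical name, shorter lineages are assumed to be padded with `None`, and the returned value is taken to be the agreement fraction at the deepest rank kept, which is how I read the reply above. Check the plugin code linked earlier for the exact definition.

```python
from collections import Counter
from itertools import zip_longest


def consensus_taxonomy(lineages, min_consensus=0.51):
    """Rank-by-rank consensus over the accepted hits' lineages (hypothetical sketch).

    lineages: one list of rank labels per accepted hit.
    Returns (consensus_lineage, value); value is the fraction of hits agreeing
    at the deepest rank kept. An empty lineage means no consensus at any rank.
    """
    if not 0.5 < min_consensus <= 1.0:
        raise ValueError("min_consensus must be in (0.5, 1.0]")
    n_hits = len(lineages)
    consensus, value = [], None
    # Walk from the highest rank down; shorter lineages are padded with None.
    for level in zip_longest(*lineages):
        label, count = Counter(level).most_common(1)[0]
        fraction = count / n_hits
        if label is None or fraction < min_consensus:
            break  # truncate here: not enough agreement at this rank
        consensus.append(label)
        value = fraction
    return consensus, value


hits = [
    ["f__Lactobacillaceae", "g__Lactobacillus", "s__brevis"],
    ["f__Lactobacillaceae", "g__Lactobacillus", "s__brevis"],
    ["f__Lactobacillaceae", "g__Lactobacillus", "s__delbrueckii"],
]
print(consensus_taxonomy(hits, 0.51))
# (['f__Lactobacillaceae', 'g__Lactobacillus', 's__brevis'], 0.6666666666666666)
print(consensus_taxonomy(hits, 0.99))
# (['f__Lactobacillaceae', 'g__Lactobacillus'], 1.0)
```

With min_consensus = 0.99 the species vote (2 of 3) fails, so the lineage stops at genus, and in this sketch the reported value is 1.0 because all three hits agree at that rank.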
