Question about consensus taxonomy assignment in classify-consensus-vsearch

How does classify-consensus-vsearch compute consensus taxonomy?

Hi everyone,

I am running qiime feature-classifier classify-consensus-vsearch in qiime2-amplicon-2025.4 (installed via Conda), with the following parameters: --p-perc-identity 0.97, --p-maxaccepts 5, --p-min-consensus 0.51 (default).

I have some questions about how the consensus taxonomy is calculated in certain cases. For example, for one ASV, the taxonomy of the top hits (all with identical identity values of 99.7%) is:

k__Metazoa;p__Arthropoda;c__Insecta;o__Hemiptera;f__Miridae;g__Creontiades
k__Metazoa;p__Arthropoda;c__Insecta;o__Hemiptera;f__Miridae;g__Campylomma
k__Metazoa;p__Arthropoda;c__Insecta;o__Hemiptera;f__Miridae;g__Creontiades
k__Metazoa;p__Arthropoda;c__Insecta;o__Hemiptera;f__Miridae
k__Metazoa;p__Arthropoda;c__Insecta;o__Hemiptera;f__Miridae;g__Creontiades

According to my understanding, the consensus taxonomy should be: k__Metazoa;p__Arthropoda;c__Insecta;o__Hemiptera;f__Miridae;g__Creontiades

However, the consensus taxonomy returned is: k__Metazoa;p__Arthropoda;c__Insecta;o__Hemiptera;f__Miridae

Could you clarify how the consensus is determined in this case? Is the truncation at the family level due to a missing genus-level annotation in one of the reference hits?
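
To make the expectation explicit, here is a small Python tally of the genus labels in those five hits. It is only a counting sketch, not anything taken from QIIME 2; the helper name `genus_of` is made up for illustration, and the two denominators (all hits versus only hits that carry a genus) are shown because it is not obvious which one applies.

```python
from collections import Counter

hits = [
    "k__Metazoa;p__Arthropoda;c__Insecta;o__Hemiptera;f__Miridae;g__Creontiades",
    "k__Metazoa;p__Arthropoda;c__Insecta;o__Hemiptera;f__Miridae;g__Campylomma",
    "k__Metazoa;p__Arthropoda;c__Insecta;o__Hemiptera;f__Miridae;g__Creontiades",
    "k__Metazoa;p__Arthropoda;c__Insecta;o__Hemiptera;f__Miridae",
    "k__Metazoa;p__Arthropoda;c__Insecta;o__Hemiptera;f__Miridae;g__Creontiades",
]


def genus_of(taxon):
    """Return the g__ label of a taxonomy string, or None if it has none."""
    for label in taxon.split(";"):
        if label.startswith("g__"):
            return label
    return None


# Tally only the hits that actually carry a genus label.
tally = Counter(g for g in map(genus_of, hits) if g is not None)
top_genus, top_count = tally.most_common(1)[0]

frac_all = top_count / len(hits)                  # denominator: all 5 hits
frac_annotated = top_count / sum(tally.values())  # denominator: 4 hits with a genus

print(top_genus, top_count, frac_all, frac_annotated)
```

With either denominator, g__Creontiades is above the 0.51 threshold, which is why I expected a genus-level assignment.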

Thank you very much!

Hi @lacerta,

EDIT: 08-13-2025

Ignore what I wrote below... I somehow miscounted the number of g__Creontiades strings...

Give me a moment to reconsider ...

The correct consensus taxonomy would indeed be

k__Metazoa;p__Arthropoda;c__Insecta;o__Hemiptera;f__Miridae

It would not be

k__Metazoa;p__Arthropoda;c__Insecta;o__Hemiptera;f__Miridae;g__Creontiades

as it only appears in 2 of the 5 taxonomy strings (or 2 of the 4 at the genus level). Thus less than 51% minimum consensus. In fact, nothing at the genus level is greater than 51%, so, we back up one rank level and assess consensus again. At the family level all of the taxon strings contain

k__Metazoa;p__Arthropoda;c__Insecta;o__Hemiptera;f__Miridae

thus this string is chosen as the consensus taxonomy.

You could play around with adjusting the --p-min-consensus and --p-perc-identity values. But note that adjusting these to fix the current taxonomy example has the potential to make other assignments worse... or not...

Note: Not all reference sequences have a fully annotated taxonomy. If you'd only like to make use of sequences with full taxonomy, you can use many of the built-in QIIME 2 tools, and even RESCRIPt tools, to filter and curate your reference database prior to using it for classification.

Hi @lacerta,

Can you share the output files from your vsearch classification? I'd like to investigate further.

Hi @SoilRotifer ,

Thank you so much for the quick reply and for taking a closer look at this. Would it be okay if I sent you the files via DM?

@lacerta, yep that'd be perfect!

Hi, it seems I can’t send you a DM — the ‘message’ button doesn’t appear when I click on your profile.

Hi everyone,

Following some investigation with @SoilRotifer, we were able to determine the cause of the unexpected truncation in the consensus taxonomy I reported earlier.

The issue is linked to missing taxonomic annotations at lower levels in the reference files. Returning to my previous example, the top hits for one ASV were:

k__Metazoa;p__Arthropoda;c__Insecta;o__Hemiptera;f__Miridae;g__Creontiades
k__Metazoa;p__Arthropoda;c__Insecta;o__Hemiptera;f__Miridae;g__Campylomma
k__Metazoa;p__Arthropoda;c__Insecta;o__Hemiptera;f__Miridae;g__Creontiades
k__Metazoa;p__Arthropoda;c__Insecta;o__Hemiptera;f__Miridae
k__Metazoa;p__Arthropoda;c__Insecta;o__Hemiptera;f__Miridae;g__Creontiades

Here, one hit lacked a genus annotation, so classify-consensus-vsearch returned:

k__Metazoa;p__Arthropoda;c__Insecta;o__Hemiptera;f__Miridae

On the other hand, when we kept the empty placeholders for missing ranks in the reference:

k__Metazoa;p__Arthropoda;c__Insecta;o__Hemiptera;f__Miridae;g__Creontiades;s__
k__Metazoa;p__Arthropoda;c__Insecta;o__Hemiptera;f__Miridae;g__Creontiades;s__
k__Metazoa;p__Arthropoda;c__Insecta;o__Hemiptera;f__Miridae;g__Campylomma;s__
k__Metazoa;p__Arthropoda;c__Insecta;o__Hemiptera;f__Miridae;g__Creontiades;s__
k__Metazoa;p__Arthropoda;c__Insecta;o__Hemiptera;f__Miridae;g__;s__

the consensus matched the expected result:

k__Metazoa;p__Arthropoda;c__Insecta;o__Hemiptera;f__Miridae;g__Creontiades;s__

Conclusion: It is a good idea to always include all rank levels in the reference taxonomy, even if they are just empty placeholders (e.g., f__, g__, s__). This should prevent unexpected truncation in consensus taxonomic assignments.
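
For anyone who wants to see the mechanism, here is a minimal Python sketch of a rank-by-rank majority vote. It is not the q2-feature-classifier source. The function name `consensus_taxonomy` is hypothetical, and the sketch assumes that ranks are compared position by position, so the hit with the fewest ranks caps the depth of the result. That assumption was chosen because it reproduces both outcomes described in this post.

```python
from collections import Counter


def consensus_taxonomy(taxa, min_consensus=0.51, unassignable_label="Unassigned"):
    """Toy rank-by-rank consensus (illustrative only, not QIIME 2's code)."""
    split = [t.split(";") for t in taxa]
    consensus = []
    # zip() stops at the shortest list, so one hit that lacks a rank
    # limits how deep the consensus can go for all hits (assumed behavior).
    for labels in zip(*split):
        label, count = Counter(labels).most_common(1)[0]
        if count / len(labels) < min_consensus:
            break  # no label reaches the threshold at this rank
        consensus.append(label)
    return ";".join(consensus) if consensus else unassignable_label


prefix = "k__Metazoa;p__Arthropoda;c__Insecta;o__Hemiptera;f__Miridae"

# Reference taxonomy without placeholders: one hit stops at family.
ragged = [
    prefix + ";g__Creontiades",
    prefix + ";g__Campylomma",
    prefix + ";g__Creontiades",
    prefix,
    prefix + ";g__Creontiades",
]

# Same hits with empty placeholders kept for every missing rank.
padded = [
    prefix + ";g__Creontiades;s__",
    prefix + ";g__Campylomma;s__",
    prefix + ";g__Creontiades;s__",
    prefix + ";g__;s__",
    prefix + ";g__Creontiades;s__",
]

print(consensus_taxonomy(ragged))
print(consensus_taxonomy(padded))
```

In the padded case the genus vote is 3 of 5 (0.6), which clears 0.51, and the empty species placeholder is unanimous, so the sketch keeps both ranks. In the ragged case the loop never reaches the genus position at all.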

Some RESCRIPt actions, such as get-silva-data and get-ncbi-data, include a flag called --p-rank-propagation to help mitigate this issue, as explained in this part of the SILVA tutorial.

Additionally, RESCRIPt provides an edit-taxonomy action that allows batch modification of taxonomic labels, including inserting missing prefixes when needed.
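
If you prefer to pad a taxonomy table yourself before importing it, the idea can be sketched in a few lines of Python. This is not RESCRIPt's implementation; `pad_ranks` and the seven-prefix list `RANK_PREFIXES` are hypothetical names, and the sketch assumes every label begins with one of those prefixes (labels with any other prefix are silently dropped).

```python
# Assumed rank prefixes, in order; adjust to match your reference database.
RANK_PREFIXES = ("k__", "p__", "c__", "o__", "f__", "g__", "s__")


def pad_ranks(taxon, prefixes=RANK_PREFIXES):
    """Return taxon with an empty placeholder for every missing rank."""
    found = {}
    for label in taxon.split(";"):
        label = label.strip()
        for p in prefixes:
            if label.startswith(p):
                found[p] = label
                break
    # Emit every rank in order, falling back to the bare prefix.
    return ";".join(found.get(p, p) for p in prefixes)


print(pad_ranks("k__Metazoa;p__Arthropoda;c__Insecta;o__Hemiptera;f__Miridae"))
```

Applying something like this to every row of the reference taxonomy gives all hits the same number of ranks before classification.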

Thanks so much to the QIIME 2 team for all your help and support!
