How to obtain the database sequence ID instead of the taxonomic assignment in taxonomy.qza files

Dear all,
I was looking for a solution to my problem in the forum, but I didn't find anything similar.
I performed a taxonomic classification on a set of data, and I correctly obtained the taxonomy.qza table. I'd like to know if it's possible to substitute the taxon label with the ID of the best-matching reference sequence.
To be clearer, when I open the taxonomy.tsv file, the output I have is something like:

Feature ID Taxon Consensus
0a0000a00.. k__Bacteria...s__thermophilus. 0.9

what I'd like to obtain is:
Feature ID Taxon Consensus
0a0000a00.. 10000000 0.9

where "10000000" corresponds to the database sequence labeled as "k__Bacteria...s__thermophilus".

Thank you.

Welcome to the forum @vval!

This is possible with a couple of simple adjustments. To clarify, the classifiers are designed to find the most likely taxonomic lineage (by consensus of multiple hits) rather than to report the top hit.

Two things:

  1. adjust your reference taxonomy so that the feature ID is repeated in both columns. A quick bash one-liner is one way to do this:
paste taxonomy.tsv taxonomy.tsv | cut -f 1,3 > new-taxonomy.tsv

Then import the new file into QIIME 2 (adjusting the header line first, if the file has one).

  2. classify using qiime feature-classifier classify-consensus-vsearch --p-maxhits 1
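
As a rough sketch of the taxonomy adjustment on a toy file: the IDs, lineages, and file names below are made up for illustration, and the header handling assumes a "Feature ID<tab>Taxon" header line, which your file may or may not have.

```shell
# Toy headerless reference taxonomy: ID <tab> lineage (hypothetical values).
printf '10000000\tk__Bacteria; p__ExampleA\n10000001\tk__Bacteria; p__ExampleB\n' > taxonomy.tsv

# Headerless case: paste yields 4 columns (ID, lineage, ID, lineage);
# cut keeps columns 1 and 3, so each ID becomes its own "taxon".
paste taxonomy.tsv taxonomy.tsv | cut -f 1,3 > new-taxonomy.tsv

# Header case (assumed header: "Feature ID<tab>Taxon"): keep the header
# as-is and duplicate the ID only on the data lines.
{ printf 'Feature ID\tTaxon\n'; cat taxonomy.tsv; } > taxonomy-with-header.tsv
awk 'BEGIN { FS = OFS = "\t" }
     NR == 1 && $1 == "Feature ID" { print "Feature ID", "Taxon"; next }
     { print $1, $1 }' taxonomy-with-header.tsv > new-taxonomy-with-header.tsv

cat new-taxonomy.tsv
```

The headerless file should then be importable as FeatureData[Taxonomy], for example with qiime tools import and --input-format HeaderlessTSVTaxonomyFormat. Check the import options against the QIIME 2 version you have installed.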

Good luck!


Thank you, it worked! I'm just curious, why is the option --p-maxhits 1? I tried to classify my reads without the option, but I obtained only "Unassigned".

You are forcing the classifier to only take the top hit, so it does not perform any type of consensus classification.

You got only "Unassigned" without that option because your "taxonomy" is not hierarchical; it consists only of IDs. The consensus classification finds the common taxonomic lineage of the top N hits that match your query sequence, but this only works with a semicolon-delimited hierarchical taxonomy. In your case there is none, so if there is more than one valid hit, the result is "Unassigned".
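
To make that concrete, here is a deliberately simplified illustration. It is not the classifier's actual code: the hypothetical consensus function below just keeps the ranks shared by every hit, whereas the real method uses a consensus threshold. The labels are made up.

```shell
# consensus: read one lineage per line on stdin and print the longest
# rank-by-rank prefix shared by ALL of them, or "Unassigned" if even the
# first rank differs. Simplified stand-in, not QIIME 2's implementation.
consensus() {
  awk '
    NR == 1 { n = split($0, shared, ";"); next }
    {
      m = split($0, cur, ";")
      if (m < n) n = m
      for (i = 1; i <= n; i++)
        if (cur[i] != shared[i]) { n = i - 1; break }
    }
    END {
      if (n == 0) { print "Unassigned"; exit }
      out = shared[1]
      for (i = 2; i <= n; i++) out = out ";" shared[i]
      print out
    }'
}

# Hierarchical labels: two hits agree down to phylum.
printf 'k__Bacteria;p__ExampleA;g__GenusX\nk__Bacteria;p__ExampleA;g__GenusY\n' | consensus
# prints: k__Bacteria;p__ExampleA

# ID-only labels: two different hits share nothing.
printf '10000000\n10000001\n' | consensus
# prints: Unassigned

# ID-only label with a single hit, as with --p-maxhits 1.
printf '10000000\n' | consensus
# prints: 10000000
```

With a single hit there is nothing to disagree with, so the ID passes through unchanged.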
