rescript merge-taxa secondary selection criteria

Is there an automatic secondary selection criterion for rescript's merge-taxa method that kicks in when the specified selection criterion produces a tie between the two taxonomies?

E.g. when merging taxonomies using the "len" mode, if the same feature is classified to the same level in both taxonomies but the classifications don't agree, which taxonomy's classification is used? I would guess it's the first taxonomy listed, as in feature-table merge-taxa, but don't want to assume.

I would like to merge taxonomies created through two classification methods on the same dataset, while giving one of the classification methods preference when classification length is equal.


This is a great question. Let's look at the code!

Under the hood, calling the merge-taxa function with mode 'len' calls the _rank_length function, as shown in these lines. So how does that function break ties?

As far as I can tell, in the event of a tie, the second option is chosen.

Interesting! :thinking:

Maybe @SoilRotifer can tell us more. Am I reading this code right?


Hi @smayne11 and @colinbrislawn,

There are pairwise comparisons being made across the taxonomic lineages. You'll see that the code snippet linked above is actually called from here. There is no secondary selection criterion for the len mode. Whichever lineage makes it through those ties "wins". Pinging @Nicholas_Bokulich just to confirm. :ballot_box_with_check:

I really like this, as it ties in with my idea of leveraging an ensemble approach for taxonomic classification. :books:


Everyone is right:

Correct

i.e., when using len mode this action iterates across the taxonomies and compares each one to the current "winner".

So when there are only two taxonomies being compared, the second one given "wins" if there is a tie. If using 3 or more taxonomies, this gets a little more complicated, and order of precedence (for tiebreaking) is given in the reverse order that taxonomies are listed (so first gets lowest precedence, second beats first in a tie, third beats second in a tie, etc).
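To make the ordering concrete, here is a minimal Python sketch of the tie-breaking behavior described above. It is not rescript's actual code: the function names `rank_length` and `pick_longest`, the semicolon-delimited lineage format, and the example lineages are all assumptions made for illustration. It only mimics the described behavior, where each taxonomy is compared to the current "winner" and a later entry replaces it on a tie.

```python
def rank_length(lineage):
    """Count the non-empty ranks in a semicolon-delimited lineage (hypothetical helper)."""
    return len([rank for rank in lineage.split(";") if rank.strip()])


def pick_longest(lineages):
    """Return the deepest lineage; on a tie, the later entry in the list wins."""
    winner = lineages[0]
    for challenger in lineages[1:]:
        # Using '>=' rather than '>' is what lets a later lineage
        # displace the current winner when the lengths are equal.
        if rank_length(challenger) >= rank_length(winner):
            winner = challenger
    return winner


# Made-up lineages: 'first' and 'second' have equal depth but disagree at class level.
first = "k__Bacteria; p__Firmicutes; c__Bacilli"
second = "k__Bacteria; p__Firmicutes; c__Clostridia"
third = "k__Bacteria; p__Firmicutes"

print(pick_longest([first, second]))         # tie: the second one listed wins
print(pick_longest([second, first]))         # same tie, order swapped
print(pick_longest([first, second, third]))  # third is shorter, so it cannot win
```

Under these assumptions, swapping the order of the two tied lineages swaps the result, which is why the preferred taxonomy should be listed last.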

Exactly! That's what this action was originally written for: to allow different types of ensemble classifications, e.g., when combining results from different taxonomic classifiers.

So @smayne11 to get the functionality that you desire, you would use this action like so:

qiime rescript merge-taxa \
    --i-data second-best-taxonomy.qza your-favorite-taxonomy.qza \
    --o-merged-data consensus-taxonomy.qza

Good luck!
