ASVs driving the differences behind a distance matrix

Hi All,

Thanks again for this program. Quick question: How do I extract/identify the ASVs that are driving the differences seen among sample types in a UniFrac distance matrix?

That’s a fantastic question @Biancabrown, and to cut to the chase: there’s no good answer.

(This is actually something I’m working on right now in the context of alpha diversity, but the idea is the same.)

There are a couple layers to the problem:

Assuming you mean differences among “samples between sample types” and not “samples within a sample type”, we could imagine partitioning the distance matrix into just the distances between sample types.

This isn’t an N×N matrix anymore, as you have mutually exclusive ID sets on either axis (a sample can’t belong to two different sample types [I hope]). That means it isn’t really a QIIME 2 distance matrix anymore, and there certainly aren’t any actions that can produce it, or types to represent it. That’s not to say it’s a bad idea; we just don’t have anything for this.
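To make the partitioning idea concrete, here is a minimal sketch in plain NumPy, outside of QIIME 2. The function name `between_type_block`, the sample IDs, the sample types, and the distances are all made up for illustration; it only assumes you have a square distance matrix and a mapping from sample ID to sample type.

```python
import numpy as np

def between_type_block(dm, sample_ids, sample_types, type_a, type_b):
    """Return the rectangular block of distances between two sample types.

    Rows are samples of type_a, columns are samples of type_b.
    """
    dm = np.asarray(dm, dtype=float)
    rows = [i for i, s in enumerate(sample_ids) if sample_types[s] == type_a]
    cols = [i for i, s in enumerate(sample_ids) if sample_types[s] == type_b]
    block = dm[np.ix_(rows, cols)]  # pick rows and columns independently
    return [sample_ids[i] for i in rows], [sample_ids[i] for i in cols], block

# Toy (hypothetical) data: five samples, two sample types.
sample_ids = ["s1", "s2", "s3", "s4", "s5"]
sample_types = {"s1": "gut", "s2": "gut", "s3": "skin", "s4": "skin", "s5": "skin"}
dm = np.array([
    [0.0, 0.2, 0.7, 0.8, 0.6],
    [0.2, 0.0, 0.9, 0.7, 0.5],
    [0.7, 0.9, 0.0, 0.1, 0.3],
    [0.8, 0.7, 0.1, 0.0, 0.2],
    [0.6, 0.5, 0.3, 0.2, 0.0],
])

row_ids, col_ids, block = between_type_block(dm, sample_ids, sample_types, "gut", "skin")
print(row_ids, col_ids)
print(block)  # 2 x 3: no diagonal, no symmetry, so not a distance matrix
```

The result is a plain rectangular array with separate row and column IDs, which is exactly why it no longer fits a distance-matrix type.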

In any case, supposing there was an easy way to partition, you are still stuck with the beta-diversity calculation which effectively collapses all of your ASVs between the two samples into a tidy little number. This is basically useless if your goal is to talk about ASVs.

To unpack that you would need to effectively “destructure” the UniFrac calculation. One way that comes to mind is to start dropping ASVs and seeing which ones “dramatically” (for some definition of dramatic) change your UniFrac value.

It’s also quite likely that no particular ASV “dramatically” changes the score, and so it’s some composite effect. In fact, I’m almost certain this is what you’ll see, as UniFrac is sort of “stabilized” by the phylogenetic tree. So the impact of any one ASV is unlikely to really change the outcome, unless that ASV is a wildly different outgroup from the rest of the tree.
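The drop-one-ASV idea above can be sketched on a toy example. This is not a QIIME 2 action: the tree, the tip names, and the hand-rolled unweighted UniFrac below are all assumptions for illustration (unique branch length divided by total observed branch length), and "dropping" an ASV here simply means removing it from both samples.

```python
# Hypothetical toy tree: node -> (parent, branch length to parent).
# The root has no entry. "OUT" is a long-branched outgroup tip.
TREE = {
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "n1": ("root", 0.5),
    "C": ("n2", 1.0),
    "D": ("n2", 1.0),
    "n2": ("root", 0.5),
    "OUT": ("root", 5.0),
}

def branches(tips):
    """Branches (named by their child node) on the paths from tips to the root."""
    found = set()
    for node in tips:
        # Stop early once we hit a branch whose ancestors are already recorded.
        while node in TREE and node not in found:
            found.add(node)
            node = TREE[node][0]
    return found

def unweighted_unifrac(tips_a, tips_b):
    """Unique branch length over total observed branch length."""
    a, b = branches(tips_a), branches(tips_b)
    total = sum(TREE[n][1] for n in a | b)
    unique = sum(TREE[n][1] for n in a ^ b)
    return unique / total if total else 0.0

def leave_one_out(tips_a, tips_b):
    """Change in UniFrac when each ASV is removed from both samples."""
    base = unweighted_unifrac(tips_a, tips_b)
    return {
        asv: unweighted_unifrac(tips_a - {asv}, tips_b - {asv}) - base
        for asv in sorted(tips_a | tips_b)
    }

# ASVs observed in each (hypothetical) sample.
sample_a = {"A", "B", "C"}
sample_b = {"C", "D", "OUT"}

base = unweighted_unifrac(sample_a, sample_b)
deltas = leave_one_out(sample_a, sample_b)
for asv, delta in sorted(deltas.items(), key=lambda kv: -abs(kv[1])):
    print(f"{asv}: {delta:+.3f}")
```

In this toy case, the tips with short branches barely move the value, while the long-branched outgroup and the one shared ASV move it the most, which is the "stabilized by the tree" behavior described above. With real data you would have to decide what counts as "dramatic" yourself.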

Another approach might be to calculate the “components” of the UniFrac distance independently and attempt to see which parts of it are the largest. By that I mean really computing UniFrac yourself, but instead of completing the calculation, you could stop at “branch lengths unique to sample-type A”, “branch lengths shared between both”, and “branch lengths unique to sample-type B”. This also isn’t ASVs, but it would at least give you a direction to look, e.g. is the behavior of UniFrac here dominated by shared features or differentiated features, and if the latter, from which sample type?
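As a rough sketch of the "components" idea, the snippet below stops the unweighted UniFrac calculation early and reports the three branch-length totals. Everything in it is hypothetical: the toy tree, the tip names, and the choice to pool all ASVs seen in a sample type into one set.

```python
# Hypothetical toy tree: node -> (parent, branch length to parent).
TREE = {
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "n1": ("root", 0.5),
    "C": ("n2", 1.0),
    "D": ("n2", 1.0),
    "n2": ("root", 0.5),
    "OUT": ("root", 5.0),
}

def branches(tips):
    """Branches (named by their child node) on the paths from tips to the root."""
    found = set()
    for node in tips:
        while node in TREE and node not in found:
            found.add(node)
            node = TREE[node][0]
    return found

def unifrac_components(tips_a, tips_b):
    """Branch length unique to A, shared by both, and unique to B."""
    a, b = branches(tips_a), branches(tips_b)

    def length(nodes):
        return sum(TREE[n][1] for n in nodes)

    return {
        "unique_to_a": length(a - b),
        "shared": length(a & b),
        "unique_to_b": length(b - a),
    }

# ASVs pooled per (hypothetical) sample type.
type_a = {"A", "B", "C"}
type_b = {"C", "D", "OUT"}

parts = unifrac_components(type_a, type_b)
print(parts)

# Finishing the calculation recovers the usual unweighted UniFrac value.
unifrac = (parts["unique_to_a"] + parts["unique_to_b"]) / sum(parts.values())
print(unifrac)
```

Comparing the three totals shows whether the distance is dominated by shared or unique branch length, and which side the unique length comes from.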

In summary, there’s really no way to do this at the moment, but maybe someday we’ll have tools that can pick apart these population summaries so that we can tie them back to ASVs.

Final note:
Does anyone know of a tool capable of doing this? You might save me a whole lot of work if someone’s found a good way to do this already :stuck_out_tongue:


You could try using biplots – @yoshiki has done quite a bit of work on this.


In addition to what has already been mentioned, this paper might be of interest to you. Briefly, the algorithm it describes is capable of identifying the features responsible for driving the differences between groups of samples in the context of a UniFrac distance matrix.

