# ASVs driving the differences behind distance matrix

Hi All,

Thanks again for this program. Quick question: How do I extract/identify the ASVs that are driving the differences seen among sample types in a UniFrac distance matrix?

That’s a fantastic question @Biancabrown, and to cut to the chase: there’s no good answer.

(This is actually something I’m working on right now in the context of alpha diversity, but the idea is the same.)

There are a couple layers to the problem:

Assuming you mean differences among “samples between sample types” and not “samples within a sample type”, we could imagine partitioning the distance matrix into just the distances between sample types.

This isn’t an NxN matrix anymore, as you have mutually exclusive ID sets on either axis (a sample can’t belong to two different sample types [I hope]). That means it isn’t exactly a QIIME 2 distance matrix anymore, and there certainly aren’t any actions which can do this, or types to represent it (not that it isn’t a good idea, we just don’t have anything for this).
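Outside of QIIME 2, the slicing itself is straightforward. Here is a minimal sketch in plain Python, assuming the distance matrix has already been exported as a square list of lists with a matching list of sample IDs. The function name `between_group_block`, the sample IDs, the sample types, and the distance values are all made up for illustration.

```python
def between_group_block(ids, dm, sample_type, type_a, type_b):
    """Return (row_ids, col_ids, block) holding only the distances
    from samples of type_a (rows) to samples of type_b (columns)."""
    rows = [i for i, s in enumerate(ids) if sample_type[s] == type_a]
    cols = [j for j, s in enumerate(ids) if sample_type[s] == type_b]
    block = [[dm[i][j] for j in cols] for i in rows]
    return [ids[i] for i in rows], [ids[j] for j in cols], block


# Toy symmetric distance matrix (hypothetical values, not real data).
ids = ["s1", "s2", "s3", "s4", "s5"]
dm = [
    [0.0, 0.2, 0.7, 0.8, 0.6],
    [0.2, 0.0, 0.6, 0.9, 0.7],
    [0.7, 0.6, 0.0, 0.3, 0.1],
    [0.8, 0.9, 0.3, 0.0, 0.2],
    [0.6, 0.7, 0.1, 0.2, 0.0],
]
sample_type = {"s1": "gut", "s2": "gut", "s3": "skin", "s4": "skin", "s5": "skin"}

row_ids, col_ids, block = between_group_block(ids, dm, sample_type, "gut", "skin")
for sid, row in zip(row_ids, block):
    print(sid, row)
```

The result is a rectangular 2x3 block rather than a square matrix, which is exactly why it no longer fits the usual distance-matrix type.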

In any case, supposing there was an easy way to partition, you are still stuck with the beta-diversity calculation, which effectively collapses all of the ASVs in the two samples into a tidy little number. That number is basically useless if your goal is to talk about ASVs.

To unpack that you would need to effectively “destructure” the UniFrac calculation. One way that comes to mind is to start dropping ASVs and seeing which ones “dramatically” (for some definition of dramatic) change your UniFrac value.

It’s also quite likely that no particular ASV “dramatically” changes the score, and so it’s some composite effect. In fact I’m almost certain this is what you’ll see, as UniFrac is sort of “stabilized” by the phylogenetic tree. So any one ASV is unlikely to really change the outcome, unless that ASV is a wildly different outgroup from the rest of the tree.
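To make the drop-one idea concrete, here is a rough sketch in plain Python. It assumes unweighted UniFrac (branch length unique to one sample divided by branch length observed in either sample) and a hand-built toy tree; `TREE`, the ASV names, and the helper functions are hypothetical and not part of QIIME 2 or any library.

```python
# Hypothetical toy tree: child -> (parent, branch length); "root" has no entry.
TREE = {
    "n1": ("root", 0.1),
    "n2": ("root", 0.1),
    "asv1": ("n1", 0.2),
    "asv2": ("n1", 0.2),
    "asv3": ("n2", 0.3),
    "asv4": ("root", 1.0),  # a distant outgroup on a long branch
}


def branches(asvs):
    """Set of branches (named by their child node) on the paths from asvs to the root."""
    seen = set()
    for node in asvs:
        # Stop early once we hit a branch that is already recorded.
        while node in TREE and node not in seen:
            seen.add(node)
            node = TREE[node][0]
    return seen


def unweighted_unifrac(asvs_a, asvs_b):
    """Unique branch length divided by total observed branch length."""
    ba, bb = branches(asvs_a), branches(asvs_b)
    total = sum(TREE[n][1] for n in ba | bb)
    unique = sum(TREE[n][1] for n in ba ^ bb)
    return unique / total if total else 0.0


def leave_one_out(asvs_a, asvs_b):
    """Map each ASV to the change in UniFrac when it is dropped from both samples."""
    base = unweighted_unifrac(asvs_a, asvs_b)
    return {
        asv: unweighted_unifrac(asvs_a - {asv}, asvs_b - {asv}) - base
        for asv in sorted(asvs_a | asvs_b)
    }


sample_a = {"asv1", "asv2", "asv4"}
sample_b = {"asv2", "asv3"}
base = unweighted_unifrac(sample_a, sample_b)
deltas = leave_one_out(sample_a, sample_b)
for asv, delta in sorted(deltas.items(), key=lambda kv: -abs(kv[1])):
    print(f"{asv}: {delta:+.3f}")
```

In this toy data the outgroup `asv4` produces the largest shift, which matches the intuition above. With a real table of hundreds of ASVs, most individual deltas would likely be tiny.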

Another approach might be to calculate the “components” of the UniFrac distance independently and attempt to see which parts of it are the largest. By that I mean really computing UniFrac yourself, but instead of completing the calculation, you could stop at “branch lengths unique to sample-type A”, “branch lengths shared between both”, and “branch lengths unique to sample-type B”. This also isn’t ASVs, but it would at least give you a direction to look, e.g. is the behavior of UniFrac here dominated by shared features or differentiated features, and if the latter, from which sample type?
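A small sketch of that decomposition, again in plain Python and again on a made-up toy tree. It assumes the ASVs of each sample type have been pooled into one set, and that the unweighted form of UniFrac is the one of interest; `unifrac_components` and every name in the example data are hypothetical.

```python
def branches(tree, asvs):
    """Set of branches (named by their child node) on the paths from asvs to the root."""
    seen = set()
    for node in asvs:
        while node in tree and node not in seen:
            seen.add(node)
            node = tree[node][0]
    return seen


def unifrac_components(tree, asvs_a, asvs_b):
    """Split observed branch length into unique-to-A, shared, and unique-to-B."""
    ba, bb = branches(tree, asvs_a), branches(tree, asvs_b)

    def length(nodes):
        return sum(tree[n][1] for n in nodes)

    return {
        "unique_a": length(ba - bb),
        "shared": length(ba & bb),
        "unique_b": length(bb - ba),
    }


# Hypothetical toy tree: child -> (parent, branch length); "root" has no entry.
tree = {
    "n1": ("root", 0.1),
    "n2": ("root", 0.1),
    "asv1": ("n1", 0.2),
    "asv2": ("n1", 0.2),
    "asv3": ("n2", 0.3),
    "asv4": ("root", 1.0),
}
type_a = {"asv1", "asv2", "asv4"}  # ASVs pooled across sample-type A
type_b = {"asv2", "asv3"}          # ASVs pooled across sample-type B

parts = unifrac_components(tree, type_a, type_b)
total = sum(parts.values())
unifrac = (parts["unique_a"] + parts["unique_b"]) / total
print(parts, round(unifrac, 3))
```

Unweighted UniFrac is then `(unique_a + unique_b) / (unique_a + shared + unique_b)`, so comparing the three parts shows whether the distance is dominated by one sample type's unique branches or by a lack of shared ones.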

In summary, there’s really no way to do this at the moment, but maybe someday we’ll have tools that can pick apart these population summaries so that we can tie them back to ASVs.

Final note:
Does anyone know of a tool capable of doing this? You might save me a whole lot of work if someone’s found a good way to do this already.


You could try using biplots; @yoshiki has done quite a bit of work on this.


In addition to what has already been mentioned, this paper might be of interest to you. Briefly, the algorithm it describes can identify the features responsible for driving the differences between groups of samples in the context of a UniFrac distance matrix.

