UniFrac beta diversity on collapsed feature tables (taxonomy)?


I am fairly new to QIIME 2 (and to microbiome analysis in general) and have gone through the "Moving Pictures" tutorial without much trouble. However, I have a few questions about using the feature table for diversity measures compared with using the tables collapsed by taxonomy.

It makes more sense to me to merge together all features with the same taxonomic assignment (collapse) and then calculate the diversity metrics on those tables.
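To make the idea concrete, here is a minimal sketch of what collapsing means. It assumes a toy feature table stored as plain Python dicts and semicolon-delimited taxonomy strings with genus as the sixth rank. The feature IDs, taxa, and counts are all hypothetical. Conceptually this mirrors `qiime taxa collapse`, which sums the frequencies of features sharing an assignment through the chosen level, but it is not that plugin's code.

```python
from collections import defaultdict

# Hypothetical toy data: feature (ASV) ID -> {sample ID: count}
feature_table = {
    "asv1": {"S1": 10, "S2": 0},
    "asv2": {"S1": 5, "S2": 20},
    "asv3": {"S1": 7, "S2": 3},
}

# Hypothetical taxonomy strings; genus is assumed to be the 6th rank
taxonomy = {
    "asv1": "d__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Streptococcaceae;g__Streptococcus",
    "asv2": "d__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Streptococcaceae;g__Streptococcus",
    "asv3": "d__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Lactobacillaceae;g__Lactobacillus",
}


def collapse(table, taxonomy, level):
    """Sum the counts of all features whose taxonomy agrees through `level` ranks."""
    collapsed = defaultdict(lambda: defaultdict(int))
    for feature, counts in table.items():
        ranks = [rank.strip() for rank in taxonomy[feature].split(";")]
        key = ";".join(ranks[:level])  # truncate the lineage at the requested rank
        for sample, n in counts.items():
            collapsed[key][sample] += n
    return {taxon: dict(counts) for taxon, counts in collapsed.items()}


genus_table = collapse(feature_table, taxonomy, level=6)
for taxon, counts in genus_table.items():
    print(taxon.split(";")[-1], counts)
# g__Streptococcus {'S1': 15, 'S2': 20}
# g__Lactobacillus {'S1': 7, 'S2': 3}
```

Whatever distinguished `asv1` from `asv2` is gone after this step; only their summed counts remain.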

For instance, I like the heatmaps we can get from the collapsed tables, and those make sense to me.

With this in mind, I calculated some simple alpha and beta diversity metrics on the feature tables collapsed at the genus level. Is this something "legal", or does it make absolutely no sense?
If it does make some sense, how can I calculate diversity metrics that need a phylogenetic tree, such as UniFrac distances?
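For the alpha-diversity part of the question, a toy calculation shows what collapsing does to two common metrics. This is a minimal sketch assuming a single hypothetical sample with four equally abundant ASVs, two of which share a genus; it uses the natural log for Shannon's index, whereas some tools default to base 2.

```python
import math


def observed_features(counts):
    """Richness: number of features with a non-zero count."""
    return sum(1 for n in counts if n > 0)


def shannon(counts):
    """Shannon's index H = -sum(p_i * ln p_i) over non-zero features (natural log)."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)


# Hypothetical sample: four ASVs, the first two assigned to the same genus
asv_counts = [25, 25, 25, 25]
genus_counts = [25 + 25, 25, 25]  # the first two ASVs merged into one genus row

print(observed_features(asv_counts), round(shannon(asv_counts), 3))      # 4 1.386
print(observed_features(genus_counts), round(shannon(genus_counts), 3))  # 3 1.04
```

Both numbers drop because collapsing can only merge features, never split them. The values are still valid, but they describe diversity at the genus level rather than the ASV level, so they should not be compared directly with ASV-level results.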

Thanks :slight_smile:


Hi @jpabl,

In my view, working with collapsed data may make sense, but it is still important to work with ASVs. My main point is that collapsing by taxonomy relies on the accuracy of the taxonomic assignment step. So, if an ASV is assigned with low confidence, or even erroneously assigned to a taxon, you may create collapsed data of mixed quality.
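One way to see the "mixed quality" problem is to measure, for each collapsed taxon, how many of its reads come from features with a low-confidence assignment. The sketch below assumes each feature has a genus label and a classifier confidence score; the values and the 0.7 cutoff are hypothetical choices for illustration, not a recommendation.

```python
from collections import defaultdict

# Hypothetical assignments: feature ID -> (genus, classifier confidence)
assignments = {
    "asv1": ("g__Streptococcus", 0.98),
    "asv2": ("g__Streptococcus", 0.55),
    "asv3": ("g__Lactobacillus", 0.91),
}
# Hypothetical total read counts per feature
reads = {"asv1": 120, "asv2": 80, "asv3": 50}


def low_confidence_share(assignments, reads, threshold=0.7):
    """Fraction of each taxon's reads coming from features below `threshold` confidence."""
    total = defaultdict(int)
    uncertain = defaultdict(int)
    for feature, (taxon, confidence) in assignments.items():
        total[taxon] += reads[feature]
        if confidence < threshold:
            uncertain[taxon] += reads[feature]
    return {taxon: uncertain[taxon] / total[taxon] for taxon in total}


print(low_confidence_share(assignments, reads))
# {'g__Streptococcus': 0.4, 'g__Lactobacillus': 0.0}
```

Here 40% of the collapsed `g__Streptococcus` row rests on an assignment the classifier was unsure about, and that is no longer visible once the table is collapsed.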
Another point is that if you have ASVs that belong to the same taxon but have different abundance profiles across your groups, after collapsing you will only see the combined abundance of all the ASVs for that taxon, and therefore you may lose valuable information.
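A toy example of that information loss: two hypothetical ASVs assigned to the same genus change in opposite directions between two hypothetical groups, and the collapsed row hides both changes. The counts are invented for illustration.

```python
# Hypothetical mean counts for two ASVs assigned to the same genus
asv_a = {"control": 90, "treated": 10}
asv_b = {"control": 10, "treated": 90}


def collapse_pair(a, b):
    """Sum two feature profiles group by group, as collapsing by taxonomy would."""
    return {group: a[group] + b[group] for group in a}


def fold_change(profile):
    """Ratio of the treated count to the control count."""
    return profile["treated"] / profile["control"]


genus = collapse_pair(asv_a, asv_b)
print(round(fold_change(asv_a), 3), fold_change(asv_b), fold_change(genus))
# 0.111 9.0 1.0
```

Each ASV changes nine-fold between groups, yet the collapsed genus looks completely stable.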
As for your question, I am not aware of a way to compute distances such as UniFrac after collapsing counts by taxonomy.
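To see why UniFrac has nothing to work with after collapsing, it helps to look at how unweighted UniFrac is defined: the total length of tree branches leading to features found in only one of the two samples, divided by the total length of branches leading to features found in either sample. Every branch is identified by the tips (ASVs) beneath it. The sketch below is a simplified illustration on a hypothetical rooted tree stored as a `node -> (parent, branch length)` dict. It is not the implementation used by QIIME 2, and it deliberately rejects any feature that is not a tip of the tree, which is exactly what a collapsed genus row would be.

```python
from collections import defaultdict

# Hypothetical rooted tree: node -> (parent, length of the branch above the node)
edges = {
    "asv1": ("n1", 1.0),
    "asv2": ("n1", 1.0),
    "asv3": ("n2", 2.0),
    "asv4": ("n2", 2.0),
    "n1": ("root", 0.5),
    "n2": ("root", 0.5),
}


def tips_below(edges):
    """Return the tips and a map from every node to the tips descending from it."""
    parents = {parent for parent, _ in edges.values()}
    tips = [node for node in edges if node not in parents]
    below = defaultdict(set)
    for tip in tips:
        node = tip
        while node in edges:  # walk up until we reach the root
            below[node].add(tip)
            node = edges[node][0]
    return tips, below


def unweighted_unifrac(counts_a, counts_b, edges):
    """Branch length unique to one sample / branch length observed in either sample."""
    tips, below = tips_below(edges)
    unknown = (set(counts_a) | set(counts_b)) - set(tips)
    if unknown:
        raise ValueError(f"features not found as tips of the tree: {sorted(unknown)}")
    unique = observed = 0.0
    for node, (_, length) in edges.items():
        in_a = any(counts_a.get(tip, 0) > 0 for tip in below[node])
        in_b = any(counts_b.get(tip, 0) > 0 for tip in below[node])
        if in_a or in_b:
            observed += length
        if in_a != in_b:
            unique += length
    return unique / observed


sample_a = {"asv1": 5, "asv3": 2}
sample_b = {"asv2": 4, "asv3": 1}
print(unweighted_unifrac(sample_a, sample_b, edges))  # 0.4

# A genus-collapsed sample has no tips to attach to:
try:
    unweighted_unifrac({"g__Streptococcus": 9}, sample_b, edges)
except ValueError as err:
    print(err)  # features not found as tips of the tree: ['g__Streptococcus']
```

Production implementations, such as the one in scikit-bio, are far more efficient and also offer weighted variants, but they rest on the same idea that each feature in the table corresponds to a tip of the tree.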

Hope this is helpful, but I am keen to hear more opinions on this.
Best wishes,


Hi @jpabl,

Let me add on to @llenzi’s excellent answer.

I think what you’re proposing makes total sense in an ideal world. Sadly… we don’t live in an ideal world, and taxonomy and phylogeny don’t always line up. I talked about it here a while ago (featuring :t_rex: and :chicken: emojis).

We also have a second issue: taxonomic annotation isn’t perfect. (I’m going to send you off to read Nick’s excellent discussion here.)

But essentially, the problem boils down to the fact that taxonomy ≠ phylogeny, so you can’t collapse your data and still have it map onto your tree.
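One way to check whether taxonomy and phylogeny line up for a given genus is to ask whether that genus's ASVs form a clade, meaning they are exactly the tips below some single node of the tree. The sketch assumes a hypothetical rooted tree written as nested tuples of ASV IDs, with branch lengths omitted, and hypothetical genus labels. If a genus is not a clade, there is no single node its collapsed row could be placed on.

```python
# Hypothetical rooted tree as nested tuples; leaves are ASV IDs
tree = (("asv1", "asv3"), ("asv2", "asv4"))

# Hypothetical genus assignments for the same ASVs
genus_of = {"asv1": "g__A", "asv2": "g__A", "asv3": "g__B", "asv4": "g__C"}


def clades(tree):
    """List the set of tips below every node; the whole tree's tip set comes last."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    result, tips = [], set()
    for child in tree:
        child_clades = clades(child)
        result.extend(child_clades)
        tips |= child_clades[-1]  # the last entry is the child's full tip set
    result.append(frozenset(tips))
    return result


def genus_is_clade(genus, tree, genus_of):
    """True if the ASVs assigned to `genus` are exactly the tips below one node."""
    members = frozenset(f for f, g in genus_of.items() if g == genus)
    return members in clades(tree)


for genus in sorted(set(genus_of.values())):
    print(genus, genus_is_clade(genus, tree, genus_of))
# g__A False
# g__B True
# g__C True
```

In this toy tree, `g__A` is split across two branches, so a collapsed `g__A` row has no place on the tree.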

So, you can run non-phylogenetic metrics on your collapsed data, but phylogenetic metrics require a tree, which needs OTU- or ASV-level resolution.
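Non-phylogenetic metrics only look at the counts in each row, so they can be computed on a collapsed table as-is. Below is a minimal sketch of Bray-Curtis dissimilarity and Jaccard distance on two hypothetical genus-level samples stored as `genus -> count` dicts; the genus names and counts are invented for illustration.

```python
def bray_curtis(u, v):
    """Sum of |u_i - v_i| divided by the sum of (u_i + v_i) over all taxa."""
    taxa = set(u) | set(v)
    diff = sum(abs(u.get(t, 0) - v.get(t, 0)) for t in taxa)
    total = sum(u.get(t, 0) + v.get(t, 0) for t in taxa)
    return diff / total


def jaccard(u, v):
    """1 - |shared taxa| / |taxa present in either sample| (presence/absence only)."""
    present_u = {t for t, n in u.items() if n > 0}
    present_v = {t for t, n in v.items() if n > 0}
    return 1 - len(present_u & present_v) / len(present_u | present_v)


# Hypothetical genus-collapsed samples
s1 = {"g__Streptococcus": 15, "g__Lactobacillus": 7}
s2 = {"g__Streptococcus": 20, "g__Lactobacillus": 3, "g__Veillonella": 4}

print(round(bray_curtis(s1, s2), 3), round(jaccard(s1, s2), 3))  # 0.265 0.333
```

Neither function needs a tree, which is why these metrics remain available after collapsing; the results simply describe genus-level rather than ASV-level differences.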



@llenzi @jwdebelius Thanks a lot to both of you for your answers and the recommended posts; those were really helpful!

I think I’m going to keep the collapsed tables for the plots, but also improve my pipeline by exploring the ASV data directly. I’ll also start by reading the recommended paper, “Microbiome datasets are compositional: And this is not optional”.