Weighted vs. unweighted UniFrac?

Hello! I am exploring the use of UniFrac PCoA plots in QIIME 2 with some 16S amplicon data from fecal samples of different species.

While researching this, I have come across a few people who recommend unweighted UniFrac plots instead of weighted ones for comparing sample groups if you are looking for a biomarker. Does anyone have insight into why?

My understanding is that weighted UniFrac takes into account the relative abundance of taxa shared among samples, while unweighted UniFrac only considers the presence/absence of taxa shared between samples. So, say you compare fecal samples from two different bird species using both weighted and unweighted UniFrac: wouldn't you want to take relative abundance into account when taking a biomarker approach?

Hi @Johanna_Lisa_Bosch,

UniFrac distance (and beta diversity in general) is a way to compare the whole community and ask, at a high level, "Is there a difference between my groups?" A metric like UniFrac won't tell you what's driving that distance, only that there is one.

Weighted UniFrac weights the difference by abundance, so it asks whether the most abundant (or the least related) organisms are causing changes in your community.
Unweighted UniFrac focuses on presence and absence, and therefore gives relatively more emphasis to the less abundant organisms, which might be important.
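To make the abundance vs. presence/absence difference concrete, here is a minimal from-scratch sketch on a hypothetical four-tip tree, ((A:1,B:1):1,(C:1,D:1):1), with made-up counts. It follows the usual textbook definitions of unweighted UniFrac and normalized weighted UniFrac; the tree, counts, and function names are illustrative only, and this is not how QIIME 2 computes these metrics.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical toy tree ((A:1,B:1):1,(C:1,D:1):1) stored as a flat list of
# branches: (branch length, tips that descend from that branch).
Branch = Tuple[float, FrozenSet[str]]
TREE: List[Branch] = [
    (1.0, frozenset({"A"})),
    (1.0, frozenset({"B"})),
    (1.0, frozenset({"C"})),
    (1.0, frozenset({"D"})),
    (1.0, frozenset({"A", "B"})),  # internal branch above A and B
    (1.0, frozenset({"C", "D"})),  # internal branch above C and D
]


def below(counts: Dict[str, int], tips: FrozenSet[str]) -> int:
    """Total count of a sample's reads on the tips under one branch."""
    return sum(counts.get(t, 0) for t in tips)


def unweighted_unifrac(tree: List[Branch], s1: Dict[str, int], s2: Dict[str, int]) -> float:
    """Presence/absence only: branch length unique to one sample / branch length seen in either."""
    unique = observed = 0.0
    for length, tips in tree:
        in1, in2 = below(s1, tips) > 0, below(s2, tips) > 0
        if in1 or in2:
            observed += length
            if in1 != in2:
                unique += length
    return unique / observed


def weighted_normalized_unifrac(tree: List[Branch], s1: Dict[str, int], s2: Dict[str, int]) -> float:
    """Each branch weighted by the difference in relative abundance below it, scaled to [0, 1]."""
    n1, n2 = sum(s1.values()), sum(s2.values())
    num = sum(length * abs(below(s1, tips) / n1 - below(s2, tips) / n2) for length, tips in tree)
    # Normalizer: each tip's root-to-tip distance, weighted by its relative abundance in both samples.
    den = 0.0
    for tip in set(s1) | set(s2):
        depth = sum(length for length, tips in tree if tip in tips)
        den += depth * (s1.get(tip, 0) / n1 + s2.get(tip, 0) / n2)
    return num / den


base = {"A": 90, "B": 9, "C": 1, "D": 0}
rare_swap = {"A": 90, "B": 9, "C": 0, "D": 1}        # only the rare taxon changes
abundant_shift = {"A": 10, "B": 89, "C": 1, "D": 0}  # same taxa present, dominant ones reshuffled

for name, other in [("rare_swap", rare_swap), ("abundant_shift", abundant_shift)]:
    uw = unweighted_unifrac(TREE, base, other)
    w = weighted_normalized_unifrac(TREE, base, other)
    print(name, round(uw, 3), round(w, 3))
# rare_swap 0.333 0.005
# abundant_shift 0.0 0.4
```

Swapping one rare taxon for its sister moves unweighted UniFrac a lot but barely moves weighted UniFrac, while reshuffling the dominant taxa without changing membership does the opposite.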

So I guess a piece of this is your hypothesis about the organisms you expect to differ: do you think it will be the most abundant features, or will it vary across abundance levels?

My personal recommendation is to run both, since they'll give you different insights into your community and can help you triangulate differential abundance. I've had projects where weighted and unweighted UniFrac gave different results because different exposures were responsible for changes in different aspects of my community. Knowing that helped me build a better analysis, because I could then look at each exposure's impact on differential abundance.
(I can't remember the last paper I published on new data that used only one beta diversity metric. Now, which one goes in the main text... that's a whole other discussion.)

If you're directly interested in taxa and want a community-level description to support that, you might also look into DEICODE, either as a complement to or in lieu of your classic diversity metrics.
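For a bit of context on why DEICODE is more taxa-oriented: as I understand it, it performs a robust Aitchison PCA, which starts from a robust centered log-ratio (rclr) transform of the counts, then applies a matrix-completion step and an ordination whose feature loadings you can inspect. Below is a hedged sketch of only the rclr step in plain Python; the function name and toy counts are hypothetical, and this is not DEICODE's own code.

```python
import math
from typing import Dict, Optional


def rclr(sample: Dict[str, float]) -> Dict[str, Optional[float]]:
    """Robust centered log-ratio for one sample.

    Each nonzero value is log-transformed and centered on the mean log of the
    sample's nonzero values only; zeros are returned as None (left missing
    rather than replaced with a pseudocount).
    """
    logs = {taxon: math.log(v) for taxon, v in sample.items() if v > 0}
    mean_log = sum(logs.values()) / len(logs)
    return {taxon: (logs[taxon] - mean_log if taxon in logs else None) for taxon in sample}


counts = {"A": 90, "B": 9, "C": 1, "D": 0}  # hypothetical counts
transformed = rclr(counts)
print({t: None if v is None else round(v, 3) for t, v in transformed.items()})
# {'A': 2.267, 'B': -0.035, 'C': -2.232, 'D': None}
```

The nonzero rclr values in a sample sum to zero, and the zeros left as missing are what the later matrix-completion step is meant to handle.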


PS: I am itching to talk about why "biomarker discovery" might not be the best first goal for a microbiome analysis, but we'll leave that aside for today.