Differential Abundance Methods & Quality

Hi @jwdebelius

Many thanks for your response, this is helpful. Really grateful :slight_smile:

Feel free to split this into a different query ticket as I am not sure.

Regarding differential and relative abundance, I have seen some papers run the Kruskal-Wallis test (more than 2 groups) or the Mann-Whitney test (2 groups) on relative abundance data and call this differential abundance, while others use ANCOM for differential abundance. Do you think there is any difference, and can we argue that both could be called differential abundance?

Kind regards

Hi @MarwaTawfik,

I did move this to a new topic, because I think this is separate.

There's a pretty rich literature on this topic. I'd recommend looking at a few of these papers, which address your question around MW/KW, ANCOM, and others.

I think the first comparison of ANCOM and KW was by Weiss et al.

And then it was re-visited by Lin and Peddada in 2020:




Wow, Lin and Peddada in 2020 is a great find, partly because it covers almost all the common methods used:

Cumulative-Sum Scaling (CSS) implemented in metagenomeSeq, Median (MED) in DESeq2, Upper Quartile (UQ) and Trimmed Mean of M-values (TMM) in edgeR and Wrench, and Total-Sum Scaling (TSS) (relative abundance). ... “ELib-UQ” (Effective library size using UQ) and “ELib-TMM” (Effective library size using TMM)

ANCOM, ANCOM-BC, LEfSe, gneiss, phylofactorization, PhILR, and selbal

It's all in there!


Hi @jwdebelius
Thanks again.
According to this paper they mentioned running Kruskal–Wallis test on the rarefied samples for differential abundance. Have you heard or have you seen it run on unrarefied samples?

Hi @MarwaTawfik,

I think the point of this paper and much of the recent literature is that it is not appropriate for the data. It ignores some basic assumptions around the distribution and structure.

Are there situations in which you might structure your data in such a way that it could be passed into a test that assumes normality? Absolutely, tools like ANCOM, Aldex2, Gneiss, and Phylofactor are all built on those types of transforms.
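As a rough illustration of what "those types of transforms" can look like, here is a minimal sketch of a centered log-ratio (CLR) transform for a single sample. It is not the implementation used by any of the tools named above. The function name `clr`, the pseudocount of 1, and the counts are all assumptions made for the example.

```python
import math

def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform of one sample's feature counts.

    The pseudocount is an assumption made here to avoid log(0);
    real tools each handle zeros in their own way.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    # Subtracting the mean log is the same as dividing by the geometric mean.
    return [value - mean_log for value in logs]

# Hypothetical counts for four taxa in two samples with different depths.
sample_a = [10, 20, 70, 0]
sample_b = [100, 200, 700, 0]

clr_a = clr(sample_a)
clr_b = clr(sample_b)
```

The transformed values in each sample sum to zero, so every taxon is expressed relative to the sample's own geometric mean rather than to the sequencing depth.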

There are certainly other papers that evaluate KW or other techniques on other transforms. I don't have an exhaustive list of how everyone has compared all their methods.

I will, however, mention that I'm putting together slides for a class discussion about this topic, and I'll share my first slide:



Thank you so much Justine :slight_smile:


Now I have a clue about it. Sorry, I missed your comment before and only saw it just now. Is it a different story for relative abundance comparisons? Should I log transform my relative abundances to see whether they changed significantly between groups? From your experience, do you run any statistical test for relative abundance comparisons?
attaching here a quick figure


@jwdebelius Hi Justine
It will be really appreciated if you could show that there is a difference between running Kruskal test for example on data before and after transformation, I can't see difference from my side.

Hi @MarwaTawfik,

The papers I linked above have already explored this difference pretty in depth, as has other compositional literature. Please review those papers for that test. It's possible the difference is large enough in your data that the transform doesn't make a difference. That doesn't mean it's unnecessary. Neglecting it will make your differential abundance results less valid to an informed reader or reviewer.

To your question on the figure: I think stacked barplots are a bad way to display a comparison between groups, especially collapsed bar plots, and particularly barplots that rely on an HUSL color palette to communicate, so I'm not going to be a big help on "I want to test my taxa here". I tend to use these for diagnostics, run my differential abundance test (ANCOM, ANCOM-BC, Aldex2) and then make a volcano plot with a boxplot of transformed data, or run Songbird and make a rank plot with a boxplot of my ALR.



Thanks so much @jwdebelius
Or it might be because I already transformed them into relative abundances, not sure!
This means that I need to make 2 transformations in my case (relative abundance and log transform).


Hi @MarwaTawfik,

AFAIK, the difference between relative abundance is how you do your zero substitution.



Hi @jwdebelius
As an update, I ran the Wilcoxon test on absolute abundances (with and without log transformation) and found no significant changes between groups, whereas when I ran the Wilcoxon test on relative abundances (with and without log transformation), significant differences were found for both the log-transformed and the untransformed data.
Would you like to share your thoughts?

Hello Marwa,

I've summarized the test you described in a table.

Does this look right?

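| test   | transform    | significant? | valid method? |
|--------|--------------|--------------|---------------|
| Wilcox | (none)       | no           |               |
| Wilcox | log          | no           |               |
| Wilcox | relative     | yes          |               |
| Wilcox | relative log | yes          |               |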

Based on Lin and Peddada in 2020, can you tell me which of those tests are valid and which violate some assumptions of the test for this type of data?


Many Thanks for this. @colinbrislawn


Based on Lin and Peddada in 2020, they mentioned using nonparametric tests (Wilcoxon as an example) on absolute abundances, as they said that these tests don't take relative abundances into account. So I am not sure why I can't run it on relative abundances?
I was kindly advised in this post to use log-transformed data, but I couldn't find any difference between the outputs (p-values) for the transformed and non-transformed data. I used pairwise_wilcox_test() for this.
Thank you for your comments in advance.

When you rescale your data based on relative abundance within each sample, that changes the ranks which changes the results of the Wilcoxon tests.

See this example using the airquality data set from base R:

With raw temperatures, it's clear that August is warmer than May in this data set. Once we have rescaled each Temperature to be a percent, the ranking changes. A similar thing is happening in your data set.

(This temperature data is pretty different than 16S data, but the idea is the same. Rescaling to relative abundance changes ranks which changes the test result.)
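To make the rank argument concrete, here is a small sketch with made-up numbers (not the airquality data, and not real 16S counts). The helper `ranks` and all values are hypothetical. Three samples per group share one taxon, and the second group is sequenced ten times deeper.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward, averaging ranks across ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the window over any run of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = average
        pos = end + 1
    return result

# Hypothetical counts of one taxon: samples 1-3 are group A, 4-6 are group B.
taxon_counts = [10, 12, 14, 20, 22, 24]
# Hypothetical total reads per sample; group B is sequenced ten times deeper.
sample_totals = [100, 100, 100, 1000, 1000, 1000]

relative = [c / t for c, t in zip(taxon_counts, sample_totals)]

raw_ranks = ranks(taxon_counts)  # group B holds the three highest ranks
rel_ranks = ranks(relative)      # group A holds the three highest ranks
```

In this toy case the group ordering flips completely once each count is divided by its own sample total. Since a Wilcoxon test only sees ranks, its result flips too.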

Let's zoom out! :telescope:

We can run any test on any numbers, R will not stop us.

It's up to us to make sure that our data matches the assumptions of the test so that we can trust our results.

Let's explore this next!


Many thanks again for this discussion @colinbrislawn
I ran LEfSe and found that its results were similar to the Wilcoxon results on relative abundances (which I used as a way of normalising my data). With more reading on CLR (wilcox), I found that it also applies a log transformation, which might be the reason why I am getting the same outputs before and after log transformation (log10).
I am currently running ANCOMBC and will come back to you.
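One likely reason the plain log10 step made no difference is that a rank-based test only sees the ordering of the values, and log10 applied to every value preserves that ordering. The sketch below uses a hand-rolled Mann-Whitney U statistic and made-up relative abundances, so every name and number in it is an assumption for illustration.

```python
import math

def mann_whitney_u(group_x, group_y):
    """U statistic for group_x: pairs where x > y, with ties counting one half."""
    u = 0.0
    for x in group_x:
        for y in group_y:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

# Hypothetical relative abundances of one taxon in two groups (no zeros).
group_1 = [0.10, 0.02, 0.30, 0.05]
group_2 = [0.01, 0.04, 0.20, 0.03]

u_raw = mann_whitney_u(group_1, group_2)
u_log = mann_whitney_u([math.log10(v) for v in group_1],
                       [math.log10(v) for v in group_2])
```

Both calls give the same U, so the p-value would match as well. This only holds when the same monotone transform is applied to every value. A CLR subtracts a different mean log from each sample, so it can reorder values across samples and is not equivalent to a plain log10.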