ANCOM results interpretation


I have used ANCOM to detect species with significant changes across my samples. However, I want to know in which group of samples (I have 4 groups based on diet+treatment) the abundances of the detected species change significantly. I have five-number summaries of the abundances, which I used to create boxplots in R to better visualize the changes, but since the changes were very large, the plot did not come out very nicely. Is there a statistical test to detect which group differs from the others?


Hi @Parix,
Please see here:

Run ANCOM on each pair of groups.

Good luck!


Thanks for your reply.
I could run ANCOM in a pairwise manner; however, I don’t know how to deal with multiple testing in that case.


Yeah, I am not sure how to deal with multiple testing in that case either.

Here’s another relevant topic, but it looks like we came to the same conclusion. @Mehrbod_Estaki did the ANCOM authors tell you how to handle pairwise tests?


Unfortunately, pairwise testing isn’t something any differential abundance tool I know of handles (except one; see below). I can’t tell whether that’s just something the developers don’t care about, an oversight, or a unique problem that hasn’t been solved, though my guess is it’s not the last. Regardless, with tests that calculate p-values this can be done in R with p.adjust, but it does require some custom coding. With tests like ANCOM that produce something else, like a W score, I haven’t found any easy solution.
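To illustrate the p-value route: a minimal sketch of correcting a set of pairwise results with p.adjust. The p-values and comparison names below are made up (four groups give six pairwise comparisons); substitute the p-values your tool reports.

```r
# Hypothetical p-values from running a differential-abundance test
# on each pair of groups (4 groups -> 6 pairwise comparisons)
pvals <- c(0.001, 0.020, 0.049, 0.110, 0.300, 0.800)
names(pvals) <- c("A.vs.B", "A.vs.C", "A.vs.D",
                  "B.vs.C", "B.vs.D", "C.vs.D")

# Benjamini-Hochberg (FDR) correction across all pairwise tests
padj <- p.adjust(pvals, method = "BH")
round(padj, 3)
# A.vs.B A.vs.C A.vs.D B.vs.C B.vs.D C.vs.D
#  0.006  0.060  0.098  0.165  0.360  0.800
```

Any of the other methods p.adjust supports (e.g. "bonferroni", "holm") can be swapped in via the method argument.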
I once complained about the lack of pairwise testing on Twitter, and the developer of the propr R package, Thom Quinn, built one in response, so this is the only tool I know of at the moment that does this. propr does things a bit differently, though, so you’ll want to look through the readings on that page first.

Here’s some example code using the iris dataset:

library(propr)  # needed for propd() and friends

priris  <- propd(iris[, 1:4], iris[, 5])  # replace with your feature table and grouping column
priris2 <- updateF(priris)                # calculates ANOVA p-values to set up post-hoc testing
phiris  <- posthoc(priris2)               # runs post-hoc testing of all pairwise comparisons
pwiris  <- getResults(phiris)             # extracts the results table
sigiris <- pwiris[pwiris$`versicolor.vs.setosa.adj` < 0.05, ]  # replace with your comparison of choice

From going through the code, and according to the paper, ANCOM essentially works like this: it transforms the OTU table to relative abundances; takes the log ratio of each OTU/ASV against every other ASV in the table, across samples; calculates the mean log ratio of those values within each treatment group; and finally runs a statistical test (ANOVA, Kruskal-Wallis, or Wilcoxon, depending on whether you supplied an adjusted formula and how many treatment groups you have) on those mean log-ratio values for each ASV comparison. Because each ASV is compared with all other ASVs in the table, there are many such tests, and the W value is the number of times the null hypothesis is rejected for a given ASV.
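To make the W tally concrete, here’s a toy sketch of that core idea. This is not the actual ANCOM implementation: the counts are made up, there are only two groups, and a plain Kruskal-Wallis test stands in for ANCOM’s internal machinery.

```r
# Toy sketch of the core ANCOM idea: log ratios + per-pair tests + W tally.
set.seed(1)
otu   <- matrix(rpois(60, lambda = 20) + 1, nrow = 10)  # 10 samples x 6 ASVs
group <- factor(rep(c("A", "B"), each = 5))

rel   <- otu / rowSums(otu)   # relative abundances per sample
n_asv <- ncol(rel)
W     <- integer(n_asv)

for (i in seq_len(n_asv)) {
  for (j in setdiff(seq_len(n_asv), i)) {
    lr <- log(rel[, i] / rel[, j])            # log ratio of ASV i vs ASV j
    p  <- kruskal.test(lr, group)$p.value     # one test per ASV pair
    if (p < 0.05) W[i] <- W[i] + 1            # tally rejections -> W score
  }
}
W  # higher W = ASV i differs between groups in more of its pairwise log ratios
```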

All this to say: if the test runs an ANOVA, you should be able to run a Tukey test after each ANOVA. You would have to adjust the code (you can find it here, with a link to it from the author’s new personal webpage, which I’m adding because many links in previous threads about ANCOM are broken), and the test would take longer to compute.
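In base R, that post-hoc step would look roughly like this. It’s a sketch, not ANCOM’s actual code: an iris log ratio stands in for the log-ratio values of one ASV pair.

```r
# Stand-in log ratio: plays the role of one ASV-pair log ratio in the ANCOM loop
lr  <- log(iris$Sepal.Length / iris$Sepal.Width)
fit <- aov(lr ~ Species, data = iris)   # the ANOVA that ANCOM would tally into W
TukeyHSD(fit)  # post-hoc: which pairs of groups differ, with adjusted p-values
```

TukeyHSD already adjusts for the pairwise comparisons within one ANOVA; the remaining question of correcting across the many ASV-pair ANOVAs is the part that would need custom code.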

Alternatively, assuming you ran something like a PERMANOVA before this: if your PERMANOVA tests only supported differences between one of your treatment groups and all the others, you could combine all the others into one group and compare it against the group that was significantly different. That leaves you with two treatment groups and a larger dataset than if you had to cut everything into many pieces, and it also avoids multiple testing. You would have a genuine reason to combine, too: if there is no difference between some of your treatment groups, there is no point in running ANCOM to see which ASVs those non-existent differences are based on. That would only produce false positives, since the test will most likely always output something, even when it doesn’t actually make sense.

If none of that is applicable, I would boxplot log(abundance + 1) for each ASV you’re interested in, rather than the raw counts from the OTU table, since raw counts will be hard to put on a nice, readable scale. I suggest the +1 because log(0) is undefined, and those values might get dropped from your boxplots (depending on which program/code you use to generate them), skewing the plot.
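For example, with made-up counts on very different scales (as raw abundances often are):

```r
# Toy counts spanning very different scales, as raw abundance tables often do
set.seed(1)
counts <- data.frame(
  abundance = c(rpois(20, 5), rpois(20, 5000)),
  group     = rep(c("g1", "g2"), each = 20)
)

# log(x + 1) keeps zeros on the plot instead of dropping them as log(0) = -Inf
boxplot(log(abundance + 1) ~ group, data = counts,
        ylab = "log(abundance + 1)")
```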

Good luck with everything!!!
