ANCOM, testing assumptions, filtering

To add to @Mehrbod_Estaki’s links, this is part of the problem when dealing with these sorts of data types. You can’t really evaluate the resulting test statistic to see if the statistical test is appropriate. This ultimately boils down to the number of dimensions that we can actually observe. If there are D species, we can only observe D-1 dimensions, since we are dealing with proportions.

It boils down to how proportions work. Suppose you have variables with x_1 + x_2 + x_3 = 1. If you happen to know the quantities for x_1 and x_2, you have already figured out x_3, because you know that all three add up to 1. So you can solve for all 3 variables knowing only two of them (hence the data is two-dimensional).
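To make that concrete, here is a minimal sketch in plain Python. The counts are made up, and `closure` and `recover_last` are just illustrative helper names, not functions from any library.

```python
def closure(counts):
    """Convert raw counts to proportions that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]


def recover_last(known_parts):
    """Given D-1 proportions, return the one that was left out."""
    return 1.0 - sum(known_parts)


# Made-up counts for three species in one sample.
counts = [50, 30, 20]
props = closure(counts)

# Knowing x_1 and x_2 is enough to get x_3 back.
x3 = recover_last(props[:2])
print(props, x3)
```

The third proportion carries no extra information, which is why only D-1 dimensions are observable.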

The plot thickens when you have multiple measurements for x_1, x_2 and x_3 and you are trying to figure out which of the underlying quantities actually changed across the samples. It turns out that this is not possible because the data are compositional (this is also known as an identifiability problem). The only thing that we can do is make simplifying assumptions to obtain a guess. Whether or not those assumptions hold is beyond what we can check at the moment.
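A small sketch of why this is not identifiable, using made-up absolute abundances (which we never get to observe in practice). Two very different underlying changes produce exactly the same proportions; `to_proportions` is just an illustrative helper.

```python
def to_proportions(abundances):
    """Normalize absolute abundances so they sum to 1."""
    total = sum(abundances)
    return [a / total for a in abundances]


# Hypothetical absolute abundances before any change.
before = [100, 100, 100]

# Scenario A: species 1 doubles, the others stay the same.
after_a = [200, 100, 100]

# Scenario B: species 1 stays the same, the others are halved.
after_b = [100, 50, 50]

props_a = to_proportions(after_a)
props_b = to_proportions(after_b)

# Both scenarios look identical once we only see proportions.
print(props_a)
print(props_b)
```

From the proportions alone there is no way to tell scenario A from scenario B, so any answer about "what changed" depends on an extra assumption (for example, that most species did not change).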

How ANCOM deals with multiple groups comes down to how the subhypothesis test is defined. As explained here, you can define the hypothesis to be the difference between means. If you instead use an ANOVA to handle multiple groups, you are testing the hypothesis that all of the groups have the same mean (here is a fairly intuitive explanation behind ANOVA).
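As a rough illustration of that "all groups have the same mean" hypothesis, here is a hand-rolled one-way ANOVA F statistic in plain Python. This is not ANCOM's implementation: I am assuming, for the sake of the example, that the subhypothesis is applied to the log-ratio of one pair of taxa, and the counts and group names are made up.

```python
import math


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups."""
    all_values = [v for g in groups for v in g]
    n = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of the samples around their own group mean.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, means) for v in g)

    return (ss_between / (k - 1)) / (ss_within / (n - k))


def log_ratio(pairs):
    """Log-ratio of taxon i to taxon j for each (count_i, count_j) sample."""
    return [math.log(a / b) for a, b in pairs]


# Made-up (taxon_i, taxon_j) counts for samples in three groups.
group_a = log_ratio([(10, 20), (12, 18), (11, 25)])
group_b = log_ratio([(30, 20), (28, 22), (35, 19)])
group_c = log_ratio([(15, 20), (14, 21), (18, 17)])

f_stat = one_way_anova_f([group_a, group_b, group_c])
print(f_stat)
```

A large F means the group means of the log-ratio differ more than you would expect from the within-group spread. With only two groups this reduces to the usual comparison of two means.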
