Hi, I was wondering whether you can recommend a way to test the ANCOM assumption that most taxa are shared/not-changing between groups? I was thinking something like the qiime1 shared_phylotypes. I am assuming that there is a better chance of meeting the assumption at collapsed taxa levels vs sequence variants. Also, do you recommend filtering rare taxa/features before running ANCOM to help meet assumptions?

Great question! I’m also curious how to test the assumptions for ANCOM (i.e., that fewer than 25% of features change between groups). As an extension to the question, how exactly does this work across multiple groups? Is it 25% change across pairs or overall?

@jessicalmetcalf, from this thread it sounds like too many zeros or low counts are better off removed:

Low counts / zeros could cause false positives in ANCOM

To add to @Mehrbod_Estaki’s links, this is part of the problem when dealing with these sorts of data types. You can’t really evaluate the resulting test statistic to see if the statistical test is appropriate. This ultimately boils down to the number of dimensions that we can actually observe. If there are D species, we can only observe D-1 dimensions, since we are dealing with proportions.

It boils down to how proportions work. Suppose you have variables with x_1 + x_2 + x_3 = 1. If you happen to know the quantities for x_1 and x_2, you have already figured out x_3, because you know that they add up to 1. Hence you can solve for all 3 variables knowing only two of them (so the data are two-dimensional).
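Here is a minimal sketch of that constraint in Python. The counts and the `close` helper are made up for illustration and are not taken from any particular library:

```python
def close(counts):
    """Normalize non-negative counts so they sum to 1 (the 'closure' step)."""
    total = sum(counts)
    return [c / total for c in counts]

# Hypothetical counts for three taxa in a single sample.
x1, x2, x3 = close([20, 30, 50])

# Knowing x1 and x2 is enough to recover x3, so only 2 of the 3 values are free.
recovered_x3 = 1.0 - x1 - x2
print(x1, x2, x3, recovered_x3)
```

The same holds for D taxa: once D-1 proportions are known, the last one is fixed.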

The plot thickens when you have multiple measurements for x_1, x_2 and x_3 and you are trying to figure out which of the underlying quantities actually changed across the samples. It turns out that this is not possible due to compositionality (this is an identifiability problem). The only thing that we can do is make simplifying assumptions to obtain a guess. Whether or not this assumption holds is beyond what we can verify at the moment.
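A toy example of why the underlying change cannot be identified from proportions alone. The absolute abundances below are invented; in practice we never get to see them, only the proportions:

```python
def proportions(abundances):
    """Convert absolute abundances to proportions that sum to 1."""
    total = sum(abundances)
    return [a / total for a in abundances]

# Hypothetical absolute abundances for three taxa at baseline.
baseline = [100, 100, 100]

# Scenario A: only taxon 1 changes (it doubles).
scenario_a = [200, 100, 100]
# Scenario B: taxon 1 is unchanged, while taxa 2 and 3 both halve.
scenario_b = [100, 50, 50]

props_a = proportions(scenario_a)
props_b = proportions(scenario_b)
print(props_a)
print(props_b)

# Both scenarios produce identical proportions, so the data cannot tell them apart.
same = all(abs(a - b) < 1e-12 for a, b in zip(props_a, props_b))
print(same)
```

Assuming that most taxa do not change amounts to preferring scenario A over scenario B, but nothing in the proportions themselves tells you which one actually happened.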

As for how ANCOM deals with multiple groups, it just comes down to how the subhypothesis test is defined. As explained here, you can define the hypothesis in terms of the difference between two group means. If you instead use an ANOVA to handle multiple groups, you are testing the hypothesis that all of the groups have the same mean (here is a fairly intuitive explanation behind ANOVA).
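To make the difference between the two hypotheses concrete, here is a rough sketch of a one-way ANOVA F statistic written by hand. This is not the code ANCOM or QIIME 2 runs. The `one_way_f` helper and the numbers are hypothetical, and I am assuming the values are log-ratios of one pair of taxa, one value per sample:

```python
from statistics import mean

def one_way_f(groups):
    """F statistic for the null hypothesis that all groups share one mean."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k = len(groups)       # number of groups
    n = len(all_values)   # total number of samples
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical log-ratios of one pair of taxa in three groups of samples.
group_a = [0.10, 0.20, 0.15, 0.05]
group_b = [0.12, 0.18, 0.10, 0.20]
group_c = [1.10, 0.90, 1.00, 1.20]

# Two groups: equivalent to testing the difference between two means.
f_pair = one_way_f([group_a, group_b])
# Three groups: tests whether all three means are equal at once.
f_all = one_way_f([group_a, group_b, group_c])
print(f_pair, f_all)
```

A large F for the three-group test only says that at least one group mean differs. It does not say which pair of groups is responsible.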