ANCOM (differentially abundant)


Question about ANCOM and differentially abundant taxa.
How should I decide that taxon X is more abundant in one group than in the other (I have two groups), based on the ANCOM percentile abundance table?

Should I look at the number of sequences at the 100th percentile of each group, and call the taxon differentially abundant in the group where that number is higher?

What is the correct/suggested way to judge differentially abundant taxa from the ANCOM percentile abundance table?

Thank you for all the help!


Let's take a look at an example from the "Moving Pictures" tutorial: QIIME 2 View.
The goal of every statistical test is to estimate differences between distributions. In a minority of cases, the test statistic or p-value might be biased because of the shape of the distribution (e.g. extreme skew) or because the test assumptions are not met. The percentiles show you how the distribution of each feature looks in every group, so they give you a basis for that estimate.
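To make the percentile idea concrete, here is a minimal sketch of how such a per-group table could be computed from raw counts of one taxon. The counts, group labels, and function names are made up for illustration, and the linear-interpolation rule is an assumption; it is not necessarily how the QIIME 2 ANCOM visualizer computes its table.

```python
from collections import defaultdict


def percentile(values, q):
    """Linear-interpolation percentile (q in 0..100) of a non-empty list."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def percentile_table(counts, groups, qs=(0, 25, 50, 75, 100)):
    """Return {group: {percentile: value}} for one taxon."""
    by_group = defaultdict(list)
    for count, group in zip(counts, groups):
        by_group[group].append(count)
    return {g: {q: percentile(v, q) for q in qs} for g, v in by_group.items()}


# Hypothetical sequence counts of one taxon in ten samples, two groups.
counts = [0, 2, 5, 9, 40, 1, 0, 3, 2, 4]
groups = ["A"] * 5 + ["B"] * 5

table = percentile_table(counts, groups)
for group, row in table.items():
    print(group, row)
```

Reading the whole row for each group, rather than a single column, shows whether one group is shifted upward across the distribution or only has a single extreme sample.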

The easy (but not technically correct) "eyeball" method is to observe whether 
the 95% confidence intervals overlap for the two items. If two items reflect 
a statistically significant difference using this "eyeball" test, then they 
will [most likely] also pass the more rigorous tests.
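As a rough sketch of that "eyeball" check, the snippet below builds an approximate 95% confidence interval for the mean of each group and tests whether the two intervals overlap. The normal approximation (mean ± 1.96 standard errors), the example values, and the function names are all assumptions made for illustration; this is not what ANCOM itself computes.

```python
import math
import statistics


def mean_ci95(values):
    """Approximate 95% CI for the mean, using a normal approximation."""
    m = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return (m - 1.96 * se, m + 1.96 * se)


def intervals_overlap(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical counts of one taxon in two groups.
group_a = [10, 12, 11, 13, 12, 11]
group_b = [30, 28, 31, 29, 32, 30]

ci_a = mean_ci95(group_a)
ci_b = mean_ci95(group_b)
overlap = intervals_overlap(ci_a, ci_b)
print(ci_a, ci_b, "overlap" if overlap else "no overlap")
```

With few samples or strongly skewed counts the normal approximation is poor, which is one reason this check is only a quick first look.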

Test statistic and p-value are complementary.



Thank you for getting back to me.

I am still not sure how to decide on differentially abundant taxa; the distributions in the groups are quite spread out and it is not clear how to pinpoint the differentially abundant taxa.
Please see my example data here ancom_Genus-Age-in-months_uniform.qzv (449.1 KB).

Do you think I should look at the median (50th percentile) of every group and call it differentially abundant if the number of sequences is higher than in other groups?

I appreciate all the help!


It's a good question. I'd start by plotting the distributions for each time point and comparing them visually (they do look wide indeed). The sample size at each time point is also important, so I'd suggest adding it to the plot as well.

If you have a time series (samples taken repeatedly from the same subjects), take a look at the methods in Performing longitudinal and paired sample comparisons with q2-longitudinal — QIIME 2 2022.8.3 documentation. ANCOM assumes that samples are independent, but time-series samples are not.


Hi Valentyn,

I appreciate your help!

All these numbers in the abundance table (ANCOM output) come from the samples that we fed into the system. If we reported the sample size for each group in a previous step (e.g. alpha diversity), why would it be necessary to consider the sample size when plotting the distribution of sequences at each time point?

My understanding is that plotting the distribution will help me visualize how these sequences are distributed among the groups for that particular taxon. But if the distribution of taxon X is skewed to the right in one group and skewed to the left in another, and so on, how can we judge which taxa are differentially abundant between the groups?

Another thing, how important is it to be consistent when we build our judgement of differentially abundant taxa?

What I mean is: let's say for taxon X we call it differentially abundant in Group A based on the 50th percentile, and for taxon Y we call it differentially abundant in Group B based on the 100th percentile.

It sounds logical to call differentially abundant taxa based on the 50th percentile, doesn't it?

Note: our data is not longitudinal data.

  • Pardon, I meant not the sample size but the number of samples in which the bacterium is present (i.e. the number of observations). If you evaluate 2 observations, the distribution is much less meaningful than when you evaluate, say, 10-20. In that case, even if the difference is statistically significant, it is somewhat spurious.
  • Differentially abundant = overrepresented. Thus a left-skewed distribution would indicate a DA species. I don't fully understand the second question, so please rephrase it, if you will.
  • Consistency in DA analysis is a good question; read more on it here: Microbiome differential abundance methods produce different results across 38 datasets | Nature Communications
  • I tend to think about stats in terms of distributions rather than point estimates. Comparing medians is meaningful when the medians represent the underlying distribution well (when the number of observations is low, that might not be the case).
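The points above about the number of observations and medians can be sketched as follows. This is only an illustration with made-up counts and a hypothetical `summarize` helper: for each group it reports how many samples there are, in how many of them the taxon is actually present, and the median count.

```python
import statistics
from collections import defaultdict


def summarize(counts, groups):
    """Per group: total samples, samples with count > 0, and median count."""
    by_group = defaultdict(list)
    for count, group in zip(counts, groups):
        by_group[group].append(count)
    return {
        g: {
            "n_samples": len(v),
            "n_present": sum(1 for c in v if c > 0),
            "median": statistics.median(v),
        }
        for g, v in by_group.items()
    }


# Hypothetical counts: in group A the taxon appears in only 2 of 5 samples.
counts = [0, 0, 0, 50, 70, 3, 4, 5, 6, 7]
groups = ["A"] * 5 + ["B"] * 5

summary = summarize(counts, groups)
for group, row in summary.items():
    print(group, row)
```

In this made-up case group A has the larger maximum but a median of 0 and only two non-zero observations, so a judgement based on the 100th percentile and one based on the 50th percentile would point in opposite directions.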


