Interpreting ANCOM results

Hi QIIME 2 team!

I'm hoping you can help clarify some ANCOM results. From searching the forums, I think I understand that the W statistic counts, for a given feature, how many times the null hypothesis (no difference between groups) is rejected across the log-ratios of that feature with every other feature, and that it indicates which features differ significantly between groups. I also think I understand the percentiles, based on this post: "Interpreting values from an ANCOM percentile abundance table" (#10 by Lisa_Crummett).
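To make the W statistic concrete, here is a minimal pure-Python sketch of the idea. For each feature, it computes the log-ratio against every other feature in each sample, tests whether that log-ratio differs between the two groups, and counts the rejections. Several details are illustrative assumptions rather than QIIME 2's implementation: the exact permutation test, the pseudocount of 1, the 0.05 cutoff, the absence of multiple-testing correction, the hypothetical helper names, and the counts themselves. QIIME 2's ANCOM uses its own statistical test and correction.

```python
import math
from itertools import combinations


def exact_permutation_pvalue(group_a, group_b):
    """Two-sided exact permutation test on the difference in group means.

    Enumerates every split of the pooled values into groups of the original
    sizes, so it is only practical for small groups.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    def abs_mean_diff(indices_a):
        chosen = set(indices_a)
        a = [v for k, v in enumerate(pooled) if k in chosen]
        b = [v for k, v in enumerate(pooled) if k not in chosen]
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = abs_mean_diff(range(n_a))
    splits = list(combinations(range(len(pooled)), n_a))
    # Count splits at least as extreme as the observed one (small tolerance for float noise).
    as_extreme = sum(1 for s in splits if abs_mean_diff(s) >= observed - 1e-12)
    return as_extreme / len(splits)


def ancom_w(counts, groups, alpha=0.05, pseudocount=1.0):
    """Hypothetical helper: W = number of rejected pairwise log-ratio tests per feature."""
    labels = sorted(set(groups))
    if len(labels) != 2:
        raise ValueError("this sketch only handles two groups")
    w = {}
    for fi in counts:
        rejections = 0
        for fj in counts:
            if fj == fi:
                continue
            # Per-sample log-ratio of feature fi to feature fj; the pseudocount avoids log(0).
            ratios = [math.log((xi + pseudocount) / (xj + pseudocount))
                      for xi, xj in zip(counts[fi], counts[fj])]
            a = [r for r, g in zip(ratios, groups) if g == labels[0]]
            b = [r for r, g in zip(ratios, groups) if g == labels[1]]
            if exact_permutation_pvalue(a, b) < alpha:
                rejections += 1
        w[fi] = rejections
    return w


# Hypothetical counts: only Firmicutes shifts between groups.
groups = ["Normal"] * 4 + ["Tumor"] * 4
counts = {
    "Firmicutes":     [400, 380, 420, 390, 100, 90, 110, 95],
    "Bacteroidota":   [100, 120, 90, 110, 105, 95, 115, 100],
    "Proteobacteria": [50, 45, 55, 60, 52, 58, 48, 47],
}
print(ancom_w(counts, groups))
```

In this toy data only Firmicutes changes between groups. It is therefore rejected against both other features (W = 2), while Bacteroidota and Proteobacteria each get W = 1, only from their comparison with Firmicutes. W is interpreted relative to the number of other features (here, at most 2), and on its own it says nothing about which group is higher.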

I'm confused about how to interpret a two-group comparison in which ANCOM identifies a feature as significantly different, yet the percentiles of the two groups overlap. I've included two examples below. The first is a 16S analysis that includes all the reads in the dataset (~200k reads); the second is a subsampled version of the same dataset (~100k reads). In both, ANCOM suggests that the phylum Firmicutes differs significantly between the Normal and Tumor groups, but I can't tell from the percentiles which group has the higher Firmicutes abundance. In the first example, the Tumor group has the highest Firmicutes count (100th percentile, 78163), but the Normal group has the higher median (50th percentile, 39913). In the second example, the Normal group has higher 100th- and 50th-percentile abundances, although the 100th-percentile values are similar between the two groups. Based on these results, which group does ANCOM suggest has the higher Firmicutes abundance in the first and second examples?

I'm running qiime2-amplicon-2023.9.

Many thanks in advance for your assistance!

Hi @crw,
In both cases, it appears that Firmicutes abundance is typically higher in the Normal group (its median is higher), though you may have one or a few outliers in the Tumor group with abundances higher than any of the Normal samples. If you sketch a box-and-whisker plot in which the bottom and top of the box are the 25th and 75th percentiles and the ends of the whiskers are the 0th and 100th percentiles, I think this will be clear.
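The box-and-whisker reading described in the reply can be sketched by computing the five percentiles from per-sample counts. The counts below are hypothetical, chosen to mimic a single high outlier in the Tumor group, and are not the data from this thread. Percentiles use linear interpolation via `statistics.quantiles(..., method="inclusive")`, which may differ slightly from how the ANCOM percentile table is computed.

```python
import statistics


def five_number_summary(values):
    """Return (0th, 25th, 50th, 75th, 100th) percentiles of a sample."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


# Hypothetical per-sample Firmicutes counts (NOT the data from this thread).
normal = [20, 30, 40, 45, 50]
tumor = [5, 10, 15, 20, 90]  # one outlier sample with a very high count

for name, values in [("Normal", normal), ("Tumor", tumor)]:
    lo, q1, med, q3, hi = five_number_summary(values)
    print(f"{name:>6}: min={lo} 25%={q1} median={med} 75%={q3} max={hi}")
```

Here the Tumor group has the largest single count, yet its median and its whole 25th-to-75th-percentile box lie below the Normal group's, which is the pattern described in the reply.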

Also, just FYI, we now recommend using ANCOMBC (qiime composition ancombc) instead of ANCOM. There is a nice visualization for ANCOMBC results (qiime composition da-barplot) that can help with this interpretation.


Hi @gregcaporaso,

Thanks so much for your reply and the clarification! I'm working on updating my analyses to use ANCOMBC, but I have a follow-up question. The QIIME 2 tutorials that included ANCOM recommended adding a pseudocount by creating a FeatureTable[Composition] artifact; however, ANCOMBC requires "--i-table" to be a FeatureTable[Frequency] artifact (the type used as input to create the FeatureTable[Composition] artifact). Do you recommend adding a pseudocount for ANCOMBC, or does ANCOMBC add one under the hood or otherwise tolerate zero frequencies? I found one QIIME 2 forum post (link below) suggesting that ANCOMBC may add a pseudocount under the hood, but any additional clarification is greatly appreciated.
Many thanks, Caroline


Hi @crw! :wave:t3:

Great find on that forum post! @colinbrislawn hit the nail on the head: a pseudocount is added internally by the R code that we use to wrap the ANCOM-BC method in q2-composition. The link he shared, with supplementary discussion of why pseudocounts are needed, is also a good resource to look over if you haven't already!
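To see why zeros need a pseudocount in log-ratio methods, here is a small sketch of a centred log-ratio (CLR) transform on hypothetical counts. It only illustrates the general idea and is not the R code that q2-composition wraps. The pseudocount of 1 is an assumption, and ANCOM-BC's internal handling of zeros may differ.

```python
import math


def clr(counts, pseudocount=1.0):
    """Centred log-ratio transform of one sample's counts.

    log(0) is undefined, so a pseudocount is added to every count first.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


sample = [0, 10, 100]  # hypothetical counts; the first feature was not observed
print(clr(sample))
try:
    clr(sample, pseudocount=0.0)
except ValueError as err:
    print("without a pseudocount:", err)
```

With the pseudocount, the unobserved feature gets a finite (negative) value; with `pseudocount=0.0`, `math.log(0.0)` raises a `ValueError`, which is the problem the internal pseudocount avoids.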

Cheers :lizard:


Great! Thanks so much for the quick reply and clarification, @lizgehret!

