I'm hoping you can help clarify some ANCOM results. I've searched the forums and think I understand that the W statistic is a count of the number of times the null hypothesis (groups are equal) is rejected, and that it suggests which features are significantly different between groups. I think I also understand the percentiles, based on the following post: "Interpreting values from an ANCOM percentile abundance table - #10 by Lisa_Crummett".
I'm confused about how to interpret a two-group comparison where ANCOM has flagged a feature as significantly different but the percentiles for the two groups seem to overlap. I've included two examples below. The first example is a 16S analysis that includes all the reads in the dataset (~200k reads). The second example is a subsampled version of the same dataset (~100k reads). In both examples ANCOM suggests the phylum Firmicutes is significantly different between the Normal and Tumor groups, but when I look at the percentiles I can't tell which group (Normal vs Tumor) has the higher abundance of Firmicutes. In the first example, the Tumor group has the highest maximum (100%) count of Firmicutes sequences at 78163, but the Normal group has a higher median (50%) at 39913. In the second example, the Normal group has higher 100% and 50% abundances, but the 100% abundances are similar between the Normal and Tumor groups. Based on these results, which group is ANCOM suggesting has a higher abundance of Firmicutes in the first and second examples?
In both cases, it appears that on average Firmicutes abundance is higher in the Normal group, though it looks like you may have one or a few outliers in the Tumor group with abundances higher than any of the Normal samples. If you were to sketch out a box and whisker plot where the bottom and top of the box were the 25th and 75th percentiles, and the bottom and top of the whiskers were the 0th and 100th percentiles, I think this would be clear.
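To make that sketch concrete, here is a minimal Python example of mapping an ANCOM-style percentile table onto the parts of a box and whisker plot. The percentile values are made up for illustration (they are not the numbers from the tables above), and the `box_stats` helper is a hypothetical name, not part of QIIME 2.

```python
# Illustrative only: these percentile values are invented, not the poster's data.
percentiles = {
    "Normal": {0: 100, 25: 300, 50: 400, 75: 500, 100: 700},
    "Tumor": {0: 50, 25: 150, 50: 250, 75: 350, 100: 800},
}


def box_stats(label, pct):
    """Map one group's percentile row onto box-and-whisker parts."""
    return {
        "label": label,
        "whislo": pct[0],    # bottom whisker = minimum (0th percentile)
        "q1": pct[25],       # bottom of the box
        "med": pct[50],      # line inside the box
        "q3": pct[75],       # top of the box
        "whishi": pct[100],  # top whisker = maximum (100th percentile)
    }


stats = [box_stats(group, pct) for group, pct in percentiles.items()]

# The median describes the typical sample; the maximum can be a single outlier.
higher_median = max(stats, key=lambda s: s["med"])["label"]
higher_maximum = max(stats, key=lambda s: s["whishi"])["label"]
print("Higher median:", higher_median)    # Higher median: Normal
print("Higher maximum:", higher_maximum)  # Higher maximum: Tumor
```

The dictionary keys follow the names that matplotlib's `Axes.bxp` expects, so if matplotlib is available, `ax.bxp(stats, showfliers=False)` should draw the plot directly from these summaries.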
Also, just FYI, we recommend using ANCOMBC now (qiime composition ancombc), as opposed to ANCOM. There is a nice visualization for ancombc (qiime composition da-barplot) that can help with this interpretation.
Thanks so much for your reply and the clarification! I'm working to update my analyses to use ANCOMBC but have a follow-up question. The QIIME 2 tutorials that included ANCOM recommended adding a pseudocount by creating a FeatureTable[Composition] artifact; however, ANCOMBC requires that "--i-table" be a FeatureTable[Frequency] artifact (the input used to create the FeatureTable[Composition] artifact). Do you recommend adding a pseudocount for ANCOMBC, or does ANCOMBC do this under the hood, or can it tolerate frequencies of zero? I found one QIIME 2 forum post (link below) that suggests a pseudocount may be added by ANCOMBC under the hood, but any additional clarification is greatly appreciated.
Many thanks, Caroline
Great find on that forum post! @colinbrislawn hit the nail on the head - a pseudocount is added internally by the R code that we wrap to provide the ANCOM-BC method in q2-composition. The link he shared, with supplementary discussion on why pseudocounts are needed, is also a good resource to look over if you haven't already!
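As a rough illustration of why a pseudocount matters for log-based methods, here is a minimal Python sketch. It is not the ANCOM-BC code: the pseudocount value of 1, the counts, and the `log_counts` helper are all assumptions made for this example.

```python
import math

PSEUDOCOUNT = 1  # assumed value for illustration, not confirmed in this thread


def log_counts(counts, pseudocount=0):
    """Natural log of each count after adding an optional pseudocount."""
    return [math.log(c + pseudocount) for c in counts]


counts = [0, 9, 99]  # made-up feature counts for a single sample

# Without a pseudocount, the zero count cannot be log-transformed.
try:
    log_counts(counts)
    zero_ok = True
except ValueError:
    zero_ok = False  # math.log(0) raises ValueError

# With a pseudocount, every count has a finite log value.
shifted = log_counts(counts, PSEUDOCOUNT)
print("Log of raw counts works:", zero_ok)
print("Log of shifted counts:", shifted)
```

The point is only that a zero frequency has no finite log, so any log-transformed model needs some way of handling zeros before it can be fit.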