Hello,

I examine the significance of differences in taxa composition between two groups using the ANCOM package.

I would like to make a bar plot for p-values calculated by ANCOM.

I know that the current ANCOM does not provide p-values, so I am wondering whether I could use the mean of the p-values across the sub-null hypotheses, in addition to calculating the W value by counting the number of rejected sub-null hypotheses.

Is it reasonable? Could anyone give me suggestions?

Hey @ohmiyajohn,

What's your motivation for using p-values as opposed to the W statistic itself?

Best,

Justine

Hello Justine,

When illustrating the results of the significance test in our paper, plots of the W value are a little harder to understand than plots of p-values, because the W value is specific to ANCOM.

To make the result easier for readers and reviewers to understand, I would like to use p-values.

Actually, some people on this site have asked what the W value means.

Best regards,

Ohmiyajohn

Hi @ohmiyajohn,

Fair point. I'm on a one-woman crusade to joust at the windmill of "larger p-value means larger effect size", and for better effect size measurements in microbiome data in general.

There is functionally no p-value for ANCOM. The W statistic essentially measures how many of the feature ratios are significantly different at the p-threshold. So, if A/B is significantly different between your groups, but A/C ... A/Z are not, then A gets a W statistic of 1. If B/C ... B/Z are also significantly different, B gets a W score of 25, and so on. At the end, the test sums the significant ratios for each feature. But you're back to the problem that you're working with the p-threshold versus an actual p-value for the taxa test set. Which gets back to the fact that the W statistic is a more accurate representation of the probability of finding a significant difference in abundance between the two groups.
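To make the counting concrete, here is a minimal Python sketch of that step. It assumes you already have a symmetric table of p-values, one per pairwise ratio test; the function name `w_statistics`, the toy p-values and the 0.05 threshold are all made up for illustration, and it skips any multiple-testing correction that a real ANCOM implementation may apply before counting.

```python
def w_statistics(taxa, pvals, alpha=0.05):
    """Count, for each taxon, how many of its pairwise ratio tests are rejected.

    pvals[i][j] is the (hypothetical) p-value for the ratio of taxon i to
    taxon j; the diagonal is ignored.
    """
    w = {}
    for i, taxon in enumerate(taxa):
        w[taxon] = sum(
            1 for j in range(len(taxa)) if j != i and pvals[i][j] < alpha
        )
    return w


taxa = ["A", "B", "C", "D"]
# Toy, hand-written p-values (not real data); None marks the diagonal.
pvals = [
    [None, 0.01, 0.40, 0.70],
    [0.01, None, 0.02, 0.03],
    [0.40, 0.02, None, 0.90],
    [0.70, 0.03, 0.90, None],
]
w = w_statistics(taxa, pvals)
print(w)
```

In this toy table B, whose ratio to every other taxon is rejected, gets the maximum W of 3 (the number of taxa minus one), while A, C and D each get a W of 1 from their ratio with B.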

At one point, Shyamal Peddada, the last author of the original paper, said he was at least considering a pseudo p-value. But this was maybe 2 or 3 years ago, and I'm not sure about the status of that project versus expanding ANCOM to other tests.

Best,

Justine

Although I mentioned in my first post that I could calculate the mean of the p-values, the median might be better than the mean if I use the p-values for the plot.

Hello Justine,

Thank you for your reply and straightforward explanation about the W and p-value in ANCOM.

I know that the W score is the best way to evaluate the significance of the difference between two groups, and that ANCOM calculates multiple p-values, one per sub-null hypothesis, for each comparison.

I can modify the ANCOM code to calculate the median or mean of the p-values within a comparison, so my question is whether the median of the p-values could serve as a pseudo p-value to depict our result.

Right now I am going to make two plots, one for the W statistic and one for the median of the p-values, but I guess that my wet-lab colleagues will prefer the p-value plot.

Best,

Ohmiyajohn

Hi ohmiyajohn,

I think to modify the code you'd have to go into the original Python repo in scikit-bio. That said, I think you're probably better off with a median p-value than a mean, since it will be more resilient to extremes.
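For what it's worth, here is a small sketch of why the median holds up better. The p-values below are invented for one hypothetical taxon, and neither summary is a calibrated p-value in the formal sense; they only describe the spread of the sub-hypothesis p-values.

```python
from statistics import mean, median

# Invented p-values for one taxon's sub-hypotheses (not real ANCOM output).
sub_pvalues = [0.001, 0.002, 0.003, 0.9]

mean_p = mean(sub_pvalues)      # pulled upward by the single 0.9
median_p = median(sub_pvalues)  # stays near the bulk of the values

print(f"mean = {mean_p:.4f}, median = {median_p:.4f}")
```

A single non-significant ratio drags the mean above 0.2, while the median stays below 0.01.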

However, I think the use of p-values obfuscates the information from the test, and people's comfort with the value isn't a good enough reason to coerce the data that way. A lot of people are uncomfortable with the idea that a permutative p-value is limited by the number of permutations because of the way it's calculated, but that shouldn't be a motivation to repeat the test with increasingly large *n*s rather than calculating an effect size. (Again, apologies: a quixotic one-woman quest for effect sizes.)
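As an aside, the limit on a permutative p-value is easy to see in a sketch. This is a generic two-group permutation test on the difference in means, written for illustration only (it is not ANCOM and not any library's implementation); the function names, the toy data and the common (hits + 1) / (n + 1) convention are assumptions.

```python
import random


def smallest_possible_p(n_permutations):
    """Floor of the p-value under the (hits + 1) / (n + 1) convention."""
    return 1 / (n_permutations + 1)


def permutation_p_value(group_a, group_b, n_permutations=999, seed=0):
    """Two-sided permutation test on the absolute difference in means."""
    rng = random.Random(seed)
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)

    def abs_mean_diff(values):
        a, b = values[:n_a], values[n_a:]
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = abs_mean_diff(pooled)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group labels at random
        if abs_mean_diff(pooled) >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)


# Toy data with an obviously large difference between groups.
group_a = [1, 2, 3, 4, 5]
group_b = [101, 102, 103, 104, 105]
p = permutation_p_value(group_a, group_b, n_permutations=999)
print(p, smallest_possible_p(999))
```

With 999 permutations the result can never drop below 0.001, however large the real difference is, so adding permutations only moves the floor and says nothing about the size of the effect.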

Best,

Justine

Hi @ohmiyajohn,

Of possible interest: I just noticed in the ANCOM2 (R package only) documentation that it lets you set a significance value (default 0.05), so it might be that in ANCOM2 you would be able to extract p-values from your output. I haven't tried this personally, but it might be worth a try?

Hello Mehrbod Estaki,

Thank you for your suggestion.

Unfortunately, ANCOM2 does not seem to output p-values; it only outputs the W values and the taxa with significant differences.

The W values appear to be the number of rejected sub-hypotheses, so I wonder if the W value divided by the total number of sub-hypotheses could be used instead of p-values.
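A quick sketch of that ratio, assuming each taxon is tested against every other taxon, so that there are (number of taxa minus one) sub-hypotheses per taxon; `rejected_fraction` is a made-up helper name. One caveat is that this fraction runs in the opposite direction from a p-value: values near 1, not near 0, indicate the strongest evidence of a difference.

```python
def rejected_fraction(w, n_taxa):
    """Fraction of a taxon's sub-hypotheses that were rejected: W / (n_taxa - 1)."""
    n_sub = n_taxa - 1
    if n_sub <= 0:
        raise ValueError("need at least two taxa")
    if not 0 <= w <= n_sub:
        raise ValueError("W must lie between 0 and n_taxa - 1")
    return w / n_sub


# Using the 26-taxon example (A ... Z) from earlier in the thread.
frac_a = rejected_fraction(1, 26)   # 1 of 25 sub-hypotheses rejected
frac_b = rejected_fraction(25, 26)  # 25 of 25 sub-hypotheses rejected
print(frac_a, frac_b)
```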

Best regards,

Ohmiyajohn