Hi,

I would like to ask for a clarification about beta-group-significance results. I am comparing samples based on their origin. I evaluated the weighted UniFrac beta-diversity metric and plotted a PCoA. Based on that, and on taxonomic classification, I could clearly see samples from one origin being very different from samples from the other two origins. Then I wanted to confirm this by running a PERMANOVA test. Although the box-and-whisker plots confirm the results I got with the PCoA, I get all p-values and q-values equal to 0.001. Is this a bug, or might there be a reason why all pairwise comparisons have the same value? The pseudo-F values reflect my impressions, and that's an additional reason why I am asking if this is a bug. I read about similar situations on the forum, but I'd like to ask for an explanation, as I am not a statistics expert. Running PERMDISP instead gave results more similar to what I was expecting. Thank you very much.

# Understanding beta-group-significance PERMANOVA results

Hi @MaestSi,

PERMANOVA is a permutation test. You're asking whether the observed data is more extreme than a random rearrangement of the data (which lets you escape some of the assumptions of a traditional test statistic). The limitation here is that the p-value is then the observed probability of finding something more extreme. For the sake of having a p-value that is not 0, we actually calculate this as \frac{n_{\textrm{more extreme}} + 1}{n_{\textrm{permutations}} + 1}.

So, if I have 999 permutations and nothing I can find is more extreme, then my p-value is \frac{0 + 1}{999 + 1} = 0.001. You can decrease the minimum p-value by increasing the number of permutations (for example, the minimum p-value from 9999 permutations is 0.0001). However, there is some sense that this is its own kind of p-hacking. It's also why, when you report a permutation p-value, you need to report the number of permutations you performed. p=0.01 from 9999 permutations is *very* different from p=0.01 from 99 permutations.
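The floor on the p-value can be seen directly from the formula. Below is a minimal sketch of that calculation; the function name and toy numbers are mine for illustration, not QIIME 2's internals:

```python
# Sketch of the permutation p-value described above; QIIME 2 /
# scikit-bio handle this internally, this is just for intuition.
import numpy as np

def permutation_p_value(observed_stat, permuted_stats):
    """p = (n_more_extreme + 1) / (n_permutations + 1)."""
    permuted_stats = np.asarray(permuted_stats)
    n_more_extreme = int(np.sum(permuted_stats >= observed_stat))
    return (n_more_extreme + 1) / (len(permuted_stats) + 1)

# If no permuted statistic reaches the observed one, the p-value
# bottoms out at 1 / (n_permutations + 1):
print(permutation_p_value(264.0, np.zeros(999)))   # 0.001
print(permutation_p_value(264.0, np.zeros(9999)))  # 0.0001
```

So a reported p of 0.001 with 999 permutations simply means "none of the 999 shuffles beat the observed statistic", which is why all three of your pairwise comparisons can share it.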

Best,

Justine

Thanks @jwdebelius for the kind explanation. Still, it's counterintuitive to me why there is nothing "more extreme" to be found out of 999 permutations, as looking at the pairwise distances (and the corresponding box-and-whisker plots) I can't see the pattern I would expect based on the p-value. Moreover, the pseudo-F measures for the pairwise comparisons are 264, 6, and 280, respectively. Possibly this is not the right statistical test for my data?

Hi @MaestSi,

I think, first, you need to de-couple your expectation of the relationship between a p-value and an effect size. This is an extreme case of the old statistician's refrain: significance is not the same as effect size. The lower limit of the p-value here does not correlate with the effect size. Period.

In a traditional statistical test, you calculate an effect size and then compare it against a known distribution, which is why, when your F is larger (more extreme), your p-value gets smaller.

Here, because your data is non-independent in a big way and therefore violates a major assumption of most traditional tests, you can't compare against a known distribution, and so you fashion your own by shuffling (permuting) the group labels to see what the probability is that your difference could happen at random. There isn't a direct relationship between the effect-size statistic and the observed p-value.

You probably (hopefully) have a large sample size, with a correspondingly large range of values and a small difference in the means.

If it's a distance matrix where you want to compare the distribution of distances, this kind of test (whether you use the PERMANOVA or adonis implementation) is the field standard. Using the kind of test you seem to want to use for this data would be inappropriate, as you'd break the independence assumption.
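To make the mechanics concrete, here is a from-scratch numpy sketch of a one-way pseudo-F (following Anderson's 2001 formulation) plus the label-shuffling step; the function names and toy distance matrix are mine, and in practice QIIME 2 / scikit-bio do all of this for you:

```python
# Hand-rolled one-way PERMANOVA sketch (illustrative only).
import numpy as np

def pseudo_f(dist, groups):
    """Pseudo-F from a square distance matrix and a label array."""
    n = len(groups)
    labels = np.unique(groups)
    a = len(labels)
    iu = np.triu_indices(n, k=1)
    ss_total = np.sum(dist[iu] ** 2) / n
    ss_within = 0.0
    for g in labels:
        idx = np.where(groups == g)[0]
        sub = dist[np.ix_(idx, idx)]
        su = np.triu_indices(len(idx), k=1)
        ss_within += np.sum(sub[su] ** 2) / len(idx)
    ss_between = ss_total - ss_within
    return (ss_between / (a - 1)) / (ss_within / (n - a))

def permanova_sketch(dist, groups, permutations=999, seed=0):
    """Permutation p-value: shuffle labels, count F at least as extreme."""
    rng = np.random.default_rng(seed)
    observed = pseudo_f(dist, groups)
    hits = sum(
        pseudo_f(dist, rng.permutation(groups)) >= observed
        for _ in range(permutations)
    )
    return observed, (hits + 1) / (permutations + 1)

# Made-up data: 10 + 10 samples, tight within groups, far apart between.
n_per = 10
groups = np.array(['A'] * n_per + ['B'] * n_per)
dist = np.full((2 * n_per, 2 * n_per), 0.9)
dist[:n_per, :n_per] = 0.1
dist[n_per:, n_per:] = 0.1
np.fill_diagonal(dist, 0.0)

f_stat, p_val = permanova_sketch(dist, groups)
print(f"pseudo-F = {f_stat:.1f}, p = {p_val}")
```

With well-separated toy groups like these, essentially no shuffle beats the observed F, so the p-value sits at the floor even though the pseudo-F is huge: the two numbers answer different questions.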

Best,

Justine

Yes, quite a large one, as I have 280 samples in total, with small differences in the means; I wouldn't say the range is very big, apart from some outliers.

I was just thinking about using PERMDISP instead of PERMANOVA, which should be non-parametric too. Based on that test, the p-values seem more coherent with my expectations. In your opinion, would it be better to stick to PERMANOVA? Thanks for all your time.

Simone

You want to combine PERMANOVA and PERMDISP, since they test different hypotheses. PERMANOVA asks whether there is a difference in either the within-group or between-group distances for any of my groups. PERMDISP tests the hypothesis that there is a significant difference in within-group variance (dispersion).

So, my hope is to see a large signal in PERMANOVA and a small one in PERMDISP, because that suggests my difference is driven by differences between communities rather than by differences in dispersion within one of my communities.
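In recent QIIME 2 versions you can run both tests from the same action; a sketch of the two commands, with hypothetical artifact/metadata names you'd replace with your own (and note that `permdisp` as a `--p-method` choice depends on your QIIME 2 version):

```shell
# PERMANOVA on the weighted UniFrac distance matrix (file names are examples)
qiime diversity beta-group-significance \
  --i-distance-matrix weighted_unifrac_distance_matrix.qza \
  --m-metadata-file metadata.tsv \
  --m-metadata-column origin \
  --p-method permanova \
  --p-pairwise \
  --o-visualization permanova-origin.qzv

# PERMDISP on the same matrix, to check within-group dispersion
qiime diversity beta-group-significance \
  --i-distance-matrix weighted_unifrac_distance_matrix.qza \
  --m-metadata-file metadata.tsv \
  --m-metadata-column origin \
  --p-method permdisp \
  --p-pairwise \
  --o-visualization permdisp-origin.qzv
```

Reading the two visualizations side by side is what lets you separate "the groups have different centroids" from "one group is just more spread out".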

Best,

Justine