May I know either p-value or q-value should be referred when I performed multiple Kruskal-Wallis pairwise and Permanova pairwise tests? What is the rationale of using q-value but not the corrected p-value? Which correction is applied for both mentioned tests for p-value? In most cases, what is the threshold (0.05, 0.01, 0.001, 0.005) being applied in the microbiome study?

Hi @Benedict,
The q-value is the corrected p-value based on the BH-FDR (Benjamini-Hochberg false discovery rate) multiple testing adjustment method (similar question here). Generally speaking, the q-value is what you should be referring to; unadjusted p-values have their place in exploratory studies, but they need to be approached with extreme caution. The standard 0.05 significance threshold is used to denote significance in the QIIME 2 versions of these tests.
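If it helps to see what the adjustment does, here is a minimal sketch of the Benjamini-Hochberg procedure that turns a list of raw p-values into q-values (the input p-values are made-up examples, and this is an illustration, not the exact code QIIME 2 runs):

```python
def bh_qvalues(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values)."""
    m = len(pvals)
    # Sort p-values, remembering their original positions.
    order = sorted(range(m), key=lambda i: pvals[i])
    qvals = [0.0] * m
    prev = 1.0
    # Walk from the largest p-value down, scaling each by m / rank
    # and enforcing that q-values never increase as p-values shrink.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        q = min(prev, pvals[i] * m / rank)
        qvals[i] = q
        prev = q
    return qvals

# Four pairwise tests: the smallest raw p-value (0.01) becomes q = 0.04.
print(bh_qvalues([0.01, 0.04, 0.03, 0.20]))
```

Note how a raw p-value of 0.04 can end up with a q-value above 0.05 once the number of comparisons is accounted for, which is why the q-value column is the one to threshold.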

I also want to mention that for permutation tests (PERMANOVA, PERMDISP, Mantel, adonis, etc.), there is a lower limit on the p-value based on the number of permutations you do.

The p-values there are calculated as (num_more_extreme + 1) / (num_perm + 1).

So, if you do 999 permutations and none of the randomly generated statistics is more extreme than your observed one, you'll get a p-value of 0.001. You could try running more permutations - 9999 will give you, at smallest, a p-value of 0.0001. This means that you should report your permutations in your paper: (p=0.01, 999 permutations) means something very different to me than (p=0.01, 99 permutations). It doesn't, however, mean that you should increase the number of permutations to get smaller and smaller p-values, because the p-value doesn't always correlate with the effect size, and an effect size measurement is often more useful.
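To make the floor on the p-value concrete, here is a toy permutation test using a difference-in-means statistic (the group values, function name, and seed are all made up for illustration; PERMANOVA uses a different statistic, but the p-value rule is the same):

```python
import random

def permutation_pvalue(group_a, group_b, num_perm=999, seed=42):
    """Two-sample permutation test on the absolute difference in means."""
    rng = random.Random(seed)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    more_extreme = 0
    for _ in range(num_perm):
        rng.shuffle(pooled)
        stat = abs(sum(pooled[:n_a]) / n_a
                   - sum(pooled[n_a:]) / (len(pooled) - n_a))
        if stat >= observed:
            more_extreme += 1
    # The observed statistic counts as one of its own permutations,
    # so the smallest possible value is 1 / (num_perm + 1).
    return (more_extreme + 1) / (num_perm + 1)

# Two well-separated groups: almost no shuffle matches the observed
# statistic, so with 999 permutations p sits near the floor of 0.001.
p = permutation_pvalue([10, 11, 12, 13, 14, 15, 16, 17],
                       [30, 31, 32, 33, 34, 35, 36, 37], num_perm=999)
print(p)
```

No matter how strong the separation is, this function can never return less than 1/(999 + 1) = 0.001, which is exactly the limit described above.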

I made a typo! I meant to say (0.01, 999 permutations) vs (0.01, 99 permutations). (I fixed the earlier post for posterity.)

A p-value of 0.01 with 999 permutations means 9 of the 999 shuffled datasets (about 1%) were as extreme as or more extreme than the original. A p-value of 0.01 with 99 permutations means that out of 99 shuffles, we couldn't find anything more extreme. It could mean that if we ran more shuffles we would find some that were more extreme and the p-value would hold, but it could also mean that if we ran 99K shuffles, we still couldn't find anything more extreme.
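The arithmetic behind those two scenarios is just the same formula with different counts plugged in:

```python
# Both runs report p = 0.01, but from very different amounts of evidence.
p_999 = (9 + 1) / (999 + 1)  # 9 of 999 shuffles as or more extreme
p_99 = (0 + 1) / (99 + 1)    # none of 99 shuffles as or more extreme
print(p_999, p_99)  # 0.01 0.01
```

The second value is pinned at the floor for 99 permutations, so it tells you much less about how far below 0.01 the "true" p-value might be.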