Kruskal-Wallis and Wilcoxon Tests for relative abundance of top genera?

MarwaTawfik · April 5, 2022, 7:13am

I have a question about which statistical test to use to check for a statistically significant difference between two groups at the genus level.

Which statistical test makes sense in this case: the Wilcoxon or the Kruskal-Wallis test?
As far as I know, the Wilcoxon test is for two groups, while the Kruskal-Wallis (KW) test is for more than two groups. But I can't figure out why I am getting a significant difference here with the KW test and not with the Wilcoxon test.

Could you also give me some insight into how the two tests differ in interpretation?
I am taking the top 10 genera and testing each genus separately for a difference in relative abundance between the groups.

library(ggpubr)

# No method given, so compare_means() uses its default, "wilcox.test"
compare_means(Abundance ~ dev.stage,
              data = ps.prev0.gen1.rel.melt,
              group.by = "Genus",
              p.adjust.method = "BH")
# # A tibble: 10 × 9
#    Genus             .y.       group1   group2       p       p.adj p.format p.signif method
#    <chr>             <chr>     <chr>    <chr>        <dbl>   <dbl> <chr>    <chr>    <chr>
#  1 Flavobacterium    Abundance stimulus intermediate 0.1     0.12  0.100    ns       Wilcoxon
#  2 Sphaerotilus      Abundance stimulus intermediate 0.1     0.12  0.100    ns       Wilcoxon
#  3 Sediminibacterium Abundance stimulus intermediate 0.1     0.12  0.100    ns       Wilcoxon
#  4 Polaromonas       Abundance stimulus intermediate 0.2     0.22  0.200    ns       Wilcoxon
#  5 Leptothrix        Abundance stimulus intermediate 0.1     0.12  0.100    ns       Wilcoxon
#  6 Pseudorhodobacter Abundance stimulus intermediate 0.1     0.12  0.100    ns       Wilcoxon
#  7 Rubrivivax        Abundance stimulus intermediate 0.1     0.12  0.100    ns       Wilcoxon
#  8 Rhodobacter       Abundance stimulus intermediate 0.1     0.12  0.100    ns       Wilcoxon
#  9 Arcicella         Abundance stimulus intermediate 0.0636  0.12  0.064    ns       Wilcoxon
# 10 Rhodoferax        Abundance stimulus intermediate 1       1     1.000    ns       Wilcoxon

# Same comparison, but forcing the Kruskal-Wallis test
compare_means(Abundance ~ dev.stage,
              data = ps.prev0.gen1.rel.melt,
              group.by = "Genus",
              method = "kruskal.test",
              p.adjust.method = "BH")

# # A tibble: 10 × 7
#    Genus             .y.            p p.adj p.format p.signif method
#    <chr>             <chr>      <dbl> <dbl> <chr>    <chr>    <chr>
#  1 Flavobacterium    Abundance 0.0495 0.062 0.050    *        Kruskal-Wallis
#  2 Sphaerotilus      Abundance 0.0495 0.062 0.050    *        Kruskal-Wallis
#  3 Sediminibacterium Abundance 0.0495 0.062 0.050    *        Kruskal-Wallis
#  4 Polaromonas       Abundance 0.127  0.14  0.127    ns       Kruskal-Wallis
#  5 Leptothrix        Abundance 0.0495 0.062 0.050    *        Kruskal-Wallis
#  6 Pseudorhodobacter Abundance 0.0495 0.062 0.050    *        Kruskal-Wallis
#  7 Rubrivivax        Abundance 0.0495 0.062 0.050    *        Kruskal-Wallis
#  8 Rhodobacter       Abundance 0.0495 0.062 0.050    *        Kruskal-Wallis
#  9 Arcicella         Abundance 0.0369 0.062 0.037    *        Kruskal-Wallis
# 10 Rhodoferax        Abundance 0.827  0.83  0.827    ns       Kruskal-Wallis

Thanks very much,
Your input is always helpful.
Marwa

colinbrislawn · April 7, 2022, 1:17am

Hello Marwa,

For my own notes, here are the docs for the compare_means() function from the ggpubr package. Under the hood, this function calls the wilcox.test() and kruskal.test() functions from base R.
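As a rough sketch of what I understand that to mean (this is not ggpubr's actual source), here is the same per-genus workflow written with base R only. The `toy` data frame, the genus names, and all abundance values are invented for illustration; only the column names are borrowed from your call.

```r
# Toy long-format table; every value here is made up for illustration.
toy <- data.frame(
  Genus     = rep(c("GenusA", "GenusB"), each = 6),
  dev.stage = rep(rep(c("stimulus", "intermediate"), each = 3), times = 2),
  Abundance = c(0.10, 0.12, 0.15, 0.30, 0.33, 0.41,  # GenusA: groups do not overlap
                0.20, 0.27, 0.22, 0.25, 0.21, 0.26)  # GenusB: groups overlap
)

# One Wilcoxon rank-sum test per genus, mirroring group.by = "Genus"
p_raw <- sapply(split(toy, toy$Genus), function(d) {
  wilcox.test(Abundance ~ dev.stage, data = d)$p.value
})

# Benjamini-Hochberg adjustment across genera, mirroring p.adjust.method = "BH"
p_adj <- p.adjust(p_raw, method = "BH")

print(data.frame(Genus = names(p_raw), p = p_raw, p.adj = p_adj,
                 row.names = NULL))
```

Swapping `wilcox.test` for `kruskal.test` in the function body gives the Kruskal-Wallis version of the same loop.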

I think this question is best answered by a card-carrying statistician, which I am not. But let's start here:

The Wilcoxon rank-sum test is intended to compare two groups. The Kruskal-Wallis test is an omnibus test for two or more groups: it tests the null hypothesis that all groups come from the same distribution. When it rejects that null and there are more than two groups, post-hoc pairwise tests (for example, Dunn's test or pairwise Wilcoxon tests) are used to find which groups differ.

In your setup, you are comparing exactly two dev.stages within each genus. So why use the Kruskal-Wallis test at all?

And if you do run post-hoc pairwise tests after Kruskal-Wallis, did the omnibus test reject the null hypothesis first?
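On why the two tests disagree for the same two groups, here is a minimal sketch with made-up numbers. I am assuming three samples per dev.stage and no tied values, which is a guess on my part (your table does not say), although an exact Wilcoxon p-value of 0.1 is what three-versus-three samples with no overlap would give. As far as I know, wilcox.test() computes an exact p-value by default for small samples without ties, while kruskal.test() always uses a chi-squared approximation.

```r
# Hypothetical abundances: three samples per group, no ties, no overlap.
stimulus     <- c(0.10, 0.12, 0.15)
intermediate <- c(0.30, 0.33, 0.41)

# Exact Wilcoxon rank-sum test (default for small samples without ties)
p_wilcox_exact <- wilcox.test(stimulus, intermediate)$p.value

# Kruskal-Wallis on the same two groups (chi-squared approximation)
p_kw <- kruskal.test(list(stimulus, intermediate))$p.value

# Wilcoxon again, forced to use the normal approximation
# without continuity correction
p_wilcox_approx <- wilcox.test(stimulus, intermediate,
                               exact = FALSE, correct = FALSE)$p.value

print(c(wilcox_exact  = p_wilcox_exact,
        kruskal       = p_kw,
        wilcox_approx = p_wilcox_approx))
```

With data like these, the exact Wilcoxon p-value is 0.1, the smallest two-sided value possible with three samples per group, while the Kruskal-Wallis p-value is about 0.0495 and matches the approximate Wilcoxon result. So the two tests rank the data the same way and differ only in how the p-value is computed, and the large-sample approximation is shaky with so few samples. Note also that in your Kruskal-Wallis table the stars sit next to p.adj values of 0.062, so they seem to follow the unadjusted p column.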

You are not the first person to run into this issue, see

and

I hope this is helpful. I think you should take this question to a real statistician.

Colin
