Kruskal-Wallis: 1-tailed or 2-tailed?

Hasti_A · March 17, 2024, 6:51pm

Hello!

I have a general question regarding the output data from Kruskal-Wallis pairwise comparison tests on microbiome data. Is this stats test, and the associated p-values generated, 1-tailed or 2-tailed? I have one reviewer who is curious.

Thanks, in advance!

colinbrislawn · March 17, 2024, 8:52pm

Hello Hasti_A,

Could you tell us the command you used? I can look up how the test is run based on that.

Thanks!

Hasti_A · March 17, 2024, 10:11pm

Hi Colin,

Sure! I'm focusing on just Shannon group significance, but this is the overall command I used:

for metric in observed_features shannon pielou_e faith_pd
do
  qiime diversity alpha-group-significance \
    --i-alpha-diversity ${metric}_vector.qza \
    --m-metadata-file ../metadata_for_comb_abdomens.tsv \
    --o-visualization ${metric}_group_significance.qzv
done

colinbrislawn · March 17, 2024, 10:30pm

Thanks!

Here's the QIIME 2 plugin code that builds that visualization:

https://github.com/qiime2/q2-diversity/blob/dev/q2_diversity/_alpha/_visualizer.py#L97

          for name, group in data.groupby(metadata_column.name):
              names.append('%s (n=%d)' % (name, len(group)))
              groups.append(list(group[metric_name]))
          
          escaped_column = quote(column)
          escaped_column = escaped_column.replace('/', '%2F')
          filename = 'column-%s.jsonp' % escaped_column
          filenames.append(filename)
          
          # perform Kruskal-Wallis across all groups
          kw_H_all, kw_p_all = scipy.stats.mstats.kruskalwallis(*groups)
          
          # perform pairwise Kruskal-Wallis across all pairs of groups and
          # correct for multiple comparisons
          kw_H_pairwise = []
          for i in range(len(names)):
              for j in range(i):
                  try:
                      H, p = scipy.stats.mstats.kruskalwallis(groups[i],
                                                              groups[j])
                      kw_H_pairwise.append([names[j], names[i], H, p])

And here are the SciPy docs for scipy.stats.kruskal, the regular-array counterpart of the mstats.kruskalwallis function called above:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kruskal.html#scipy.stats.kruskal

The p-value for the test using the assumption that H has a chi square distribution. The p-value returned is the survival function of the chi square distribution evaluated at H.
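To make that docs sentence concrete, here is a minimal sketch that recomputes the p-value by hand from H. The diversity values and the three groups are made up for illustration, and the degrees of freedom are assumed to be the number of groups minus one, which is the standard choice for this test.

```python
import numpy as np
from scipy import stats

# Made-up Shannon diversity values for three hypothetical groups (no ties).
groups = [
    [3.1, 2.8, 3.5, 3.9, 2.6],
    [4.2, 4.0, 3.7, 4.6, 4.4],
    [2.1, 2.9, 2.4, 3.0, 2.2],
]

# Kruskal-Wallis across all groups, using the documented function.
H, p = stats.kruskal(*groups)

# The same test through the masked-array variant that the plugin calls.
H_ms, p_ms = stats.mstats.kruskalwallis(*groups)

# Recompute the p-value by hand: upper tail (survival function) of a
# chi-square distribution with k - 1 degrees of freedom, evaluated at H.
df = len(groups) - 1
p_manual = stats.chi2.sf(H, df)

print(f"H = {H:.4f}, p = {p:.6f}, chi2.sf(H, {df}) = {p_manual:.6f}")
```

Only the upper tail is used because H is never negative and only large values of H count as evidence against the null hypothesis.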

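As discussed in this GitHub issue, it's one-sided in the sense that the p-value is the upper tail of the chi-square distribution: only large values of H count as evidence against the null. The test itself is non-directional, though. For a pairwise comparison of two groups, H equals the square of the Mann-Whitney z statistic, so the p-value is the same as for a two-sided Mann-Whitney U test (normal approximation, no continuity correction).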
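Since a reviewer will usually think of "tails" in terms of the direction of a group difference, here is a small sketch comparing a two-group Kruskal-Wallis test with a Mann-Whitney U test. The data are made up, and the choices of the asymptotic method without continuity correction are my assumptions, picked so the two tests use the same approximation.

```python
import numpy as np
from scipy import stats

# Made-up alpha diversity values for two hypothetical groups (no ties).
group_a = [3.1, 2.8, 3.5, 3.9, 2.6, 3.3]
group_b = [4.2, 4.0, 3.7, 4.6, 4.4, 3.4]

# Pairwise Kruskal-Wallis: chi-square upper tail with 1 degree of freedom.
H, p_kw = stats.kruskal(group_a, group_b)

# Mann-Whitney U with the normal approximation and no continuity
# correction, under the three possible alternatives.
kwargs = dict(method="asymptotic", use_continuity=False)
p_two = stats.mannwhitneyu(group_a, group_b, alternative="two-sided", **kwargs).pvalue
p_less = stats.mannwhitneyu(group_a, group_b, alternative="less", **kwargs).pvalue
p_greater = stats.mannwhitneyu(group_a, group_b, alternative="greater", **kwargs).pvalue

print(f"Kruskal-Wallis p:            {p_kw:.6f}")
print(f"Mann-Whitney two-sided p:    {p_two:.6f}")
print(f"Mann-Whitney smaller 1-sided: {min(p_less, p_greater):.6f}")
```

In this setup the Kruskal-Wallis p-value should agree with the two-sided Mann-Whitney p-value, and the directional Mann-Whitney p-value should be half of it. So the pairwise result does not test a direction, even though it comes from one tail of the chi-square distribution.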

Bonus!

    kw_H_pairwise['q-value'] = multipletests(
        kw_H_pairwise['p-value'], method='fdr_bh')[1]

This means the pairwise p-values are corrected for multiple testing: the reported q-values control the false discovery rate with the Benjamini & Hochberg (1995) method!
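If it helps to see what that correction does to the numbers, here is a hand-rolled sketch of the Benjamini-Hochberg adjustment in NumPy. The function name benjamini_hochberg and the p-values are hypothetical; this is not the plugin's code, but it should give the same q-values as multipletests(..., method='fdr_bh')[1].

```python
import numpy as np

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values)."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                    # indices that sort p ascending
    # Scale each sorted p-value by m / rank.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity: each q-value is the minimum of the scaled
    # values at its own rank and all higher ranks.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.clip(scaled, 0.0, 1.0)     # restore the original order
    return q

# Made-up p-values from four hypothetical pairwise comparisons.
p_values = [0.01, 0.04, 0.03, 0.20]
q_values = benjamini_hochberg(p_values)

for p, q in zip(p_values, q_values):
    print(f"p = {p:.3f} -> q = {q:.4f}")
```

Each q-value is at least as large as its p-value, which is why a comparison can be significant by p-value but not by q-value.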

Hopefully this helps answer the ref's questions and gets the paper published!

Hasti_A · March 17, 2024, 10:37pm

Fantastic! This is SO helpful, thank you so much. Appreciate your help.
