Mann-Whitney U test instead of Kruskal-Wallis for alpha- and beta-group-significance?

smreyes · February 28, 2018, 1:24pm

Dear QIIME 2 gurus,

Is it possible to conduct a Mann-Whitney U test instead of a Kruskal-Wallis test for alpha- and beta-group-significance? The Mann-Whitney U test is more appropriate for my purposes because I have paired samples (e.g., samples from the same participants were collected using 2 different methods on the same day; my objective is to compare the effect of collection method on the bacterial communities in these samples). The Kruskal-Wallis test assumes independence, but my samples are paired.
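For reference, here is a minimal Python sketch of how the Mann-Whitney U statistic is computed from pooled ranks. It uses the normal approximation with a tie correction and no continuity correction. The function names are illustrative only and are not QIIME 2 code. Like Kruskal-Wallis, the Mann-Whitney U test treats its two samples as independent. For a paired design like this one, the usual paired counterpart is the Wilcoxon signed-rank test on within-subject differences.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 1-based mean rank of the run
        i = j + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U for x, two-sided p) for two INDEPENDENT samples.

    Normal approximation with tie correction, no continuity correction,
    so p-values for small samples are only approximate.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    # U for x = rank sum of x minus its minimum possible rank sum.
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    var = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if var <= 0:
        return u1, 1.0  # every value tied: no evidence of a shift
    z = (u1 - n1 * n2 / 2) / math.sqrt(var)
    return u1, math.erfc(abs(z) / math.sqrt(2))


u, p = mann_whitney_u([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])  # U = 0.0, p ≈ 0.05
```

Nothing in this calculation knows which value in `x` belongs to which value in `y`, so it ignores the pairing in the data.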

Note: I see that the Mann-Whitney U test is implemented in the q2-longitudinal plugin. However, my data are not longitudinal. I tried using the same variable for the group-column and state-column parameters, but the plugin returns an error message saying they cannot be the same. Any thoughts?

Thanks!

Nicholas_Bokulich · February 28, 2018, 1:59pm

Hi @smreyes,

Yes! q2-longitudinal pairwise-differences or pairwise-distances would be just the tool to use for this, and your data do not need to be longitudinal. You can do something like the following:

qiime longitudinal pairwise-differences \
  --m-metadata-file sample-metadata.tsv \
  --m-metadata-file shannon.qza \
  --p-metric shannon \
  --p-group-column Groups \
  --p-state-column collection-method \
  --p-state-1 method-A \
  --p-state-2 method-B \
  --p-individual-id-column studyid \
  --p-replicate-handling random \
  --o-visualization pairwise-differences.qzv

So your "collection method" metadata column would be used as the state-column, and you just need to set a different metadata column as the group-column. For example, if you are comparing soil samples from different locations, you could set group-column to the metadata column that lists the location, so that paired differences are compared both within and between groups. You may also be able to use a metadata column that contains a constant value across all samples (e.g., if all samples are soil and you have a "sample type" column), but I am not sure. If that causes an error, let me know and I will raise an issue to get this fixed in the next release.
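The core idea behind pairwise-differences can be sketched roughly as follows. Within each group, each individual (studyid) contributes one paired difference: its metric at one state minus its metric at the other. This is a hypothetical illustration, not q2-longitudinal's implementation. The `Sample` record, the direction of the difference (state-2 minus state-1), and the seeded random choice among replicates are all assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Sample(NamedTuple):
    """Hypothetical sample record; fields mirror the metadata columns above."""
    sample_id: str
    studyid: str    # like --p-individual-id-column
    group: str      # like --p-group-column
    method: str     # like --p-state-column
    shannon: float  # like --p-metric


def paired_differences(
    samples: Iterable[Sample], state_1: str, state_2: str, seed: int = 0
) -> Dict[str, Dict[str, float]]:
    """Return {group: {studyid: value at state_2 minus value at state_1}}.

    Individuals lacking either state are skipped. If an individual has
    several samples for one state, one is picked with a seeded RNG
    (loosely analogous to --p-replicate-handling random).
    """
    rng = random.Random(seed)
    values: Dict[Tuple[str, str, str], List[float]] = defaultdict(list)
    for s in samples:
        values[(s.group, s.studyid, s.method)].append(s.shannon)
    individuals = sorted({(g, sid) for g, sid, _ in values})
    result: Dict[str, Dict[str, float]] = {}
    for group, sid in individuals:
        before = values.get((group, sid, state_1))
        after = values.get((group, sid, state_2))
        if not before or not after:
            continue  # unpaired individual: nothing to difference
        result.setdefault(group, {})[sid] = rng.choice(after) - rng.choice(before)
    return result
```

Because differences are formed within each individual before any test is run, subject-to-subject variation cancels out. This is why paired differences suit the two-collection-method design.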

I hope that helps!

smreyes · February 28, 2018, 3:59pm

Thanks @Nicholas_Bokulich,

I tried, as you suggested, setting --p-group-column to a constant variable (e.g., "sample type", which in my case is constant because all my samples are human milk from a group of women living in the same area). However, I receive the following error:

Plugin error from longitudinal:

Need at least two groups in stats.kruskal()

I appreciate your help! In the next release I would love to see the Mann-Whitney U test as an option in alpha- and beta-group-significance and/or in q2-longitudinal. Any chance you know when that might be?

Thanks!

Nicholas_Bokulich · February 28, 2018, 4:18pm

Rats! I have raised an issue to get this feature added to q2-longitudinal in the next release.

Yes, the next release is due in April, so I'm afraid it will be a bit of a wait.

In the meantime, there is an easy option: the visualization includes a link to download the paired results as a TSV. So set group-column to a column that contains 2+ different values, run this command, download the TSV, and import that file into R or your favorite stats software (maybe even Excel can do this) to run a Mann-Whitney U test across ALL samples.
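Once downloaded, the paired differences can be tested across all subjects at once. A single column of within-subject differences is usually tested against zero with the one-sample Wilcoxon signed-rank test, which is the paired counterpart of Mann-Whitney U. The sketch below assumes a tab-separated file with a header row and a numeric column of differences. The column name `Difference` in the example is hypothetical, so check the actual header of your file. Rows whose first field starts with `#` (such as a `#q2:types` directive) are skipped.

```python
import csv
import math
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def read_differences(lines: Iterable[str], column: str) -> List[float]:
    """Read one numeric column from a tab-separated table with a header row."""
    reader = csv.DictReader(lines, delimiter="\t")
    first = reader.fieldnames[0]
    out = []
    for row in reader:
        if row[first].startswith("#"):
            continue  # skip directive rows such as "#q2:types"
        cell = (row.get(column) or "").strip()
        if cell:
            out.append(float(cell))
    return out


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_signed_rank(diffs: Sequence[float]) -> Tuple[float, int, float]:
    """One-sample Wilcoxon signed-rank test of differences against 0.

    Zero differences are dropped. Returns (W+, number of non-zero
    differences, two-sided p from the normal approximation with tie
    correction and no continuity correction).
    """
    nonzero = [d for d in diffs if d != 0]
    n = len(nonzero)
    if n == 0:
        raise ValueError("need at least one non-zero difference")
    ranks = average_ranks([abs(d) for d in nonzero])
    # W+ = sum of the ranks of |d| over the positive differences.
    w_plus = sum(r for r, d in zip(ranks, nonzero) if d > 0)
    ties = sum(t ** 3 - t for t in Counter(abs(d) for d in nonzero).values())
    var = n * (n + 1) * (2 * n + 1) / 24 - ties / 48
    z = (w_plus - n * (n + 1) / 4) / math.sqrt(var)
    return w_plus, n, math.erfc(abs(z) / math.sqrt(2))
```

To run it on the downloaded file, pass an open file handle to `read_differences` (`csv.DictReader` accepts any iterable of lines) along with the name of the differences column from your header. Then pass the resulting list to `wilcoxon_signed_rank`.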

Of course, that command could still be useful if you did have a relevant group category (e.g., you wanted to stratify by sample type or patient demographic, then see whether there are paired differences by collection method). What we are currently lacking is a way to test this across ALL samples. To be more specific, the issue is that the Kruskal-Wallis test between groups should only run if more than one group exists in the dataset.
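The fix described above amounts to a guard: run the between-group Kruskal-Wallis test only when at least two groups exist. Here is a hypothetical sketch of that guard, not the actual q2-longitudinal patch. It computes the tie-corrected H statistic and a chi-square p-value with (groups - 1) degrees of freedom. With a single group it returns `None` instead of raising the "Need at least two groups" error seen earlier.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: the df = 1 tail plus correction terms.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2 * x / math.pi) * math.exp(-half)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total


def kruskal_wallis(groups: Dict[str, Sequence[float]]) -> Optional[Tuple[float, float]]:
    """Return (H, p), or None when fewer than two non-empty groups exist."""
    samples = {g: list(v) for g, v in groups.items() if len(v) > 0}
    if len(samples) < 2:
        return None  # guard: skip the between-group test instead of failing
    pooled = [x for v in samples.values() for x in v]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for v in samples.values():
        h += sum(ranks[start:start + len(v)]) ** 2 / len(v)
        start += len(v)
    h = 12 / (n * (n + 1)) * h - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction > 0:
        h /= correction
    return h, chi2_sf(h, len(samples) - 1)
```

With the guard in place, a constant group column (e.g., a single "sample type") would simply skip the between-group comparison. The within-individual paired differences could then still be tested across all samples.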

I hope that helps!

system · March 31, 2018, 10:18pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.