Statistical tests for alpha diversity?

Hello

After reading through posts #2211 #6923 #14955 #19427 #3859 #3336,
I wonder if you could please share some insight into why Kruskal–Wallis followed by pairwise tests is used,
and not ANOVA followed by Tukey HSD?
After rarefaction, the sample means and medians will be similar, so how do you decide, from your perspective, whether to use parametric or non-parametric tests?

The sample size I ended up using is 167 (after rarefaction).

qiime diversity alpha-rarefaction \
  --i-table dada2-table.qza \
  --i-phylogeny fasttree-rooted-tree.qza \
  --p-min-depth 6 \
  --p-max-depth 250000 \
  --m-metadata-file metadatasheet.tsv \
  --o-visualization alpha-rarefaction.qzv

qiime diversity core-metrics-phylogenetic \
  --i-phylogeny fasttree-rooted-tree.qza \
  --i-table dada2-table.qza \
  --p-sampling-depth 1550 \
  --m-metadata-file metadatasheet.tsv \
  --output-dir core-metrics-results


qiime diversity alpha-group-significance \
  --i-alpha-diversity core-metrics-results/shannon_vector.qza \
  --m-metadata-file metadatasheet.tsv \
  --o-visualization core-metrics-results/shannon_vector.qzv

Here is the qzv if you would like to have a look:
shannon_vector.qzv (354.5 KB)
Please let me know if I should share any more info or uploads.

Thanks very much
Marwa

Hi @MarwaTawfik,

As a quick reminder, ANOVA and Tukey HSD assume the data are normally distributed. Kruskal-Wallis is rank-based and doesn't assume a normal distribution. For some metrics (in my experience, observed features and Faith's PD), the distribution is asymptotically normal. For other metrics (Simpson, Pielou), the distribution is very much not normal: both of these metrics vary between 0 and 1! So, without knowing which metric is going in, Kruskal-Wallis is a better test.
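To make the rank-based point concrete, here is a minimal sketch of the Kruskal-Wallis H test in plain Python (standard library only). The evenness values and group names are made up for illustration, and the helper names (`average_ranks`, `chi2_sf`, `kruskal_wallis`) are hypothetical; in practice a library routine such as `scipy.stats.kruskal` would be the usual choice. Because the test only looks at ranks, a monotone transform of a bounded metric (here a logit) should leave H unchanged.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 1:
        q, nu = math.erfc(math.sqrt(x / 2)), 1
    else:
        q, nu = math.exp(-x / 2), 2
    while nu < df:  # recurrence from df = nu to df = nu + 2
        q += (x / 2) ** (nu / 2) * math.exp(-x / 2) / math.gamma(nu / 2 + 1)
        nu += 2
    return q


def kruskal_wallis(groups):
    """Return (H, p) using tie correction and the chi-square approximation."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical; H is undefined")
    h /= correction
    return h, chi2_sf(h, len(groups) - 1)


# Made-up Pielou evenness values (bounded between 0 and 1) for three groups.
evenness = {
    "control": [0.81, 0.77, 0.92, 0.85, 0.88],
    "treated_a": [0.64, 0.71, 0.58, 0.69, 0.74],
    "treated_b": [0.90, 0.95, 0.87, 0.93, 0.97],
}
h_raw, p_raw = kruskal_wallis(list(evenness.values()))

# A logit transform changes the shape of the distribution but not the ranks.
logit = [[math.log(v / (1 - v)) for v in g] for g in evenness.values()]
h_logit, p_logit = kruskal_wallis(logit)

print(f"raw:   H={h_raw:.3f}, p={p_raw:.4f}")
print(f"logit: H={h_logit:.3f}, p={p_logit:.4f}")
```

The chi-square approximation used for the p-value is rough when groups are this small, so treat the numbers as illustrative only.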

You can run an ANOVA (technically a linear regression) in QIIME 2 using the q2-longitudinal plugin. But this, again, depends on the assumption that your data are asymptotically normal enough.

Best,
Justine

Thanks very much, Justine @jwdebelius
This is helpful.
Another point I would like to add is that I can't decide before running both tests. One of the problems, I think, is that the dataset is rarefied, so you can't tell whether the mean or the median more accurately represents the centre of the distribution, and therefore whether to use parametric or non-parametric tests.
Your comments are always appreciated.
Cheers
Marwa

Hi @MarwaTawfik,

I'm not sure I understand the problem. The distribution is a function of the metric at a semi-fixed depth. You can absolutely predict a priori whether a metric will be non-normal (although it's hard to predict the normal ones). Again, in close to a decade of experience, I have found that richness metrics behave asymptotically normally after rarefaction, so much so that I will sometimes z-normalize the metric before analysis.
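For readers unfamiliar with z-normalization, a minimal sketch in plain Python follows. The observed-feature counts are made up, the helper name `z_normalize` is hypothetical, and using the sample standard deviation (n - 1 denominator) is an assumption rather than something stated in this thread.

```python
import statistics


def z_normalize(values):
    """Center values on their mean and scale by the sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator), an assumption
    if sd == 0:
        raise ValueError("all values are identical; z-scores are undefined")
    return [(v - mean) / sd for v in values]


# Made-up observed-feature counts for eight rarefied samples.
observed_features = [112, 98, 134, 121, 105, 140, 117, 109]
z_scores = z_normalize(observed_features)

for count, z in zip(observed_features, z_scores):
    print(f"{count:4d} -> {z:+.2f}")
```

After this step the metric has mean 0 and standard deviation 1, which puts different richness metrics on a comparable scale; it does not by itself make a non-normal metric normal.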

Best,
Justine
