Has anyone used a generalized linear model to compare alpha diversity?

SingeunOh · October 8, 2022, 1:37pm

Hi!

I am currently working with QIIME 2 to analyze alpha diversity and beta diversity.

I used the alpha-group-significance action of the diversity plugin (for Shannon and observed features).

I have 48 samples (animal stools) and want to compare them according to 2 categories.
(Season: Spring, Fall / Infection status: Yes, No)

However, I have to address the following comment that I received:

"I highly recommend the use of linear modeling(LM) or generalized linear modeling(GLM) which is
commonly used in microbiome studies rather than non-linear model."

If anyone has encountered the same problem, which linear model would you recommend? Can I solve this directly in QIIME 2?

Or, if you know of an indirect way to do this in R or Python using the results from
QIIME 2, that would also be very welcome!

Any other recommendation or suggestion about using a linear model to compare alpha diversity would also be warmly welcomed.

Thank you.

colinbrislawn · November 1, 2022, 9:53pm

I found a similar question on Biostars:
Using Generalized Linear Model in Microbial structure comparison.

The message is as below (From reviewer)
I highly recommend the use of linear modeling (LM) or generalized linear modeling (GLM) which is commonly used in microbiome studies rather than a Wilcoxon rank-sum test. This will allow you to better control for things that may impact your results such as the age animals.

EDIT: ~~Those look like categorical variables to me, so I'm not sure a linear model would help you much. Do you have any variables that are continuous, like animal age in that Biostars post?~~

I retract that. I am not a statistician.

Mehrbod_Estaki · November 2, 2022, 7:34am

Hi @SingeunOh,

You can absolutely use linear models or GLMs with categorical data. In fact, the common ANOVA is essentially a special case of a linear model. This is especially straightforward with alpha diversity because your response variable is numeric and is often approximately normally distributed.

Below is a typical workflow I use myself in R for alpha diversity that may be helpful for you. It assumes you have a table with samples (or sites, participants, etc.) as rows and metadata as columns (e.g. season, infection, richness).
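A table of that shape could be assembled from QIIME 2 outputs roughly as follows. This is only a sketch under assumptions: the two temporary files stand in for an alpha diversity vector exported with `qiime tools export` and for a sample metadata file, the column name `shannon_entropy` and the sample IDs are made up for illustration, and the metadata is assumed to have no `#q2:types` row.

```r
# Hypothetical stand-ins for real files; replace these with your own paths.
alpha_file <- tempfile(fileext = ".tsv")
meta_file  <- tempfile(fileext = ".tsv")
writeLines(c("\tshannon_entropy",
             "S1\t3.2", "S2\t4.1", "S3\t2.9", "S4\t3.8"), alpha_file)
writeLines(c("sample-id\tseason\tinfection",
             "S1\tSpring\tYes", "S2\tSpring\tNo",
             "S3\tFall\tYes", "S4\tFall\tNo"), meta_file)

# Read both tables, using the first column (sample IDs) as row names.
alpha <- read.delim(alpha_file, row.names = 1, check.names = FALSE)
meta  <- read.delim(meta_file,  row.names = 1, check.names = FALSE)

# Join on sample ID; samples missing from either table are dropped.
mydata <- merge(meta, alpha, by = "row.names")
names(mydata)[names(mydata) == "Row.names"] <- "sample_id"
names(mydata)[names(mydata) == "shannon_entropy"] <- "alpha"
mydata$sample_id <- as.character(mydata$sample_id)

# Treat the grouping variables as factors so lm() handles them as categories.
mydata$season    <- factor(mydata$season)
mydata$infection <- factor(mydata$infection)

str(mydata)
```

The resulting `mydata` has one row per sample with `alpha`, `season` and `infection` columns, which is the layout the workflow expects.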

#required libraries
library(tidyverse)
library(sjPlot)
library(car)

#build basic model
mod1 <- lm(alpha ~ season + infection, data=mydata)

#run a marginal (Type II) ANOVA
mod1 %>% car::Anova() #for lm fits this already reports F-tests; test.statistic= is only used for glm fits

#you could also use base R's anova(mod1) or summary(aov(...)) instead
#of the car package version, but those run sequential (Type I) tests, so term order matters

#check the model fit
#(the sjPlot diagnostics are ggplot objects, so base-graphics par(mfrow=...) has no effect on them)
sjPlot::plot_model(mod1, type="diag") %>% 
  sjPlot::plot_grid(.)

#if model diagnostics don't look good, you can try:
# - transforming the data first
# - using glm() instead of lm() with a different error family and/or link function
#   (e.g. family=poisson for count data such as observed features, or a log link)
# - if nothing fits, then perhaps just stick with Wilcoxon and explain the above process

#get a nice table summary of the model results
sjPlot::tab_model(mod1)

#plot model results
sjPlot::plot_model(mod1, sort.est = TRUE, show.values = TRUE)

#if you want the exact values from this plot, pull the data out of the ggplot object
sjPlot::plot_model(mod1, sort.est = TRUE, show.values = TRUE) %>%
  .$data %>% 
  as_tibble()
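The glm() fallback mentioned in the comments above could look something like this. It is a minimal sketch on simulated data, not your data: the `sim` table, the effect sizes, and the choice of a Gamma family with a log link (one option for a positive, right-skewed response) are all assumptions made for illustration. It uses base R only.

```r
set.seed(42)  # simulated data only, for illustration

# 48 hypothetical samples: 12 per season x infection combination.
sim <- expand.grid(rep = 1:12,
                   season = c("Spring", "Fall"),
                   infection = c("No", "Yes"))

# Positive, right-skewed response whose mean depends on both factors.
mu <- exp(4 + 0.3 * (sim$season == "Fall") - 0.4 * (sim$infection == "Yes"))
sim$alpha <- rgamma(nrow(sim), shape = 20, rate = 20 / mu)

# Ordinary linear model and a Gamma GLM with a log link on the same response.
mod_lm  <- lm(alpha ~ season + infection, data = sim)
mod_glm <- glm(alpha ~ season + infection,
               family = Gamma(link = "log"), data = sim)

# Marginal test for each term: drop it from the full model and compare.
drop1(mod_glm, test = "F")

# Rough comparison of the two fits (lower AIC suggests a better trade-off).
AIC(mod_lm, mod_glm)

# With a log link, exponentiated coefficients are multiplicative effects
# on the mean relative to the reference levels (Spring, No).
exp(coef(mod_glm))
```

Whichever family you pick, the residual diagnostics should still be checked before the results are interpreted.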

Also, within QIIME 2, you can use the q2-longitudinal plugin's linear-mixed-effects action, which takes R-style formulas. It doesn't have the flexibility to apply transformations or fit a GLM, though, and it won't give you in-depth model diagnostics.

As a note though, I would strongly recommend either reading a bit about linear models first or chatting with a statistician about your specific project goals, to make sure that you are in fact addressing your specific questions, that you're well powered, and that the model you run is a decent fit. For example, if you need to test the interaction between season and infection status, you would change your formula to alpha ~ season*infection, which demands a different and more complex interpretation of the results.
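To make the interaction point concrete, here is a small sketch on simulated data (base R only). The `sim` table and its effect sizes are invented for illustration; the point is only how the two formulas differ and what the coefficients then mean.

```r
set.seed(1)  # simulated data only, for illustration

# 48 hypothetical samples: 12 per season x infection combination.
sim <- expand.grid(rep = 1:12,
                   season = c("Spring", "Fall"),
                   infection = c("No", "Yes"))

# Build in an interaction: infection lowers alpha more in Fall than in Spring.
sim$alpha <- 3 +
  0.5 * (sim$season == "Fall") -
  0.2 * (sim$infection == "Yes") -
  0.6 * (sim$season == "Fall" & sim$infection == "Yes") +
  rnorm(nrow(sim), sd = 0.4)

mod_add <- lm(alpha ~ season + infection, data = sim)  # main effects only
mod_int <- lm(alpha ~ season * infection, data = sim)  # adds season:infection

# F-test of whether the interaction term improves on the additive model.
anova(mod_add, mod_int)

# In mod_int, "infectionYes" is the infection effect in the reference season
# (Spring) only; the Fall effect is infectionYes + seasonFall:infectionYes.
coef(mod_int)

# Group means often make the interaction easier to read than coefficients.
aggregate(alpha ~ season + infection, data = sim, FUN = mean)
```

Once the interaction is in the model, the main-effect coefficients no longer describe an average effect across groups, which is why the interpretation gets more involved.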

Hope that helps

jwdebelius · November 2, 2022, 9:04am

I'd love to tag onto @Mehrbod_Estaki's excellent answer and add that there's also an option in QIIME 2 for a linear model, if you want to run one, hidden in q2-longitudinal (q2-longitudinal anova)! I was shocked when I found it there, too!

I tend to run mine outside QIIME 2 as well (although I use statsmodels), but I'll echo the points about interpreting interaction terms, particularly for categorical variables.

Best,
Justine

SingeunOh · November 4, 2022, 8:51am

Thank you, Colin, for your kind interest.

SingeunOh · November 4, 2022, 8:54am

Thank you so much, Mehrbod! Your explanation helped me a lot with solving the problem. I also appreciate you sharing your code. I will definitely try ANOVA in R for the analysis.