Hi @SingeunOh,
Absolutely you can use linear models or GLMs with categorical data. In fact, the common ANOVA is essentially a special case of a linear model. This is especially pretty straightforward to do with alpha diversity because your response variable would be numeric and often follows a normal distribution.
Below is a typical workflow I do myself in R with alpha diversity that may be helpful for you. It assumes you have a table that has rows as sites (ie. samples, participants, etc.) and meta data in your columns (ex. season, infection, richness, etc.)
#required libraries
library(tidyverse)
library(sjPlot)
library(car)
#build basic model
mod1 <- lm(alpha ~ season + infection, data=mydata)
#run a marginal test Anova (Type II)
mod1 %>% car::Anova() #optional, add test.statistic="F" for F-statistic
#you could also just use default aov() or summary() in R instead
#of the car package version, but that will be a sequential test by default
#check the model fit
par(mfrow=c(2,2))
sjPlot::plot_model(mod1, type="diag") %>%
sjPlot::plot_grid(.)
#if model diagnostics don't look good, you can try :
# - transforming the data first
# - Use glm() instead of lm() with a different link function (i.e. binomial, poisson, log, logit, etc.)
# - If nothing fits, then perhaps just stick with Wilcoxon and explain the above process
#get a nice table summary of the model results
sjPlot::tab_model(mod1)
#plot model results
sjPlot::plot_model(mod1, sort.est = TRUE, show.values = TRUE)
#if you want the exact values from this plot,
sjPlot::plot_model(mod1, sort.est = TRUE, show.values = TRUE) %>%
.$data %>%
tibble()
Also, within QIIME 2, you can use the q2-longitudinal plugin's linear-mixed-effects
action that will take R style formulas. Though this won't have flexibility to do any transformation or a glm, and won't get in depth model diagnostics.
As a note though, I would strongly recommend either reading a bit about linear models first or chat with a statistician about your specific project goals to make sure you are in fact addressing your specific questions, you're well powered, and the model you run is a decent fit. For ex. if you need to test the interaction term between season and an infection status you would change your formula to alpha ~ season*infection
which will demand different and more complex interpretation of results.
Hope that helps