The qiime gneiss plugin has two options for performing regressions on balances: ols-regression and lme-regression. According to the plugin pages, the first is for running an ordinary least squares linear regression and the second is for running a linear mixed effects model.

I don’t have a strong background in maths and stats so I was wondering if anyone could help me understand the difference between the two and when we would choose one over the other. I asked around with some friends who study maths (but don’t have experience with microbiology and Qiime) and they led me to explanations like this.

Because there is still so much to learn about how microbial communities function and what influences them, I would assume that most microbiome datasets probably contain some variables that are confounded or which we’ve failed to measure … which I think suggests that the mixed effects approach would be more appropriate? But I would love some guidance from people who understand these things better.

The main difference is that ols-regression is purely a fixed effects model, whereas lme-regression is a mixed effects model, meaning it includes both fixed and random effects. Wikipedia actually has a pretty good explanation about random effects here.
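
To make the distinction a bit more concrete, here is a rough sketch of the two model forms for a single balance. The notation is my own assumption and is not taken from the gneiss documentation: $y$ is the balance value, $x$ the covariates, $\beta$ the fixed effect coefficients, and $u_j$ a random intercept for subject $j$.

```latex
% Ordinary least squares: fixed effects only, one shared error term
y_i = x_i^\top \beta + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)

% Linear mixed effects (random-intercept form): sample i taken from subject j
y_{ij} = x_{ij}^\top \beta + u_j + \varepsilon_{ij}, \qquad
u_j \sim \mathcal{N}(0, \sigma_u^2), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

The only extra piece in the second form is $u_j$, which lets all samples from the same subject share an offset instead of being treated as independent.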

Typically, I use the lme-regression module whenever I have a lot of repeated measures (e.g. a time series). Essentially, whenever you are trying to explain the variability within single subjects, mixed effects models may be appropriate.
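
As a rough illustration of why repeated measures matter, the sketch below computes an intraclass correlation (ICC) for one balance: the share of its variance that comes from differences between subjects. This is not part of gneiss, and the subject names and balance values are made up for the example. It also assumes every subject has the same number of samples.

```python
from statistics import mean


def intraclass_correlation(groups):
    """One-way ANOVA ICC(1) for a dict of {subject: [values]}.

    Assumes a balanced design (same number of samples per subject).
    """
    sizes = {len(values) for values in groups.values()}
    if len(sizes) != 1:
        raise ValueError("this sketch assumes the same number of samples per subject")
    k = sizes.pop()   # samples per subject
    n = len(groups)   # number of subjects
    if k < 2 or n < 2:
        raise ValueError("need at least 2 subjects with at least 2 samples each")

    grand_mean = mean(x for values in groups.values() for x in values)

    # Between-subject mean square: how far subject means sit from the grand mean
    ms_between = k * sum(
        (mean(values) - grand_mean) ** 2 for values in groups.values()
    ) / (n - 1)

    # Within-subject mean square: spread of samples around their own subject mean
    ms_within = sum(
        (x - mean(values)) ** 2 for values in groups.values() for x in values
    ) / (n * (k - 1))

    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical balance values: 3 subjects, 4 time points each
balances = {
    "subject_a": [1.0, 1.2, 0.9, 1.1],
    "subject_b": [2.0, 2.1, 1.9, 2.2],
    "subject_c": [-0.5, -0.4, -0.6, -0.3],
}

icc = intraclass_correlation(balances)
print(f"ICC = {icc:.3f}")
```

An ICC close to 1, as with this toy data, means samples from the same subject are strongly correlated, which is the situation where a per-subject random effect is likely to help. An ICC near 0 suggests the samples behave roughly independently and a plain fixed effects fit may be enough.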

But I'm still not really sure I understand, sorry. The Wikipedia page mentions that biostatistics uses a different definition of fixed and random effects from normal statistics and econometrics -- which one are we working with here?

[A random effects model] assumes that the data being analysed are drawn from a hierarchy of different populations whose differences relate to that hierarchy

Contrast this to the biostatistics definitions, as biostatisticians use "fixed" and "random" effects to respectively refer to the population-average and subject-specific effects (and where the latter are generally assumed to be unknown, latent variables).

Regarding the second point, if I only have one sample from each subject, and hence don't expect correlation between samples from the same subject to be a factor, would you recommend using the ols-regression?

I also noticed that some sources recommend trying out both fixed- and mixed-effects models and then performing some kind of correlation test (Hausman?) to see which appears to be more accurate. Would this be a good/applicable way to approach gneiss analysis if we are not sure which type of regression is best for our data?

On the Wikipedia page, race and sex are good examples of fixed effects, whereas the test scores of an individual student are a random effect (since the student's performance is random). I admit some of the excerpts are a bit confusing due to differing jargon across fields.

I think it is appropriate to use ols-regression if you only have 1 sample from any given subject.

We don’t have correlation tests built into gneiss yet, but that could be a cool addition to have. When it comes to choosing a regression model, I typically default to ols-regression since it is a little easier to interpret. If there are really strong effects, ols-regression can often pick them up even when random effects are present. I usually resort to lme-regression if there are weaker signals within a time series.