Cage effect ANOVA

timanix · June 12, 2024, 11:07am

Dear All,

I am running an ANOVA on the alpha diversity of samples from some mouse trials.
I know that the cage effect in mice is very strong, and I would like to account for it in the analyses. The experimental design of my data is as follows:

Group1           Group2
Cage1 * n        Cage4 * n 
Cage2 * n        Cage5 * n
Cage3 * n        Cage6 * n

So, each cage has n animals, but cages are not shared between groups.

If I run the ANOVA with the formula "metric ~ Group", the p-value for Group is significant.
If I run "metric ~ Group + Cage", the p-value for Group is not significant, but the one for Cage is.

My question is: should I account for the cage effect when no cage contains animals from both groups? For example, if I plot the metric by group, it is obvious that the animals from every cage in Group 1 have lower values than those from Group 2.
Is accounting for the Cage effect appropriate with such design?

Note: I am fine with either outcome; I am not looking for a particular result. I just want to report the results correctly and learn how to handle the cage effect properly.

Thank you for reading.

jwdebelius · June 12, 2024, 1:38pm

Hi @timanix,

Pseudoreplication is always fun! You probably do need to account for cage effect in your model. Hopefully, you have multiple cages per group, because that's the key to making this work.

I've been doing some work on the issue of pseudoreplication/clustering recently. Our main conclusion was that you need to account for clustering. So, you've got 3 options:

Adjust for the cage effect in the ANOVA (q2-longitudinal ANOVA). It's possible that cage confounds your Group variable, which would suck, but might be the case. You'll also have to do this for beta diversity because, AFAIK, adonis doesn't support another model.
Use a linear mixed effects (LME) model with cage as a random intercept (this is my favorite!). It's not possible in the current QIIME 2 framework, which assumes an ordered gradient within a group, but it should be runnable in R, Python, or your choice of other stats programs.
Use a Generalized Estimating Equation (GEE) with cage as the grouping effect. See the LME notes.
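For option 1 in a design where cages are nested in groups, one classical way to adjust for cage (an assumption on my part; the thread does not spell this out) is a balanced nested ANOVA. Cages, not individual mice, are the replicates for the Group comparison, so Group is tested against the Cage-within-Group mean square instead of the residual mean square. The sketch below assumes a balanced design (same number of cages per group and mice per cage); `nested_anova` is a hypothetical helper, and for balanced data this should correspond to what R's `aov(metric ~ Group + Error(Cage))` reports in its Cage stratum.

```python
import numpy as np
from scipy import stats


def nested_anova(y):
    """Balanced nested ANOVA with cages nested in groups (hypothetical helper).

    y has shape (n_groups, cages_per_group, mice_per_cage). Cage k of one
    group is a different physical cage from cage k of another group.
    Group is tested against the Cage(Group) mean square, because cages,
    not individual mice, are the replicates for the Group comparison.
    """
    a, c, n = y.shape
    grand_mean = y.mean()
    group_means = y.mean(axis=(1, 2))   # shape (a,)
    cage_means = y.mean(axis=2)         # shape (a, c)

    ss_group = c * n * np.sum((group_means - grand_mean) ** 2)
    ss_cage = n * np.sum((cage_means - group_means[:, None]) ** 2)
    ss_error = np.sum((y - cage_means[:, :, None]) ** 2)

    df_group, df_cage, df_error = a - 1, a * (c - 1), a * c * (n - 1)
    ms_group = ss_group / df_group
    ms_cage = ss_cage / df_cage
    ms_error = ss_error / df_error

    # Denominator is MS_cage, not MS_error: the cage is the unit of replication.
    f_group = ms_group / ms_cage
    p_group = stats.f.sf(f_group, df_group, df_cage)
    return {"F": f_group, "df": (df_group, df_cage), "p": p_group,
            "MS_cage": ms_cage, "MS_error": ms_error}


# Toy example with made-up values: 2 groups x 2 cages x 2 mice.
y = np.array([[[1.0, 3.0], [3.0, 5.0]],
              [[5.0, 7.0], [7.0, 9.0]]])
result = nested_anova(y)
print(result["F"], result["df"])   # 8.0 (1, 2)
```

Note how few denominator degrees of freedom the Group test has: it depends on the number of cages, not the number of mice, which is why having several cages per group matters.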

I'm not enough of an expert to say whether LME or GEE is better; I generally prefer LME for linear-ish things because that's what I'm comfortable with, but I still need to work through how GEE differs from LME. ...If you have a time series, you should use LME, because it lets you have multiple nested clusters (e.g., mouse within cage).

I would recommend expanding your mouse x group plot to include cage (side-by-side boxplots, or a jitter plot with the points colored by cage) to see whether you're seeing grouping by cage.
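As a text-only complement to that plot, a minimal standard-library sketch that summarizes the metric per cage within each group. The records and the `per_cage_summary` helper are hypothetical. If values cluster tightly within cages while cage means differ within a group, cage is contributing structure that the model should account for.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-mouse records: (group, cage, alpha-diversity value).
records = [
    ("Group1", "Cage1", 3.0), ("Group1", "Cage1", 3.2),
    ("Group1", "Cage2", 2.6), ("Group1", "Cage2", 2.8),
    ("Group2", "Cage4", 4.1), ("Group2", "Cage4", 4.3),
    ("Group2", "Cage5", 3.6), ("Group2", "Cage5", 3.8),
]


def per_cage_summary(records):
    """Group values by (group, cage) and report n, mean, min and max."""
    by_cage = defaultdict(list)
    for group, cage, value in records:
        by_cage[(group, cage)].append(value)
    return {
        key: {"n": len(vals), "mean": mean(vals), "min": min(vals), "max": max(vals)}
        for key, vals in sorted(by_cage.items())
    }


for (group, cage), s in per_cage_summary(records).items():
    print(f"{group}\t{cage}\tn={s['n']}\tmean={s['mean']:.2f}\trange={s['min']}-{s['max']}")
```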

Best,
Justine

timanix · June 12, 2024, 1:56pm

Hi @jwdebelius

Thank you for your reply!
I was hoping you would answer, since I remember your comment that mouse studies should account for the cage effect in the analyses.

That is what I tried to do by adding Cage to the formula: "metric ~ Group + Cage"

Yes, there are multiple cages per group. My confusion was that each cage only contained animals from one group. I thought that, ideally, each cage should contain representatives of both groups (difficult for the experimental design, I guess).

Yes, I used the same formula "Group + Cage" for both alpha and beta diversity.

I don't have time points, so I assume that LME is not an option for me. But thank you for options 2 and 3; I will add them to my arsenal for longitudinal analyses.

That is a nice suggestion; I will differentiate cages by color.

I am always happy to learn from your comments.

jwdebelius · June 12, 2024, 2:03pm

Hi @timanix,

LME with an unordered grouping isn't an option within QIIME 2. But if you either (a) cannibalize the code or (b) follow the tutorials for statsmodels or lmer, both will happily take data that isn't a time series. We commonly use them for time series data, but they work for basically anything that's nested, ordered or unordered. My current research group uses them a lot to model the effect of study site, and observations aren't ordered within a study site.
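A minimal statsmodels sketch of an LME with an unordered grouping: a random intercept for cage and no time variable, roughly the Python counterpart of lme4's `metric ~ group + (1 | cage)`. The data are simulated and hypothetical (2 groups x 4 cages x 5 mice), and cage labels are unique across groups, so cages are nested in groups.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated, hypothetical data: mice nested in unordered clusters (cages).
# A study site would be handled the same way; there is no time variable.
rng = np.random.default_rng(1)
rows = []
for g, group in enumerate(["Group1", "Group2"]):
    for k in range(1, 5):
        cage = f"{group}_cage{k}"
        cage_shift = rng.normal(0.0, 0.5)   # effect shared by cagemates
        for _ in range(5):
            rows.append({"group": group, "cage": cage,
                         "metric": 3.0 + 0.8 * g + cage_shift + rng.normal(0.0, 0.3)})
df = pd.DataFrame(rows)

# Random intercept for cage; group is the fixed effect of interest.
model = smf.mixedlm("metric ~ group", data=df, groups=df["cage"])
res = model.fit(reml=True)
print(res.summary())
print("Between-cage variance:", float(np.asarray(res.cov_re)[0, 0]))
```

Because the random intercept is defined only by cage membership, nothing in the model assumes the observations within a cage are ordered.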

Best,
Justine

Brandon · July 29, 2025, 7:13am

Hi Justine,
I have the same experimental design: cage and disease are confounded in my data (fisher.test(table(meta$disease, meta$cage)) gives p = 2.375e-10).
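A Fisher test on the disease x cage table measures association, while what matters for the model is structural: whether any cage holds animals from more than one disease level. A minimal standard-library sketch of that check (the helper names and the metadata are hypothetical). In R, the equivalent check is that every column of `table(meta$disease, meta$cage)` has exactly one nonzero entry.

```python
from collections import defaultdict


def levels_per_cage(groups, cages):
    """Map each cage to the set of group (e.g. disease) levels it contains."""
    seen = defaultdict(set)
    for group, cage in zip(groups, cages):
        seen[cage].add(group)
    return dict(seen)


def cages_nested_in_groups(groups, cages):
    """True if every cage holds animals from exactly one group level."""
    return all(len(levels) == 1 for levels in levels_per_cage(groups, cages).values())


# Hypothetical metadata columns.
disease = ["ctrl", "ctrl", "ctrl", "ctrl", "case", "case", "case", "case"]
cage = ["c1", "c1", "c2", "c2", "c3", "c3", "c4", "c4"]
print(cages_nested_in_groups(disease, cage))        # True: nested design

mixed_cage = ["c1", "c2", "c1", "c2", "c1", "c2", "c1", "c2"]
print(cages_nested_in_groups(disease, mixed_cage))  # False: cages hold both levels
```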

I tried lmer: summary(lmer(y ~ disease + (1|cage), data = meta))

I am confused about why the model reports a variance of 0 for the cage random effect. I would highly appreciate any suggestions or thoughts you may have! Thank you so much!
-Ivy

jwdebelius · July 29, 2025, 3:19pm

Hi @Brandon,

I'm not sure why this is happening. Have you visually compared the distribution of y across disease and cage in your data? I sometimes find that just looking at my data helps me get a sense of what's happening.

Best,
Justine

ebolyen · July 29, 2025, 4:23pm

I think the "Singular fits" section of the GLMM FAQ might also be handy, if you haven't seen it:
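One way to build intuition for a zero cage variance is the classical ANOVA (method-of-moments) estimator for a balanced nested design, sigma_cage^2 = (MS_cage - MS_error) / n. This is not what lmer computes (lmer uses REML by default), but it shows the same idea: when cage means within a group vary no more than mouse-to-mouse noise alone would produce, MS_cage is at or below MS_error, the moment estimate is zero or negative, and a likelihood-based fit ends up on the boundary at zero variance. The helper name and the toy arrays are hypothetical.

```python
import numpy as np


def cage_variance_anova(y):
    """ANOVA (method-of-moments) estimate of the between-cage variance.

    y has shape (n_groups, cages_per_group, mice_per_cage), balanced, with
    cages nested in groups. Returns (raw, truncated): raw can be negative;
    truncated is clipped at zero, the boundary a likelihood fit would hit.
    """
    a, c, n = y.shape
    cage_means = y.mean(axis=2)
    group_means = y.mean(axis=(1, 2))
    ms_cage = n * np.sum((cage_means - group_means[:, None]) ** 2) / (a * (c - 1))
    ms_error = np.sum((y - cage_means[:, :, None]) ** 2) / (a * c * (n - 1))
    raw = float((ms_cage - ms_error) / n)
    return raw, max(raw, 0.0)


# Made-up data: cage means within each group differ (MS_cage > MS_error).
clustered = np.array([[[1.0, 3.0], [3.0, 5.0]],
                      [[5.0, 7.0], [7.0, 9.0]]])
# Made-up data: cage means within each group are identical (MS_cage = 0).
no_cage_signal = np.array([[[1.0, 3.0], [1.0, 3.0]],
                           [[5.0, 7.0], [5.0, 7.0]]])

print(cage_variance_anova(clustered))       # (1.0, 1.0)
print(cage_variance_anova(no_cage_signal))  # (-1.0, 0.0)
```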

Brandon · July 29, 2025, 5:00pm

Hi Justine,
Thanks for the response. Below is my data:

I would like to hear any thoughts you may have.
Thank you!

jwdebelius · July 30, 2025, 12:37pm

Hi @Brandon,

So, it looks like you don't have the kind of errors that would lead to that variance AFAIK.

I think @ebolyen's link is a great one. It might also be worthwhile to have a friend who uses R go through your code with you. I'm always amazed at how bad R is at handling things like variables that didn't end up as factors when they needed to be factors, or variables that weren't converted to an appropriate dummy variable. (Not that I don't have dummy variable issues in other places; they're just different.)
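The same pitfall exists in Python formula interfaces. A minimal statsmodels sketch with hypothetical data: when cage IDs are stored as integers, a plain `cage` term becomes a single continuous slope, while `C(cage)` gives the intended categorical coding with one dummy per non-reference cage. In R, the analogous fix is `meta$cage <- factor(meta$cage)` before fitting.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data in which cage IDs happen to be stored as integers.
df = pd.DataFrame({
    "y": [3.1, 3.4, 2.9, 3.0, 4.2, 4.5],
    "cage": [1, 1, 2, 2, 3, 3],
})

# Integer column: treated as one continuous covariate (a single slope).
as_numeric = smf.ols("y ~ cage", data=df)
# C(...) forces categorical treatment coding: one dummy per non-reference cage.
as_factor = smf.ols("y ~ C(cage)", data=df)

print(as_numeric.exog_names)
print(as_factor.exog_names)
```

Checking the design-matrix column names before fitting is a cheap way to catch a grouping variable that was silently treated as numeric.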

Best,
Justine

Brandon · July 30, 2025, 5:12pm

Thank you, @jwdebelius. I will check this out.

yangyue · July 31, 2025, 3:57am

I agree with Justine's advice about applying LME, but I wonder why your experiment was designed with the two groups housed in entirely different cages. From the picture you provided, and judging the error bars by eye from my own experience, the two groups do not look significantly different, so the formula "metric ~ Group + Cage" is appropriate but not needed.

Brandon · July 31, 2025, 4:26am

Thanks for your comments!! This is a nested design: Group and Cage are completely confounded in my data, so directly fitting the model 'metric ~ Group + Cage' would run into collinearity problems.
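A minimal numpy sketch of that collinearity, assuming a hypothetical nested design with 2 groups x 3 cages x 2 mice. In the fixed-effect design matrix for `metric ~ Group + Cage`, the Group dummy equals the sum of the dummies for the cages in Group 2, so the matrix loses a rank and Group cannot be estimated separately from Cage.

```python
import numpy as np

# Hypothetical nested design: 2 groups x 3 cages x 2 mice per cage.
group = np.repeat([0, 1], 6)        # 0 = Group1, 1 = Group2
cage = np.repeat(np.arange(6), 2)   # cages 0-2 only in Group1, cages 3-5 only in Group2

intercept = np.ones(group.size)
group_dummy = (group == 1).astype(float)
# Treatment-coded cage dummies with cage 0 as the reference level.
cage_dummies = np.column_stack([(cage == k).astype(float) for k in range(1, 6)])

# Fixed-effect design matrix for metric ~ Group + Cage.
X = np.column_stack([intercept, group_dummy, cage_dummies])
print(X.shape[1], np.linalg.matrix_rank(X))  # 7 6

# The Group column is exactly the sum of the dummies for cages 3, 4 and 5.
print(np.allclose(group_dummy, cage_dummies[:, 2:].sum(axis=1)))  # True
```

Software typically resolves this by aliasing one column (R's `lm`, for example, reports one coefficient as NA), which is why a random cage intercept, or a nested ANOVA that tests Group against the cage-within-group variation, is the usual way to keep Group testable.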