Is a longitudinal linear mixed effects model the right choice of analysis?

Hi everyone,

I am having difficulty with part of my analysis and would like to ask for help. I have collected fecal samples from cows (mothers) and their calves to measure the viable bacterial cell count (CFU) and microbial diversity (Shannon and Bray-Curtis) of each sample. Next, I want to investigate the correlation between CFU and the Shannon index. To do that, I ran the QIIME 2 longitudinal linear mixed effects model to look at the relationship. Below is the code I used:
qiime longitudinal linear-mixed-effects \
  --m-metadata-file sample-metadata.tsv \
  --m-metadata-file core-metrics-results/shannon_vector.qza \
  --p-metric shannon_entropy \
  --p-group-columns Age \
  --p-state-column CTX_CFU \
  --p-individual-id-column Animal_ID \
  --o-visualization linear/linear-ctx-cfu-shannon.qzv
Age is the column where I categorized each animal as mother or calf (fixed effect). CTX_CFU is the numeric variable (dependent variable) where I entered the CFU for each sample in the metadata. Below is a screenshot of the result.

I am still new to this field and I am not sure whether this is an appropriate analysis for investigating the correlation between the two variables. If another type of analysis would be better suited to this question, please let me know. Also, is the graph in the second picture suitable for the results section of an honours thesis?


Hi @kandreon,

Welcome to the QIIME 2 forum! This sounds like a super cool project!

I'll just say that one of my challenges in this field is how many different types of data there are, and I'm constantly learning new things! If you're not familiar with LMEs, there are lots of good resources online introducing the idea and some of the assumptions. In terms of the modeling question, I think you want to decide if it's reasonable to assume the Shannon diversity is correlated between mother/calf pairs. If they should be correlated, then I think you're in good shape with an LME. If that's not an assumption you want to make, the ANOVA function in q2-longitudinal, which is basically a good multivariate regression, might be worth checking out.

My general suggestion here, though, is to talk with your advisor/supervisor. They're probably in a better position to help you evaluate what assumptions are right for your data. They're definitely in a better position to tell you what is or is not acceptable for your thesis than any of us are. (I barely know what my students need to do without checking the departmental requirements.)



Hi @jwdebelius,

Thank you for your advice!! Though I am a little bit confused about making an assumption that there is a correlation between CFU and alpha diversity for both mothers and calves. I want to know whether the correlation between CFU and alpha diversity is significant or not. In that case, should I use the ANOVA function in q2-longitudinal?

Thanks again

Hi @kandreon,

Either model can look for a correlation between the CTX CFU and alpha diversity. The question is which one makes the right assumptions. I'm going to recommend reading a statistical primer on linear regression and linear mixed effects models, either in a textbook or on a page like the UCLA stats resources, that fully describes the model.

To get a little mathy, a linear regression is defined as

y = \beta_{0} + \beta_{1}x_{1} + ... + \beta_{n}x_{n} + \epsilon

where \beta_{0} is the intercept, \beta_{1} through \beta_{n} are the slope components, and \epsilon is the error component.

This is the linear regression, and the "fixed" portion of the linear mixed effects model. But the mixed effects model also has a "random effect" component, which essentially lets you say that within your parent group, you have subgroups which might have different means or different slopes.
In microbiome work, we often assume that a person walks in with some baseline alpha diversity, and we want to be able to account for that grouped effect. If I use mixed/R notation, it looks kind of like this:

y = \beta_{0} + \beta_{1}x_{1} + ... + \beta_{n}x_{n} + \epsilon | z
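
To make the grouping term a bit more concrete, here is one common way to write it out, assuming the simplest case of a random intercept per group. The indices i (observation) and j (group, e.g. animal), the offset u_{j}, and the variance symbols are notation introduced here for illustration. This is a special case, not necessarily the exact model q2-longitudinal fits.

```latex
y_{ij} = \beta_{0} + \beta_{1}x_{1,ij} + \dots + \beta_{n}x_{n,ij} + u_{j} + \epsilon_{ij},
\qquad u_{j} \sim \mathcal{N}(0, \sigma_{u}^{2}),
\qquad \epsilon_{ij} \sim \mathcal{N}(0, \sigma^{2})
```

Here u_{j} is the group-specific shift in the baseline (the "walks in with some alpha diversity" idea), and the fixed-effect part is the same as in the plain linear regression.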

So, if I look at your model, it looks like you're investigating

shannon = ctx * age | animal ID

Your age variable seems to be categorical: mother vs. calf? So, I'm assuming your animal ID is about the mother-calf dyad. Again, I'm guessing based on what you've presented, without seeing your data. You, your advisor, or someone who is actually familiar with your analysis and data set should be able to help you. I've got more datasets at the moment than hours in the day.

If your data should be grouped by animal, because it's either a repeated measure or you've got mother-calf dyads who you think are more similar, then the LME is the best model. Otherwise, you may be happier with the ANOVA model.
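
For a rough feel of how the two options differ, here is a minimal sketch in Python with statsmodels, run on synthetic data. Everything in it is an assumption for illustration: the column names, the simulated effect sizes, and the choice to treat the animal ID as a repeated-measures grouping. It is not the poster's data and not necessarily what q2-longitudinal runs internally. Note that in both formulas the Shannon index is the response and CFU is a predictor.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Synthetic stand-in data (hypothetical): 20 animals, 4 samples each.
n_animals, n_samples = 20, 4
animal_id = np.repeat(np.arange(n_animals), n_samples)
age = np.where(animal_id % 2 == 0, "mother", "calf")

# Hypothetical CFU values on a log10 scale.
ctx_cfu = rng.uniform(2.0, 8.0, size=animal_id.size)

# Each animal gets its own baseline offset (the "random intercept").
baseline = rng.normal(0.0, 0.5, size=n_animals)[animal_id]
noise = rng.normal(0.0, 0.3, size=animal_id.size)
shannon = 4.0 - 0.5 * ctx_cfu + baseline + noise

df = pd.DataFrame(
    {"animal_id": animal_id, "age": age, "ctx_cfu": ctx_cfu, "shannon": shannon}
)

# Option 1: ordinary regression, fixed effects only (samples treated as independent).
ols_fit = smf.ols("shannon ~ ctx_cfu * age", data=df).fit()

# Option 2: same fixed effects, plus a random intercept for each animal.
lme_fit = smf.mixedlm("shannon ~ ctx_cfu * age", data=df, groups=df["animal_id"]).fit()

print(ols_fit.params)
print(lme_fit.params)  # includes "Group Var", the random-intercept variance term
```

The fixed-effect terms are the same in both fits. The mixed model additionally estimates how much animals differ in their baseline, which is the assumption to discuss with your advisor.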



Thank you @jwdebelius! Really appreciate your information.

