Longitudinal mixed effects, first distance LME, and volatility

I've been running LME, first distance LME, and volatility but am unsure how to interpret some of the output. I went through the tutorials but was unable to find a description of how to read the data tables that accompany the graphs. Specifically, the "Model results" table is confusing to me, and I was hoping one of you could describe what each row represents so I can continue with my analysis. I've attached one of my "Model results" tables as an example. The parameters used to run this are below.

Linear mixed effects parameters
Metric: Distance
Group column: [delivery_mode]
State column: age
Individual ID column: host_subject_id
Random effects: None

Each row represents a different fixed or random effect (or their interaction). Explanation row-by-row:

Intercept: the random intercept fit for each individual. A significant P value indicates that individuals have different starting values for Distance.
delivery_mode: does delivery mode have a significant effect on baseline values? No.
age: does Distance change significantly with age? Apparently not.
age:delivery_mode: age x delivery mode interaction. Does delivery mode influence Distance longitudinally (as an individual ages)? No.
groups RE: residual error. Quite low, meaning that you have probably captured most of the major sources of variance in this model (or their confounders).

This preprint has a little more detail on interpreting LME (and other methods in q2-longitudinal). The tutorial links to the statsmodels website with some more technical documentation. LMEs can be pretty difficult to interpret, so you could also gain a lot more insight by speaking to a statistician.

I hope that helps!


Thank you! At what point do residual errors become high? Does it work like p-values, where a value above 0.05 is no longer significant?

You should really talk to a statistician about that — RE is going to be judged relative to the parameter estimates ("coef.") for each variable, but I don't really know at what point we can consider this "high" or "low".

No, there is not a threshold like that for residual error (it's all relative).

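The "P>|z|" column in that table is the P value for each variable or interaction, which you will use to determine significance. The "z" column is the test statistic (the coefficient divided by its standard error); the "coef." column is the estimated size of each effect.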
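To make the relationship between those columns concrete, here is a minimal Python sketch. It assumes the usual Wald test, in which z is the coefficient divided by its standard error and the P value is the two-sided tail of a standard normal distribution. The function name and the numbers are hypothetical and are not taken from any real results table.

```python
import math

def wald_z_and_p(coef, std_err):
    """Return the z statistic and two-sided p-value for one coefficient."""
    z = coef / std_err
    # Two-sided tail probability of a standard normal: P(|Z| > |z|)
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Hypothetical coefficient and standard error, for illustration only
z, p = wald_z_and_p(coef=0.5, std_err=0.25)
print(f"z = {z:.2f}, P>|z| = {p:.3f}")
```

A coefficient that is large relative to its standard error gives a large |z| and a small P value; the coefficient alone says nothing about significance.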

Just a top-off question for this thread since it's related. Are the coefficient values standardized or unstandardized?

That's a great question @Mehrbod_Estaki

I believe these coefficients are unstandardized. It sounds like that may be the norm for the lmer function in R's lme4 package, which statsmodels uses as a comparison. As with lmer, it sounds like it's up to the user to scale the input variables before running the model if they want standardized coefficients.

But I am not 100% sure, and you may want to cruise the statsmodels documentation for more details.
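If you do want to scale a numeric input variable yourself, one common option is z-scoring. This is a minimal sketch in plain Python, not something q2-longitudinal does for you; the `zscore` helper and the example ages are hypothetical.

```python
import statistics

def zscore(values):
    """Centre values on their mean and scale by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    if sd == 0:
        raise ValueError("cannot standardize a column with zero variance")
    return [(v - mean) / sd for v in values]

ages = [0, 1, 2, 3, 4]  # hypothetical ages
scaled = zscore(ages)
print(scaled)
```

The scaled values would then go into the metadata as a new numeric column, so that the fitted coefficient is expressed per standard deviation of the original variable.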


Thank you!

Also, can you explain the first distance LME graph? I understand that it displays the change in beta diversity over time (while LME shows the alpha diversity over time) but what samples is it comparing to? When I compare vaginal to c-section data, there are 2 lines that begin at different points. If it compared them both to each other, the points on each line would have the same values. Why are they different?

I also have a question about the q2 code to run LME, first distance LME, and volatility. How can I have it start at a certain age and end at a certain age? (Age is what I pass to the --p-state-column parameter.) I want it to start at age 0 and end at age 10. Thank you again.

LME does not have anything to do with alpha diversity unless you are inputting an alpha diversity vector as your dependent variable. You can run LME on alpha diversity data, first differences/distances, or metadata values. Comparing an LME on alpha diversity data with an LME on beta diversity FD data is like comparing apples and oranges: there is no reason why the two should necessarily give related results.

First distances measure each individual's change in beta diversity between successive time points. So if you have samples collected from a group of children once per year at ages 0-10, then first distance 1 will be the distance between age 1 and age 0 for each child, FD 2 will be the distance between age 2 and age 1, and so on. Each individual's samples at time X and time X-1 are compared to measure FD; there is no "standard" sample that everything is compared to.
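The idea can be sketched in a few lines of Python. This is an illustration of the concept only, not the q2-longitudinal implementation; the function, the sample IDs, and the distance values are all hypothetical.

```python
def first_distances(distance, samples_by_individual):
    """Distance between each sample and the same individual's previous sample.

    distance: dict mapping frozenset({sample_a, sample_b}) -> beta diversity distance
    samples_by_individual: dict mapping individual -> list of (state, sample_id)
    """
    result = {}
    for individual, samples in samples_by_individual.items():
        ordered = sorted(samples)  # order each individual's samples by state (e.g. age)
        for (_, prev_id), (state, curr_id) in zip(ordered, ordered[1:]):
            result[(individual, state)] = distance[frozenset((prev_id, curr_id))]
    return result

# Hypothetical toy data: two children sampled at ages 0, 1 and 2
samples = {
    "child_A": [(0, "A0"), (1, "A1"), (2, "A2")],
    "child_B": [(0, "B0"), (1, "B1"), (2, "B2")],
}
dist = {
    frozenset(("A0", "A1")): 0.6, frozenset(("A1", "A2")): 0.3,
    frozenset(("B0", "B1")): 0.5, frozenset(("B1", "B2")): 0.2,
}
fd = first_distances(dist, samples)
for key, value in sorted(fd.items()):
    print(key, value)
```

Every value comes from a pair of samples within one individual, so two groups plotted side by side are never being compared to each other and have no reason to share values.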

You would need to show me the plots you have for me to understand and explain what you are seeing. It's not entirely clear based on your description.

If you want to exclude certain time points, you need to filter them out before running LME or volatility. For example, use feature-table filter-samples with a --p-where expression to remove samples based on metadata values (e.g., age).
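Before filtering, it can help to check which samples fall inside the age window you intend to keep. The following is a minimal sketch in plain Python, separate from QIIME 2 itself. It assumes a tab-separated metadata file whose first column is the sample ID and which has a numeric age column; the helper name and the example rows are hypothetical.

```python
import csv
import io

def ids_in_age_range(tsv_text, column="age", low=0, high=10):
    """Return sample IDs whose value in `column` lies within [low, high]."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    id_field = reader.fieldnames[0]  # first column holds the sample ID
    kept = []
    for row in reader:
        if row[id_field].startswith("#q2:"):
            continue  # skip the optional QIIME 2 column-types directive row
        try:
            value = float(row[column])
        except ValueError:
            continue  # missing or non-numeric age
        if low <= value <= high:
            kept.append(row[id_field])
    return kept

# Hypothetical metadata; a real file would be read from disk instead
metadata = "sample-id\thost_subject_id\tage\nS1\tA\t0\nS2\tA\t5\nS3\tA\t12\nS4\tB\t\n"
kept_ids = ids_in_age_range(metadata)
print(kept_ids)
```

Samples with a missing age are dropped here, which is worth knowing about before the same window is applied to the real table.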

I hope that helps!

Great, thank you!

When I try to run the LME command on my first distance qza file using...

qiime longitudinal linear-mixed-effects \
  --m-metadata-file first-distances-gut_c-section_1-10.qza \
  --m-metadata-file mixed.sample.CR.NEW.NO.BLANKS.tsv \
  --p-metric Distance \
  --p-state-column age \
  --p-individual-id-column host_subject_id \
  --p-group-columns type_of_feed \
  --o-visualization gut_c-section_1-10-first-distances-LME

I get this error:
Linear model will not compute due to singular matrix error. This may occur if input variables correlate closely or exhibit zero variance. Please check your input variables. Removing potential covariates may resolve this issue.

How can I fix this?

Hi @Stephanieorch,

Either your table was filtered incorrectly/unintentionally, or the metadata values that you selected are equal across all subjects. The error is just what it says — there is zero variance. By any chance did all c-section subjects receive the same type_of_feed?

You should inspect the filtered table, the first distance values, and the metadata values to make sure that everything looks okay.
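One quick way to spot a zero-variance variable is to count the distinct values each column takes among the samples that remain after filtering. This is a minimal sketch in plain Python, not part of QIIME 2; the helper name and the example rows (including the "formula" value) are hypothetical.

```python
from collections import Counter

def constant_columns(rows, columns):
    """Return the columns that take a single value across all rows."""
    constant = []
    for column in columns:
        counts = Counter(row[column] for row in rows)
        if len(counts) <= 1:
            constant.append(column)
    return constant

# Hypothetical rows standing in for the metadata of the samples left after filtering
rows = [
    {"host_subject_id": "A", "age": "1", "type_of_feed": "formula"},
    {"host_subject_id": "A", "age": "2", "type_of_feed": "formula"},
    {"host_subject_id": "B", "age": "1", "type_of_feed": "formula"},
]
flagged = constant_columns(rows, ["age", "type_of_feed"])
print(flagged)
```

Any column flagged this way cannot be estimated as a group effect in the filtered subset and is a likely cause of the singular matrix error.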

