I've been running LME, first distance LME, and volatility but am unsure of how to interpret some of the data. I went through the tutorials but was unable to find a description of how to analyze the data tables that accompany the graphs. Specifically, the "Model results" table is confusing to me and I was hoping one of you could describe to me what each row represents so I can continue with my analysis. I've attached one of my "Model results" tables for an example. The parameters used to run this are below.
Linear mixed effects parameters
Group column: [delivery_mode]
State column: age
Individual ID column: host_subject_id
Random effects: None
Each row represents a different fixed or random effect (or their interaction). Explanation row-by-row:
- Intercept: the random intercept fit for each individual. A significant P value indicates that individuals have different starting values for Distance.
- delivery_mode: does delivery mode have a significant effect on baseline values? No.
- age: does Distance change significantly with age? Apparently not.
- age:delivery_mode: the age x delivery_mode interaction. Does delivery mode influence Distance longitudinally (as an individual ages)? No.
- groups RE: residual error. Quite low, meaning that you have probably captured most of the major sources of variance in this model (or their confounders).
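If it helps to see where those rows come from, here is a minimal sketch (not q2-longitudinal's actual code) of the kind of model it fits under the hood: a statsmodels MixedLM with a random intercept per subject. The data are synthetic; the column names just mirror the parameters above.

```python
# Minimal sketch of the underlying model: statsmodels MixedLM with a random
# intercept per subject. Data are synthetic; column names (age, delivery_mode,
# host_subject_id, Distance) mirror the parameters listed above.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_subjects, n_times = 20, 5
df = pd.DataFrame({
    "host_subject_id": np.repeat(np.arange(n_subjects), n_times),
    "age": np.tile(np.arange(n_times), n_subjects),
    "delivery_mode": np.repeat(rng.integers(0, 2, n_subjects), n_times),
})
# Per-subject random intercepts plus an age trend plus residual noise.
subject_effect = np.repeat(rng.normal(0, 0.5, n_subjects), n_times)
df["Distance"] = 1.0 + 0.1 * df["age"] + subject_effect + rng.normal(0, 0.1, len(df))

model = smf.mixedlm("Distance ~ age * delivery_mode",
                    data=df, groups=df["host_subject_id"])
result = model.fit()
print(result.summary())   # rows: Intercept, delivery_mode, age,
                          # age:delivery_mode, and the group (RE) variance
print(result.pvalues)     # the "P>|z|" column as a Series
```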
This preprint has a little more detail on interpreting LME (and other methods in q2-longitudinal). The tutorial links to the statsmodels website with some more technical documentation. LMEs can be pretty difficult to interpret, so you could also gain a lot more insight by speaking to a statistician.
You should really talk to a statistician about that; RE is going to be judged relative to the parameter estimates (“coef.”) for each variable, but I don’t really know at what point we can consider this “high” or “low”.
No, there is not a threshold like that for residual error (it’s all relative).
The “P>|z|” column in that table is the P value for each variable/interaction, which you will use to determine significance. The z value is the test statistic (the coefficient divided by its standard error); the “coef.” column gives the effect estimates themselves.
I believe these coefficients are non-standardized. It sounds like that may be the norm for lmer (from the lme4 package) in R, which statsmodels uses as a comparison. Similar to lmer, it sounds like it’s up to the user to scale the input variables prior to running the model if they want standardized coefficients.
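If you do want standardized coefficients, the usual trick is to z-score the continuous predictors in your metadata before fitting. A tiny sketch (column name `age` is just an example):

```python
# Hypothetical sketch: z-score a continuous predictor before fitting so its
# coefficient ends up in standard-deviation units. Column names are assumed.
import pandas as pd

def standardize(series: pd.Series) -> pd.Series:
    """Center to mean 0 and scale to unit standard deviation."""
    return (series - series.mean()) / series.std()

metadata = pd.DataFrame({"age": [0, 1, 2, 3, 4, 5]})
metadata["age_std"] = standardize(metadata["age"])
print(metadata["age_std"].mean())  # approximately 0
```

You would then pass the scaled column (here `age_std`) to the model instead of the raw one.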
Also, can you explain the first distance LME graph? I understand that it displays the change in beta diversity over time (while LME shows the alpha diversity over time) but what samples is it comparing to? When I compare vaginal to c-section data, there are 2 lines that begin at different points. If it compared them both to each other, the points on each line would have the same values. Why are they different?
I also have a question about the q2 code to run LME, first distance LME, and volatility. How can I have it start at a certain age and end at a certain age (age is the column I pass to --p-state-column)? I want it to start at age 0 and end at age 10. Thank you again.
LME does not have anything to do with alpha diversity, unless you are inputting an alpha diversity vector as your dependent variable. So you can run LME on alpha diversity data, first differences/distances, or metadata values. Trying to compare an LME on alpha diversity data vs. an LME on beta diversity FD data is comparing apples and oranges; there is no reason why the two should necessarily give related results.
First distances measure each individual’s change in beta diversity between consecutive time points. So if you have samples collected from a group of children once per year at ages 0-10, then first distance 1 will be the distance between the age 1 and age 0 samples for each child. FD 2 will be age 2 vs. age 1, and so on. Each individual’s sample at time X is compared to that same individual’s sample at time X-1 to measure FD; there is no “standard” sample that everything is compared to.
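The idea above can be illustrated with a toy distance matrix for one child (this is only a sketch of the concept, not q2-longitudinal's implementation):

```python
# Toy illustration of first distances for ONE subject: FD at time t is the
# beta-diversity distance between that subject's sample at t and at t-1.
import pandas as pd

def first_distances(dm: pd.DataFrame, samples_by_time: dict) -> dict:
    """dm: symmetric distance matrix (sample x sample).
    samples_by_time: {time: sample_id} for one subject."""
    times = sorted(samples_by_time)
    return {t: dm.loc[samples_by_time[t], samples_by_time[t_prev]]
            for t_prev, t in zip(times, times[1:])}

samples = ["s0", "s1", "s2"]  # one child's samples at ages 0, 1, 2
dm = pd.DataFrame([[0.0, 0.3, 0.5],
                   [0.3, 0.0, 0.2],
                   [0.5, 0.2, 0.0]], index=samples, columns=samples)
fd = first_distances(dm, {0: "s0", 1: "s1", 2: "s2"})
print(fd)  # {1: 0.3, 2: 0.2} -- FD 1 = dist(age 1, age 0); FD 2 = dist(age 2, age 1)
```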
You would need to show me the plots you have for me to understand and explain what you are seeing. It’s not entirely clear based on your description.
If you want to exclude certain time points, you need to filter these out. So, e.g., use feature-table filter-samples to remove samples based on metadata values (e.g., age) prior to running LME/volatility.
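To make the age filter concrete, here is the predicate expressed in pandas; the comment shows roughly what the equivalent QIIME 2 call looks like (check the filter-samples docs for your QIIME 2 version, as the exact flags are an assumption here):

```python
# Sketch of the age filter as a pandas query. The equivalent QIIME 2 command
# would be approximately (verify flags against your version's docs):
#   qiime feature-table filter-samples \
#     --i-table table.qza \
#     --m-metadata-file metadata.tsv \
#     --p-where "[age] >= 0 AND [age] <= 10" \
#     --o-filtered-table filtered-table.qza
import pandas as pd

metadata = pd.DataFrame({"sample-id": ["a", "b", "c"],
                         "age": [0, 10, 15]}).set_index("sample-id")
keep = metadata[(metadata["age"] >= 0) & (metadata["age"] <= 10)]
print(keep.index.tolist())  # ['a', 'b'] -- only samples with age 0-10 remain
```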
I get this error:
Linear model will not compute due to singular matrix error. This may occur if input variables correlate closely or exhibit zero variance. Please check your input variables. Removing potential covariates may resolve this issue.
Either your table was filtered incorrectly/unintentionally, or the metadata values that you selected are equal across all subjects. The error is just what it says — there is zero variance. By any chance did all c-section subjects receive the same type_of_feed?
You should inspect the filtered table, the first distance values, and the metadata values to make sure that everything looks okay.
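One quick sanity check for the singular-matrix error is to look for metadata columns that have zero variance (the same value for every sample) among your covariates. A small pandas sketch (column names are made up for illustration):

```python
# Detect zero-variance metadata columns, a common cause of the
# singular-matrix error. Column names here are hypothetical examples.
import pandas as pd

metadata = pd.DataFrame({
    "delivery_mode": ["c-section", "c-section", "vaginal"],
    "type_of_feed":  ["formula", "formula", "formula"],  # constant -> trouble
})
zero_variance = [c for c in metadata.columns if metadata[c].nunique() <= 1]
print(zero_variance)  # ['type_of_feed']
```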