Conflicting LME results

Hey everyone, it's John again.

I'm processing the gut microbiome data from the Mars500 experiment to see how the participants' gut microbiome changes during isolation. Since this is time series data, I've been using the regular alpha and beta diversity testing along with the volatility plots in the q2-longitudinal plug-in.

I gathered the data from the paper "Reanalysis of the Mars500 experiment reveals common gut microbiome alterations in astronauts induced by long-duration confinement", which looked at the same data but also included the habitat microbiome data. I'm more interested in how the gut microbiome changes over time due to the isolation the participants are experiencing.

The paper just compared the gut and habitat samples from the beginning and end of the isolation part of the experiment. I included the samples from in between to see the full timeline, so I created a column titled time_group with five groupings based on when the sample was taken.

Pre: days 0 to 1 (samples taken at 2 time points).
Early: days 7 to 45 (samples taken at 5 time points).
Mid: days 60 to 390 (samples taken at 13 time points).
Late: days 420 to 520 (samples taken at 5 time points).
After: days 530 to 700 (samples taken at 3 time points).
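
As a sketch, the binning above could be written as a small lookup from time_day to time_group. The names `TIME_GROUPS` and `time_group` are made up for illustration, and the ranges are assumed to be inclusive at both ends; days that fall between two ranges are left unassigned rather than guessed.

```python
from typing import Optional

# Day ranges copied from the list above (assumed inclusive at both ends).
# Days between two ranges, such as day 50, are deliberately left unassigned.
TIME_GROUPS = [
    ("Pre", 0, 1),
    ("Early", 7, 45),
    ("Mid", 60, 390),
    ("Late", 420, 520),
    ("After", 530, 700),
]


def time_group(day: float) -> Optional[str]:
    """Return the time_group label for a sampling day, or None if no range matches."""
    for label, first_day, last_day in TIME_GROUPS:
        if first_day <= day <= last_day:
            return label
    return None


# Example: label a few sampling days.
for day in (0, 30, 50, 390, 700):
    print(day, time_group(day))
```

Written this way, it is easy to see that the label is computed from the day alone, so it does not add information beyond what time_day already carries.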

Metadata: sample-metadata.tsv (25.7 KB)

I conducted my tests and now I'm validating the results I found in the volatility plots. I saw that to test the results seen in the volatility plots, you want to use the LME that is in the q2-longitudinal plug-in.

Shannon_volatility plot: shannon_volatility.qzv (476.3 KB)

So I'm currently running the LME with the Shannon diversity matrix. I created two LME models and I'm trying to figure out why they are saying two different things. First I made this LME using just the state column: shannon-LME-time.qzv (406.5 KB). I made this because I'm interested in whether the Shannon scores change over time, which would help show if the prolonged isolation had any effect on the Shannon scores. The function call is below.

qiime longitudinal linear-mixed-effects \
  --m-metadata-file sample-metadata.tsv \
  --m-metadata-file Analysis/shannon_vector.qza \
  --p-metric shannon_entropy \
  --p-individual-id-column Characteristics.Subject. \
  --p-state-column time_day \
  --o-visualization LME/shannon-LME-time.qzv

In the model results we can see that the p-value is significant for the state column. My interpretation is that the Shannon scores changed significantly over time.

I then wanted to see if including the time_group column as an independent variable would change anything in the LME output. So I changed the above command and got this LME: shannon-LME-grp.qzv (449.9 KB). The function call is below.

qiime longitudinal linear-mixed-effects \
  --m-metadata-file sample-metadata.tsv \
  --m-metadata-file Analysis/shannon_vector.qza \
  --p-metric shannon_entropy \
  --p-individual-id-column Characteristics.Subject. \
  --p-state-column time_day \
  --p-group-columns time_group \
  --o-visualization LME/shannon-LME-grp.qzv

The results from this LME show that the state column and the group variable didn't have an effect on the Shannon diversity scores. My interpretation from this LME is that the change in the Shannon scores wasn't due to time.

From these results I have 3 questions.

  1. Why would the inclusion of the group variable change the results of the state column being significant?
  2. Should I use the time_group variable at all? I'm rethinking the usage of this group variable for the LME since the state column is time and including time again doesn't seem to make the most sense to me.
  3. Would just using the metric and state-column parameters for the LME function show whether the changes in the metric over time were due to time?

Thank you all for your time and help.

I re-looked through the forum and the LME documentation and I think I have a good understanding of what the LME outputs are saying but I still have some questions.

Two forum posts that I found extremely helpful were this one and this one. These two forum posts helped explain the parts of the LME model results extremely well and it was easy to understand.

So if we look at my Shannon diversity LME time_group results here: shannon-LME-grp.qzv (450.4 KB).

We can see that the intercept group is the Early time group and that it's significantly different from 0. If we swap the time group from Early to either of the other two, they aren't significantly different from 0. We can also see that there isn't a change over time in the reference group, and that there isn't a change over time in the other groups.

  1. Is this interpretation correct? I do believe that this interpretation of the model results is correct, but I want to make sure.

Now with this LME usage, I think I used the wrong group variable, since the Early, Mid, and Late time_groups don't span the entire time span of the experiment.

  1. So to see if the diversity scores significantly change over time I would just re-run the LME again but with just the time variable?

I did that here: shannon-LME-time.qzv (406.5 KB).

Where we can see that the intercept and the time_day parts are significant.

  1. What are the intercept and time_day outputs saying in an LME that is just looking at how the shannon_diversity changes over time?

My final question is that I was also using the first difference function from the q2-longitudinal plug-in and when I was generating the LME results for this matrix, I got this error:

Plugin error from longitudinal:

  Cannot predict random effects from singular covariance structure.
  1. Why would I get this error from trying to run an LME on the first difference output when I can use an LME on the first distance output?

Shannon first difference artifacts here:

shannon-baseline-differences.qza (123.3 KB)
shannon-first-differences.qza (123.3 KB)

The commands that gave me this error are below.

qiime longitudinal linear-mixed-effects \
  --m-metadata-file sample-metadata.tsv \
  --m-metadata-file First/shannon-first-differences.qza \
  --p-metric Difference \
  --p-individual-id-column Characteristics.Subject. \
  --p-state-column time_day \
  --p-group-columns time_group \
  --o-visualization First/shannon-diff-LME.qzv

qiime longitudinal linear-mixed-effects \
  --m-metadata-file sample-metadata.tsv \
  --m-metadata-file First/shannon-baseline-differences.qza \
  --p-metric Difference \
  --p-individual-id-column Characteristics.Subject. \
  --p-group-columns time_group \
  --p-state-column time_day \
  --o-visualization First/shannon-baseline-LME.qzv

Hi @johnbiggs ,

No, this does not look like a valid variable to include, as it is not actually an independent variable but just different time windows that do not overlap. So I would not include this and don't think that it is appropriate to compare these.
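
One way to see this is to check whether the group column is fully determined by the state column. The snippet below is only a sketch using made-up toy rows (not the real metadata), and `is_function_of` is a hypothetical helper name.

```python
from collections import defaultdict


def is_function_of(rows, x_col, y_col):
    """Return True if every value of x_col maps to exactly one value of y_col."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[x_col]].add(row[y_col])
    return all(len(values) == 1 for values in seen.values())


# Toy rows for illustration only (not taken from the real metadata file).
toy_rows = [
    {"subject": "A", "time_day": 7, "time_group": "Early"},
    {"subject": "B", "time_day": 7, "time_group": "Early"},
    {"subject": "A", "time_day": 60, "time_group": "Mid"},
    {"subject": "B", "time_day": 60, "time_group": "Mid"},
    {"subject": "A", "time_day": 420, "time_group": "Late"},
    {"subject": "B", "time_day": 420, "time_group": "Late"},
]

# time_group is fully determined by time_day, so it carries no information
# that is independent of the state column.
print(is_function_of(toy_rows, "time_day", "time_group"))  # True

# subject is not determined by time_day: both subjects appear at every day.
print(is_function_of(toy_rows, "time_day", "subject"))  # False
```

A grouping variable that passes this check against the state column is just a coarser version of time, which is why it is not a separate effect to compare.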

Your LME just looking at time_day looks fine, though. The results indicate:

  1. that the intercept is significantly different from zero
  2. that you see a significant linear association between time and Shannon diversity.

You should check the first differences values. You might have too many missing values or zero values.
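
A quick check along these lines could look like the sketch below. It assumes the values come from the Difference column used as the metric in the commands above; the function name `summarize_differences` and the `tol` threshold for "near zero" are arbitrary choices for illustration, and the sample values are made up.

```python
def summarize_differences(values, tol=1e-6):
    """Count missing, exactly-zero and near-zero entries in a column of differences."""
    summary = {"n": 0, "missing": 0, "zero": 0, "near_zero": 0}
    for raw in values:
        summary["n"] += 1
        text = str(raw).strip()
        # Treat empty cells and common NaN spellings as missing.
        if text.lower() in {"", "nan", "na"}:
            summary["missing"] += 1
            continue
        number = float(text)
        if number == 0.0:
            summary["zero"] += 1
        elif abs(number) < tol:
            summary["near_zero"] += 1
    return summary


# Made-up values for illustration only.
toy_values = ["0.12", "-0.30", "", "0", "0.0000001", "nan"]
print(summarize_differences(toy_values))
```

For a real file, the values could be read from the exported TSV with `csv.DictReader(handle, delimiter="\t")` and passed in as `row["Difference"]` for each row.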

Good luck!

Thank you @Nicholas_Bokulich for your reply. It makes sense that the time_group variable wouldn't be useful for this analysis since it's not an independent variable in this experiment and the groups don't span the time-span of interest which can throw off the LME in its calculations.

I still have some questions about the usage of the LME.

I think I should have worded my question here a little better: I was wondering what the intercept and time_day outputs represent in an LME that just looks at how Shannon diversity changes over time. From your answer, I understand that a significant p-value for time_day means there is an association between the Shannon diversity scores and time, and the regression scatterplot from this LME shows a linear increase in the scores over time. But I'm still confused about what group the intercept, and the LME overall, represents in this instance. Is the LME looking at how each subject's Shannon diversity scores change over time, since it always takes the individual ID column along with the metric and state columns, or is it looking at every sample individually and ignoring the subjects in this experiment? Would the intercept just be the scores at the first time point compared against zero (taking all of the scores at the first time point, finding the mean or median, and comparing that value to zero)?

My second question: if I have another microbiome sample type, let's say the habitat samples, and I want to see how the Shannon diversity scores for the habitat and human samples change over time, would I add a sample_type column (a column that says whether each sample is a human sample or from one of the 4 habitat locations in the experiment) to the LME's group-columns parameter? The groups the LME would look at would then be the human samples and the 4 different habitat microbiomes, showing how those Shannon diversity scores differ from zero and how they associate with time.

To illustrate this second question, here is an LME I generated with another paper's data to see how the Shannon diversity scores changed over time for all of the different microbiomes sampled in that paper. Is this a valid LME analysis of longitudinal data, with the fixed effects formula shannon_entropy ~ time_days * surface_sampledsample, where surface_sampledsample is a group variable that identifies which microbiome each sample came from? (In the LME we can see that the intercept, common_room, and squad_common_room scores are different from zero, and that time affects the Shannon scores of the common_room, skin, and squad_common_room samples.)

shannon-LME.qzv (720.5 KB)

My last question is about the First difference question I had in my last post.

Below is the table of first differences for the Shannon diversity scores. Looking at the values, I don't notice any missing values, but I do notice that some of them are close to zero. Would that cause the LME error about not being able to predict random effects from a singular covariance structure? I was able to run a first-difference baseline analysis and have an LME analyze that output without issue, but for some reason the LME can't analyze the first difference matrix.

FirstDifferences.tsv (3.6 KB)

I'm sorry if I'm asking too many questions, I'm a relatively new microbiologist and the only bioinformatics person on my team, so I don't have anyone that I can ask in-depth questions about this kind of stuff.

Thank you again @Nicholas_Bokulich for your time and help! It's greatly appreciated!

Hi @johnbiggs

Yes, the intercept is the predicted value at time 0 when all other independent variables are not taken into account. The coefficients for each independent variable (when you have additional fixed effects in the model) then determine how much the intercept is changed by that variable.
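
Written out, the time-only model is roughly the following. This is a sketch that assumes the default q2-longitudinal setup, where each individual gets a random intercept and no additional random effects are requested; the notation is illustrative and does not come from the plugin output.

```latex
% y_{ij}: Shannon entropy of subject i in sample j
% t_{ij}: time_day of that sample
% u_i:    subject-specific shift of the intercept (random intercept)
y_{ij} = \beta_0 + \beta_1 \, t_{ij} + u_i + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Here \beta_0 is the intercept (the predicted Shannon entropy at time_day = 0 for an average subject, i.e. u_i = 0) and \beta_1 is the time_day coefficient (the predicted change per day). The model is fit to all samples at once, but the u_i term accounts for repeated samples from the same subject, so the intercept is a model prediction and not a mean or median of the first time point. With a categorical group column and a formula of the form metric ~ time * group, the intercept refers to the reference group at time 0, and the extra coefficients describe how the other groups shift the intercept and the slope.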

Regarding your second question, yes this makes sense to me, but I do not have an in-depth knowledge of the experiment and hypothesis. I would recommend discussing your experimental design with a statistician to make sure that this test and formula structure are suitable for your experiment. Sitting down with a statistician over a cup of coffee can do wonders for your analysis :grin: :coffee:

Good luck!