first-distance and first-distance-baseline-0

nathaniel_hubert · November 18, 2020, 11:34pm

Hello,
I am trying to see whether there are differences by diet in our dataset. It appears there may have been some differences at week-0; I am still waiting to hear whether week-0 was just the first week measured or was actually prior to treatment administration. While waiting for a response from my collaborators, I thought I would try first-distances with a static baseline point to see whether diet is driving differences since week-0, even if there were differences at week-0.

In the LME results from first-distances with baseline week-0, there is no significant effect of diet, there is a significant effect of week_number, and the regression scatterplot only shows points at week-4 and week-8 (my two time points after week-0). However, when I run first-distances without a baseline, diet, week and diet:week are all significant, and the regression scatterplot still only shows week-4 and week-8. Why does the regression scatterplot not include week-0? In both cases No. groups = 20, as I have 20 mice, and in both cases group size = 2, but I have 3 timepoints per mouse. If week_number for one of the sets of observations is 0, does that mean QIIME will ignore this set? That does not seem to be the case, or baseline-0 would give the same results as no baseline.

As I am relatively new to LME, I may be formatting my input incorrectly and would appreciate any suggestions. Thank you.

Below is an example of the baseline-week-0 first-distances commands I used. Note that the only difference between this and the non-baseline run is "--p-baseline 0".

Thank you, and cheers,
Nate

qiime longitudinal first-distances \
  --i-distance-matrix bray_div_pct_stool_female_s15_distance_matrix.qza \
  --m-metadata-file QIIME_map.txt \
  --p-state-column week_number \
  --p-individual-id-column mouseID \
  --p-replicate-handling random \
  --p-baseline 0 \
  --o-first-distances first-distances-baseline-0_stool_female_s15_week.qza

qiime longitudinal linear-mixed-effects \
  --m-metadata-file first-distances-baseline-0_stool_female_s15_week.qza \
  --m-metadata-file QIIME_map.txt \
  --p-metric Distance \
  --p-state-column week_number \
  --p-individual-id-column mouseID \
  --p-group-columns diet \
  --o-visualization first-distances-baseline-0_LME_stool_female_s15_week.qzv

qiime tools export \
  --input-path first-distances-baseline-0_LME_stool_female_s15_week.qzv \
  --output-path first-distances-baseline-0_LME_stool_female_s15_week

ChrisKeefe · November 19, 2020, 4:45pm

Hi @nathaniel_hubert!
Notice that the name of the command is "first-distances".

Regardless of whether you use the baseline parameter, this plot is considering distances between states. When measuring change over time for K states, you end up with K-1 "changes". The x axis represents some measure of change between states, rather than a measure at each state. Arguably, you could include a zeroth point, but the distance between pt 0 and itself will always be zero, so it would be uninformative.
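To make the K-1 counting concrete, here is a toy Python sketch. It is not q2-longitudinal's code; it only mirrors the idea that successive first distances pair each state with the previous one. The helper names (`first_distances`, `lookup`) and the distance values are hypothetical. With 3 weeks per mouse, each mouse contributes 2 distances, which is consistent with No. groups = 20 and group size = 2 in the LME summary.

```python
# Toy sketch (hypothetical data and helper names): successive first distances
# for one individual. K sampled states yield K-1 distances, so the first state
# never appears on the x axis as its own point.


def first_distances(states, dist):
    """Return {later_state: distance from the previous state} for one individual.

    states: sorted list of state values (e.g. weeks) for one individual.
    dist:   function (state_a, state_b) -> distance between that individual's
            samples at those two states.
    """
    return {b: dist(a, b) for a, b in zip(states, states[1:])}


# Hypothetical pairwise distances for one mouse sampled at weeks 0, 4 and 8.
toy_dm = {(0, 4): 0.6, (4, 8): 0.1, (0, 8): 0.65}


def lookup(a, b):
    # Distance matrices are symmetric, so order the pair before looking it up.
    return toy_dm[(min(a, b), max(a, b))]


weeks = [0, 4, 8]
fd = first_distances(weeks, lookup)
print(fd)  # {4: 0.6, 8: 0.1}  (no entry for week 0)

# 20 mice x (3 states - 1) = 40 observations in total, 2 per mouse.
n_mice = 20
print(n_mice * (len(weeks) - 1))  # 40
```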

The Parkinson's mouse tutorial has a nice introduction to this method:

We use the baseline parameter to specify a static time point against which all other time points are compared; if we remove this parameter from the command, we look instead at the rate of change for each individual between each time point. See the longitudinal analysis tutorial for more details.

Hope that helps!
Chris

nathaniel_hubert · November 20, 2020, 8:06pm

Thank you, Chris!

Your response is very helpful indeed, though I am still wrapping my head around why defining baseline would yield such different results.

Does it make sense to not define baseline if I am interested in testing for differences in community changes that are independent of differences at baseline? The non-baseline LME is used when there is no defined baseline and therefore there could be pre-existing differences at the first time-point in the sequence. We are interested in overall differences in change and not just how things have changed relative to baseline. Am I understanding this correctly?

Thanks again,
Nate

ChrisKeefe · November 20, 2020, 9:29pm

This distinction took me a long time to get my head around, too, @nathaniel_hubert. I'm still not 100% sure I've got things right, but here's how I understand things.

First-distances looks at distances between an individual and the same individual, across changes in state (e.g. time). I suspect initial differences could be a confounding variable in a study, but I don't think pre-existing differences between individuals are the main concern here. Initial state might have an impact on rate of change, but individuals that differ at study outset will still be plotted relative to themselves.

Without a baseline, each distance is presented relative to that individual at the previous state. With a baseline, each distance is presented relative to that individual at some baseline state. Imagine a subject whose microbiome shifts dramatically between week 0 and week 1, and then stabilizes. Without a baseline, the rate of change will peak at the first point and then drop back with stabilization. If we set week 0 as our baseline, on the other hand, all values after the dramatic change will exhibit a large distance from baseline.
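The hypothetical subject above can be sketched in a few lines of Python. As a simplifying assumption, each sample is reduced to a single made-up "community position" and distance is the absolute difference between positions; real first-distances input is a beta-diversity distance matrix. The helper names are illustrative, not q2-longitudinal functions.

```python
# Toy sketch: one subject that shifts sharply between week 0 and week 1,
# then stabilizes. Positions are hypothetical 1-D stand-ins for samples,
# and |x_a - x_b| stands in for a real beta-diversity distance.

position = {0: 0, 1: 8, 2: 9, 3: 8}


def dist(a, b):
    return abs(position[a] - position[b])


def successive(states, dist):
    # No baseline: each state is compared with the previous state.
    return {b: dist(a, b) for a, b in zip(states, states[1:])}


def from_baseline(states, dist, baseline):
    # Static baseline: each later state is compared with the baseline state.
    return {s: dist(baseline, s) for s in states if s != baseline}


weeks = sorted(position)
print(successive(weeks, dist))        # {1: 8, 2: 1, 3: 1}  peak, then drop
print(from_baseline(weeks, dist, 0))  # {1: 8, 2: 9, 3: 8}  stays large
```

Both views use the same samples; they only differ in which sample each later state is paired with.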

I think this is moving in the right direction! Without baseline, we get a measure of rate of change from each state to the next, which is useful if we're interested in variations in rate of change. Baseline, on the other hand, gives us something concrete to compare against - a pre-intervention state, for example. It's a different viewpoint on the same data.

Anecdotally, I feel like having a baseline often makes the plot simpler to interpret. Clearly, each approach has its use cases, but one feels less complex to me, because each point can be judged relative to a consistent... baseline? Fewer moving parts, I guess. I have a lot more trouble thinking about what rate of change relative to the prior observation means. YMMV.

These methods are discussed in nice clear language in the paper, and come with pretty pictures to boot. Just search for first-distances and you'll find the right sections quickly. If that still doesn't clear things up, let me know and we'll hash it out.

CK

nathaniel_hubert · November 20, 2020, 9:37pm

Thank you so much, Chris!
This is very helpful and I really appreciate all the time you spent explaining it!
Nate

nathaniel_hubert · December 1, 2020, 3:29am

Hello @ChrisKeefe

Another quick question regarding the first distances LME workflow:
I found the "Fun fact" below very interesting and hoped to be able to use this strategy to determine which taxa were lost/acquired over time per treatment group. But I am not seeing any results from running LME with Jaccard distance that include particular taxa.

Is it accurate to say that this test tells you whether the loss/gain of taxa is an important driver of community differences, but not which taxa are lost/gained?

Thank you in advance!
Nate

From first-distance and first-distance-baseline-0 - #4 by ChrisKeefe:
Fun fact! We can also use the first-distances method to track longitudinal change in the proportion of features that are shared between an individual's samples. This can be performed by calculating pairwise Jaccard distance (proportion of features that are not shared) between each pair of samples and using this as input to first-distances. This is particularly useful for pairing with the baseline parameter, e.g., to determine how unique features are lost/gained over the course of an experiment.
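A minimal Python sketch of the quantity described in the Fun fact: Jaccard distance as the proportion of features in the union of two samples that are not shared, tracked against a week-0 baseline. The feature IDs and sample contents are hypothetical, and a feature is assumed to be "present" if it appears in the sample's set (in practice, a count greater than zero).

```python
# Toy sketch (hypothetical feature IDs): Jaccard distance between the
# presence/absence feature sets of one individual's samples, compared
# against a static baseline sample.


def jaccard_distance(a, b):
    """Proportion of features in the union of a and b that are not shared."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty samples: treat as identical
    return len(a ^ b) / len(union)


samples = {
    0: {"f1", "f2", "f3", "f4"},
    4: {"f1", "f2", "f3", "f5"},
    8: {"f1", "f2", "f6", "f7"},
}

baseline = 0
for week in (4, 8):
    print(week, round(jaccard_distance(samples[baseline], samples[week]), 3))
# 4 0.4
# 8 0.667
```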

nathaniel_hubert · December 4, 2020, 11:47pm

If I am correct, I think the best way to answer the question of which taxa are falling in and out would be volatility analysis?

Nicholas_Bokulich · December 5, 2020, 7:19am

Jaccard distance is only going to tell you the proportion not shared, but not the specific taxa.

Exactly!

That would work, but is maybe a bit tangential to this specific question: you are looking at taxa that are mutually exclusive between timepoints, no? The volatility analysis would find features with a temporal signature, which probably overlaps with what you want, but just to say it is not equivalent.
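To illustrate why Jaccard distance alone cannot say which taxa were lost or gained, here is a toy Python sketch with hypothetical feature IDs: two mice get exactly the same Jaccard distance from different taxa turnover, while a plain set difference names the taxa. This set comparison is only an illustration, not a q2-longitudinal method.

```python
# Toy sketch (hypothetical feature IDs): equal Jaccard distances can hide
# different taxa turnover; set differences recover which features changed.


def jaccard_distance(a, b):
    """Proportion of features in the union of a and b that are not shared."""
    union = a | b
    return len(a ^ b) / len(union) if union else 0.0


def lost_and_gained(before, after):
    """Features present before but not after (lost), and the reverse (gained)."""
    return {"lost": sorted(before - after), "gained": sorted(after - before)}


# (week-0 features, later features) for two hypothetical mice.
mouse_a = ({"f1", "f2", "f3"}, {"f1", "f2", "f4"})
mouse_b = ({"f1", "f2", "f3"}, {"f1", "f3", "f5"})

print(jaccard_distance(*mouse_a), jaccard_distance(*mouse_b))  # 0.5 0.5
print(lost_and_gained(*mouse_a))  # {'lost': ['f3'], 'gained': ['f4']}
print(lost_and_gained(*mouse_b))  # {'lost': ['f2'], 'gained': ['f5']}
```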

Your question reminds me of this paper... would be a neat method to add to q2-longitudinal if anyone is interested in contributing:
https://mbio.asm.org/content/5/4/e01371-14

system · January 5, 2021, 1:19pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.