Gemelli output interpretation

jwdebelius · May 20, 2021, 4:58pm

Hi everyone (but probably @cmartino),

I'm trying to figure out how to interpret the outputs of gemelli, specifically the state and feature trajectories. My interpretation is that these are essentially the movement of the features and samples in some major factorization hyperspace, but I'm not entirely sure what to do with that. For instance, if I have a before/during/after case where my responders all have increasing values along PC1, and ASV 1 also has increasing values along PC1, should the interpretation be that during treatment, responders have a relative increase* in ASV 1 (*given all the ALR "increase" assumptions)?

I tried checking the documentation and struggled to find an explanation of the output files.

Thanks,
Justine

cmartino · May 20, 2021, 5:51pm

Hi @jwdebelius,

Thanks for trying out gemelli/CTF!

If you have an ASV 1 that increases in value along PC1 with responders, then you would want to identify an ASV 2 that decreases along PC1 with non-responders. The log-ratio of ASV 1 and ASV 2 should then replicate/mirror what you are seeing in your trajectory. Another way to look at the results is with the subject_biplot, in which the dots are subjects and the separation between them represents the difference in dynamics of different ASVs across time. The ratios of ASVs driving those dynamics are represented as arrows in the biplot. You can think of the state_biplot in the same way, but with the dots being timepoints. Qurro can be used with the subject_biplot to make the log-ratios easier to generate.
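As a rough illustration of the ASV 1 / ASV 2 log-ratio idea above, here is a minimal pandas sketch. It does not use gemelli or Qurro; the count table, the metadata columns (`group`, `timepoint`), and the helper `log_ratio` are all hypothetical and made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy counts: rows are samples, columns are ASVs.
counts = pd.DataFrame(
    {"ASV1": [10, 20, 40, 10, 10, 10], "ASV2": [10, 10, 10, 10, 20, 40]},
    index=["s1", "s2", "s3", "s4", "s5", "s6"],
)
# Hypothetical metadata: one subject per group, three timepoints each.
metadata = pd.DataFrame(
    {
        "group": ["responder"] * 3 + ["non-responder"] * 3,
        "timepoint": ["before", "during", "after"] * 2,
    },
    index=counts.index,
)


def log_ratio(table, numerator, denominator):
    """Natural log of numerator/denominator per sample.

    Samples with a zero in either feature are dropped, because the
    log-ratio is undefined there.
    """
    num = table[numerator]
    den = table[denominator]
    valid = (num > 0) & (den > 0)
    return np.log(num[valid] / den[valid])


lr = log_ratio(counts, "ASV1", "ASV2").rename("log_ratio")

# Mean log-ratio per group and timepoint, to compare against the trajectory.
summary = (
    metadata.join(lr, how="inner")
    .groupby(["group", "timepoint"])["log_ratio"]
    .mean()
)
print(summary)
```

In this toy data the responder's log-ratio rises over time while the non-responder's falls, which is the kind of pattern one would hope mirrors the PC1 trajectories.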

I have some tutorials here covering both the QIIME2 API & CLI (as well as the Python standalone API/CLI), and I am working on more tutorials with different types of repeated-measures data (including an intervention-type study design).

Figure 1 in the paper (here) is a toy example with an intervention-type study design. In that figure, the matching outputs from Gemelli for each subpanel are: (e) subject_biplot.samples, (f) state_biplot.samples, and (g) state_biplot.features or subject_biplot.features.

Hopefully, this helps clear it up.

Thanks,

Cameron

jwdebelius · May 21, 2021, 7:17pm

Thanks @cmartino!

At the moment, I'm specifically interested in the question of whether there are some taxa that define an overarching shift across all my individuals. For instance, if I'm testing that birth mode affects the developing microbiome, I might first want to show that there's an aggregate shift that comes with development.

I can essentially center the state and show a "directional" difference associated with time for a two-state instance, where the difference between 0 and 6 months has a consistent direction along PC1. Would it be unreasonable to use the ALR of the features between the two extremes to test and/or make a statement about features associated with that shift, based on the directionality of the state-feature changes?

cmartino · May 28, 2021, 3:23pm

Hey @jwdebelius,

That would be very reasonable. In the paper, we also looked at birth mode, then summed N-taxa associated with each group (by using the PC1 loadings) until there were no zeros in either the numerator or denominator of the log-ratio. We then used LME models and t-tests per time point to test that log-ratio. Amy Willis summarized this nicely in a tweet (see below):

https://twitter.com/AmyDWillis/status/1301268123509616640?s=20
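The summing procedure described in the reply above (adding taxa ranked by PC1 loading until neither side of the log-ratio contains zeros) can be sketched roughly as follows. This is an illustrative reconstruction under stated assumptions, not the code used in the paper: the function `ranked_log_ratio`, the toy table, and the loadings are hypothetical, and it assumes the same number of taxa is taken from each end of the ranking.

```python
import numpy as np
import pandas as pd


def ranked_log_ratio(table, loadings, start=1):
    """Log-ratio of summed top-n over summed bottom-n features by loading.

    n grows until every sample has a nonzero sum in both the numerator
    and the denominator. Returns the per-sample log-ratio and the n used.
    """
    ranked = loadings.sort_values()  # ascending: most negative first
    max_n = len(ranked) // 2  # keep the two feature sets disjoint
    for n in range(start, max_n + 1):
        bottom = ranked.index[:n]  # lowest loadings -> denominator
        top = ranked.index[-n:]  # highest loadings -> numerator
        num = table[top].sum(axis=1)
        den = table[bottom].sum(axis=1)
        if (num > 0).all() and (den > 0).all():
            return np.log(num / den), n
    raise ValueError("no n removes all zeros from the log-ratio")


# Hypothetical counts (samples x features) and PC1 feature loadings.
table = pd.DataFrame(
    {"A": [0, 4, 6, 8], "B": [2, 0, 2, 0], "C": [3, 1, 0, 2], "D": [1, 3, 2, 0]},
    index=["s1", "s2", "s3", "s4"],
)
pc1 = pd.Series({"A": 0.9, "B": 0.5, "C": -0.4, "D": -0.8})

lr, n_used = ranked_log_ratio(table, pc1)
print(n_used)
print(lr)
```

The resulting per-sample log-ratio is what would then be passed to a downstream test, such as an LME model or per-timepoint t-tests as mentioned in the reply.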