Gemelli and Deicode yield different results

Covarrubias · September 6, 2022, 7:54am

Hello,

My experimental design is two groups and three timepoints: baseline, 4, and 8 weeks. I wanted to compare beta-diversity between the groups at 4 and 8 weeks, adjusted for baseline.
I applied Gemelli and rpca from Deicode to get distance matrices. Then I ran first_distances and fed the output to an LMM from the Longitudinal plugin. The models contained two variables: Group and Time.
The results obtained from the two methods are completely contradictory: Gemelli showed a significant difference for both variables, while rpca showed nothing at all.

I also tried to model PC1 with an LMM using the following R syntax: PC1 ~ baseline_PC1 + Group + Time + (1|Subject_id). Both methods (Gemelli and rpca) showed no differences between the groups. However, with Gemelli the coefficient for baseline_PC1 had a crazy value, like 6668 (the other coefs were around 0.05), and the model had a singular fit. The Emperor plot of all samples based on Gemelli indicates 89% of variance explained by PC1, and it looks flat (PC3 explains < 1%).

Could you tell me what is going on here? Which method is better for my design?
Thank you!

jwdebelius · October 3, 2022, 9:54pm

Hi @Covarrubias,

Welcome to the forum!

There are two key things you need to consider about these methods.

First, the distance matrices in both DEICODE and Gemelli are calculated based on the ordination. (This is different from the way all the other ordination techniques work in QIIME 2.) So, differences in your ordination will propagate into your distance matrix.
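
As a rough illustration of that point (a minimal sketch, not the actual DEICODE or Gemelli code): assuming the distance matrix is the Euclidean distance between samples' ordination coordinates, anything that changes the coordinates changes the distances. The sample names and coordinate values below are made up.

```python
import math

# Hypothetical ordination coordinates: sample id -> [PC1, PC2, PC3].
# These values are invented for illustration only.
coords = {
    "s1_wk0": [0.0, 0.0, 0.0],
    "s1_wk4": [0.3, 0.4, 0.0],
    "s2_wk0": [1.0, 0.1, 0.0],
    "s2_wk4": [1.2, 0.1, 0.1],
}

def euclidean(u, v):
    """Euclidean distance between two coordinate vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def distance_matrix(coords):
    """Pairwise distances between all samples, ordered by sorted sample id."""
    ids = sorted(coords)
    dm = [[euclidean(coords[i], coords[j]) for j in ids] for i in ids]
    return ids, dm

ids, dm = distance_matrix(coords)

# Print the distance of every sample to the first one.
for sample_id, d in zip(ids, dm[0]):
    print(sample_id, round(d, 3))
```

If one axis dominates the ordination (as with a PC1 that explains most of the variance), the distances computed this way are driven almost entirely by that axis.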

Next, Gemelli is based on the assumption that there's already an individual microbial signature, or that all the samples from a single individual will cluster together in an ordination space. This is usually true. (In my experience, it's rare to find an intervention that knocks a person out of their normal space.) To some degree, the Gemelli coordinates have therefore already taken the individual into consideration.
In contrast, RPCA is calculated based on the samples provided with no additional information about similarity. (I think the Gemelli tutorial illustrates this really nicely.)

When I try to pick between these techniques, there are a few things I try to consider.

Is it true that each individual looks more similar to themselves than other people? Is this similarity randomly distributed across all my treatment groups?
If yes, then Gemelli can be super useful to test differences.
Is there a between group difference at baseline? (Or alternatively, in the same sample coords in Gemelli)
If you're interested in accounting for baseline, do you want a change model or an adjusted model? (I often find that the change, which already accounts for baseline, works better for me and has fewer issues with modeling.)
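
To make the change option concrete, here is a minimal sketch of computing change from baseline per subject. The records, column layout, and values are hypothetical. The resulting change score would then be modeled without a baseline covariate (for example, something like delta_PC1 ~ Group + Time + (1|Subject_id), which is an assumed formula rather than one from this thread).

```python
# Hypothetical long-format records: (subject, group, week, PC1).
# These values are invented for illustration only.
records = [
    ("s1", "control", 0, 0.10),
    ("s1", "control", 4, 0.12),
    ("s1", "control", 8, 0.15),
    ("s2", "treated", 0, -0.20),
    ("s2", "treated", 4, 0.05),
    ("s2", "treated", 8, 0.10),
]

def change_from_baseline(records, baseline_week=0):
    """Return (subject, group, week, PC1 - baseline PC1) for follow-up visits."""
    # Look up each subject's baseline value once.
    baseline = {
        subject: pc1
        for subject, _, week, pc1 in records
        if week == baseline_week
    }
    changes = []
    for subject, group, week, pc1 in records:
        if week == baseline_week:
            continue  # the baseline row itself carries no change
        changes.append((subject, group, week, pc1 - baseline[subject]))
    return changes

changes = change_from_baseline(records)
for row in changes:
    print(row)
```

The adjusted alternative keeps the raw follow-up value as the outcome and adds the baseline value as a covariate, as in the PC1 ~ baseline_PC1 + Group + Time + (1|Subject_id) model from the original post.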

Best,
Justine