Accounting for covariates


I’m wondering whether there is a general way to account for (and model out) the effects of covariates in QIIME 2.

To be more concrete, we are looking into changes in the microbiome associated with trisomy 21 vs. disomy 21. This is complicated experimentally by the fact that human chr21 is syntenic with regions of three mouse chromosomes: chr10, chr16, and chr17. In our experimental design, we have multiple replicates of trisomic-model (TS) and wild-type (WT) mice for each of the three targeted mouse chromosomes.

DP10_TS | DP10_WT
DP16_TS | DP16_WT
DP17_TS | DP17_WT

The case/control comparison of interest is TS vs. WT, but we also observe a clear separation by strain. While this is interesting (and not to be ignored), we want to look at the effect of one variable at a time. Is there a way to “model out” the variation due to covariates, in this case strain, when performing actions such as the Kruskal-Wallis test for differences in alpha diversity?

As things currently stand, it seems the only way to look at different variables is by collapsing OTUs together based on columns in the metadata table.

Hi @kohlkopf,
Good question. Many methods implemented in QIIME 2 support covariates and mixed models, but others do not. It depends on the method (e.g., you cite Kruskal-Wallis, which as far as I know is always univariate) and on the implementation. Below is the general answer (not specific to your study design), listing the current methods that support multivariate models, off the top of my head:

For alpha diversity, both the anova and linear-mixed-effects actions in the q2-longitudinal plugin support multivariate models. linear-mixed-effects is currently set up specifically for longitudinal designs with repeated measurements from individual subjects, but anova is not, so you can use it for multi-way ANOVA of, in principle, any experimental design: you just input the R-style formula that you want to use.
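To make concrete what a multi-way ANOVA with a formula such as `observed_species ~ genotype + strain` computes, here is a minimal pure-Python sketch for a balanced design (the same number of samples in every genotype × strain cell). It is an illustration under that balance assumption, not the q2-longitudinal implementation; the function name and the `observed_species` values are hypothetical toy numbers, not real data. It reports F statistics only; p-values would need the F distribution (e.g. `scipy.stats.f.sf`).

```python
from collections import Counter, defaultdict
from statistics import mean


def two_way_anova_balanced(values, genotype, strain):
    """Additive two-way ANOVA, values ~ genotype + strain.

    Sketch only: assumes a balanced design (every genotype x strain cell
    has the same number of samples), so the two main-effect sums of
    squares are orthogonal and the order of terms does not matter.
    Returns sums of squares, degrees of freedom and F statistics.
    """
    cells = Counter(zip(genotype, strain))
    n_geno, n_strain = len(set(genotype)), len(set(strain))
    if len(cells) != n_geno * n_strain or len(set(cells.values())) != 1:
        raise ValueError("design is not balanced; use a regression-based ANOVA")

    n = len(values)
    grand = mean(values)

    def main_effect(labels):
        # Between-group sum of squares for one factor:
        # sum over groups of n_g * (group mean - grand mean)^2
        groups = defaultdict(list)
        for y, label in zip(values, labels):
            groups[label].append(y)
        ss = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
        return ss, len(groups) - 1

    ss_total = sum((y - grand) ** 2 for y in values)
    ss_geno, df_geno = main_effect(genotype)
    ss_strain, df_strain = main_effect(strain)
    # The residual absorbs everything the additive model does not explain,
    # including any genotype x strain interaction.
    ss_resid = ss_total - ss_geno - ss_strain
    df_resid = n - 1 - df_geno - df_strain
    ms_resid = ss_resid / df_resid
    return {
        "genotype": {"ss": ss_geno, "df": df_geno, "F": ss_geno / df_geno / ms_resid},
        "strain": {"ss": ss_strain, "df": df_strain, "F": ss_strain / df_strain / ms_resid},
        "residual": {"ss": ss_resid, "df": df_resid},
    }


# Hypothetical toy values: 3 strains x 2 genotypes x 2 replicates.
strain = ["DP10"] * 4 + ["DP16"] * 4 + ["DP17"] * 4
genotype = ["TS", "TS", "WT", "WT"] * 3
observed_species = [120, 124, 118, 122, 150, 154, 146, 152, 90, 94, 88, 92]

table = two_way_anova_balanced(observed_species, genotype, strain)
for term, row in table.items():
    print(term, row)
```

Because strain is in the model, its (large) between-strain variation is removed from the residual before the genotype effect is tested, which is the sense in which strain is “modeled out”. In QIIME 2 itself you would pass the formula to the anova action rather than compute this by hand (see `qiime longitudinal anova --help`); with unbalanced designs the sums of squares depend on term order and on the SS type (I, II, III), which is one reason to prefer a regression-based implementation.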

For beta diversity, check out the adonis action in q2-diversity. It supports multiple variables via an R-style formula.

For differential abundance, take a look at q2-songbird and q2-aldex2 to see what they support. As far as I recall, songbird does support multivariate models but does not give a traditional p-value, if that’s what you’re after; I am not sure about aldex2, so take a look. Eventually ANCOM2 will be implemented in the q2-composition plugin and will support multivariate models, but right now that plugin uses the scikit-bio ANCOM implementation, which is univariate.

R-style formulae: if you are not familiar with these, you would write something like “observed_species ~ factor1 + factor2”. See the help docs for the individual methods for details, since some methods expect the dependent variable in the formula and others do not.
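As a rough illustration of how an additive R-style formula becomes a model, the sketch below parses `observed_species ~ genotype + strain` and treatment-codes each categorical factor into 0/1 indicator columns, with the alphabetically first level as the baseline. It is a simplified plain-Python sketch (only `+` between categorical main effects, no interactions or numeric covariates), not how QIIME 2 or patsy actually parse formulas; the metadata columns `genotype` and `strain`, the sample IDs, and the function names are hypothetical.

```python
def parse_formula(formula):
    """Split 'response ~ a + b' into ('response', ['a', 'b']).
    Handles additive main effects only (no '*', ':' or functions)."""
    response, rhs = (side.strip() for side in formula.split("~"))
    return response, [term.strip() for term in rhs.split("+")]


def design_matrix(samples, terms):
    """Treatment-code categorical terms: an intercept column plus one
    0/1 indicator per non-reference level of each factor."""
    columns = ["Intercept"]
    levels = {}
    for term in terms:
        levels[term] = sorted({s[term] for s in samples})
        # The first sorted level is the reference and gets no column.
        columns += [f"{term}[{lvl}]" for lvl in levels[term][1:]]
    rows = []
    for s in samples:
        row = [1]
        for term in terms:
            row += [1 if s[term] == lvl else 0 for lvl in levels[term][1:]]
        rows.append(row)
    return columns, rows


# Hypothetical metadata rows.
samples = [
    {"sample-id": "s1", "genotype": "TS", "strain": "DP10"},
    {"sample-id": "s2", "genotype": "WT", "strain": "DP16"},
    {"sample-id": "s3", "genotype": "WT", "strain": "DP17"},
]
response, terms = parse_formula("observed_species ~ genotype + strain")
columns, X = design_matrix(samples, terms)
print(response, terms)
print(columns)
for s, row in zip(samples, X):
    print(s["sample-id"], row)
```

In a linear model fit on these columns, the coefficient for `genotype[WT]` is the WT-vs-TS difference with strain held fixed, which is what accounting for strain as a covariate amounts to in an additive model.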

I recognize the good intentions, but please note that cross-posting is against the forum code of conduct because it duplicates people’s effort. I recommend updating the Stack Exchange post to point to this answer (or deleting it). In the future, please choose only one support channel. Thanks!


Thanks for informing me of the code of conduct. Cross post deleted!


Exactly what I was looking for. Thanks a ton.

