Dear community,
I have a question that is slightly outside the scope of QIIME2 but still relevant to it. I searched the web for guidance but couldn't find a definitive answer, so I decided to open this topic.
Suppose I want to run a linear regression using an R-style formula, either with the statsmodels formula API (smf) in Python (Fitting models using R-style formulas - statsmodels 0.14.1) or within QIIME2 using adonis or linear-mixed-effects. I have two categorical independent variables of interest, A and B. A has 2 levels and B has 3 levels. After dummy coding, with A1 and B1 set as the reference levels, I want to include A2, B2, and B3 in the model:
y ~ A2 + B2 + B3
If I want to test the interaction between A and B, do I have to include both interaction terms (that is, would it be a statistical error to leave one out), like this:
y ~ A2B2 + A2B3
Or would it also be acceptable to include only one interaction term based on the specific value of B that interests me (B2/B3), like this:
y ~ A2B2
or
y ~ A2B3
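To make the coding concrete, here is a minimal pure-Python sketch of the columns I have in mind. The helper name dummy_code and the level labels are only illustrative assumptions, not part of statsmodels or QIIME2; the sketch just builds the dummy columns with A1 and B1 as references and forms each interaction column as the product of the two corresponding dummies.

```python
from itertools import product


def dummy_code(a_level, b_level):
    """Treatment-code one observation, with A1 and B1 as reference levels.

    Hypothetical helper for illustration only.
    """
    row = {
        "A2": int(a_level == "A2"),
        "B2": int(b_level == "B2"),
        "B3": int(b_level == "B3"),
    }
    # Each interaction column is the product of two main-effect dummies.
    row["A2B2"] = row["A2"] * row["B2"]
    row["A2B3"] = row["A2"] * row["B3"]
    return row


# One row per combination of levels (2 x 3 = 6 cells).
cells = list(product(["A1", "A2"], ["B1", "B2", "B3"]))
design = {cell: dummy_code(*cell) for cell in cells}

for cell, row in design.items():
    print(cell, row)
```

In this coding, A2B2 is 1 only for observations in the (A2, B2) cell and A2B3 is 1 only for the (A2, B3) cell, so the two alternatives above differ in which of these columns enters the model.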
I would appreciate hearing your opinions on this topic!