Interaction terms in R-style formulas

Dear community,

I have a question that is slightly outside the scope of QIIME2 but still relevant to it. I searched the web for guidance but couldn't find a definitive answer, so I decided to open this topic.

Suppose I want to run a linear regression using an R-style formula, either in Python with the statsmodels formula API, usually imported as smf (Fitting models using R-style formulas - statsmodels 0.14.1), or within QIIME2 using adonis or linear-mixed-effects. I have two categorical independent variables of interest, A and B. A has 2 levels and B has 3 levels. After dummy coding with A1 and B1 as the reference levels, I want to include A2, B2, and B3 in the model:

y ~ A2 + B2 + B3
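
To make the dummy coding concrete, here is a minimal sketch in plain Python (no statsmodels needed) of the columns I have in mind. The six rows and the helper name `dummy_code` are hypothetical and only for illustration. A formula interface would normally build these columns itself from the categorical variables.

```python
# Hypothetical toy data: one row per combination of A and B levels.
rows = [
    ("A1", "B1"), ("A1", "B2"), ("A1", "B3"),
    ("A2", "B1"), ("A2", "B2"), ("A2", "B3"),
]


def dummy_code(a, b):
    """Treatment (dummy) coding with A1 and B1 as the reference levels."""
    return {
        "Intercept": 1,
        "A2": int(a == "A2"),  # 1 only when A is at its non-reference level
        "B2": int(b == "B2"),
        "B3": int(b == "B3"),
    }


# One design-matrix row per sample, matching y ~ A2 + B2 + B3.
design = [dummy_code(a, b) for a, b in rows]

for (a, b), cols in zip(rows, design):
    print(a, b, cols)
```

The reference cell (A1, B1) has every dummy set to 0, so it is carried by the intercept alone.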

If I want to test the interaction between A and B, do I have to include both interaction terms (and would omitting one introduce statistical error), like this:

y ~ A2B2 + A2B3

Or would it also be acceptable to include only one interaction term based on the specific value of B that interests me (B2/B3), like this:

y ~ A2B2

or

y ~ A2B3
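
To spell out what the options above mean in terms of columns, here is a small plain-Python sketch. Each interaction column is the product of the two dummies involved. The helper `design_row` and the column labels `A2:B2` and `A2:B3` are my own naming, and I am assuming the main effects A2, B2, and B3 from the first formula stay in the model. As far as I understand, in R-style formula syntax `A:B` denotes the interaction and `A*B` is shorthand for `A + B + A:B`.

```python
from itertools import product

levels_a = ["A1", "A2"]
levels_b = ["B1", "B2", "B3"]


def design_row(a, b, interactions):
    """Dummy-coded row (A1, B1 as references) plus the requested interactions."""
    a2, b2, b3 = int(a == "A2"), int(b == "B2"), int(b == "B3")
    row = {"Intercept": 1, "A2": a2, "B2": b2, "B3": b3}
    if "A2:B2" in interactions:
        row["A2:B2"] = a2 * b2  # 1 only in the (A2, B2) cell
    if "A2:B3" in interactions:
        row["A2:B3"] = a2 * b3  # 1 only in the (A2, B3) cell
    return row


cells = list(product(levels_a, levels_b))  # the 6 combinations of A and B

# Both interaction terms: 6 columns for 6 cells.
full = [design_row(a, b, ("A2:B2", "A2:B3")) for a, b in cells]

# Only one interaction term: 5 columns for 6 cells.
partial = [design_row(a, b, ("A2:B2",)) for a, b in cells]

print(len(full[0]), len(partial[0]))
```

With both terms there is one column per cell of the 2 x 3 design. With only `A2:B2`, the (A2, B3) cell has no column of its own, so the model assumes the effect of A at B3 is the same as at B1.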

I would appreciate hearing your opinions on this topic! :slight_smile:
