Dear community,

I have a question that is slightly outside the scope of QIIME2 but still relevant to it. I searched the web for guidance but couldn't find a definitive answer, so I decided to open this topic.

Suppose I want to run a linear regression using an R-style formula with the `smf` (`statsmodels.formula.api`) module in Python (Fitting models using R-style formulas - statsmodels 0.14.1) or within QIIME2 using `adonis` or `linear-mixed-effects`. I have two categorical variables of interest, A and B, as independent variables. A has 2 levels, and B has 3 levels. After dummy coding, with A1 and B1 set as the reference levels, I want to include A2, B2, and B3 in the model:

`y ~ A2 + B2 + B3`
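To make the setup concrete, here is a minimal sketch of what I mean in Python. The data frame, its column names, and the random `y` values are made up purely for illustration; only the formula itself comes from my question.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical toy data: A has 2 levels, B has 3 levels, y is random noise.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "A": np.tile(["A1", "A2"], 30),          # alternating A1, A2 (60 rows)
    "B": np.repeat(["B1", "B2", "B3"], 20),  # 20 rows per level of B
})
df["y"] = rng.normal(size=len(df))

# Manual dummy coding with A1 and B1 as the reference levels.
df["A2"] = (df["A"] == "A2").astype(int)
df["B2"] = (df["B"] == "B2").astype(int)
df["B3"] = (df["B"] == "B3").astype(int)

# Main-effects model: one coefficient per non-reference level.
main_effects = smf.ols("y ~ A2 + B2 + B3", data=df).fit()
print(main_effects.params.index.tolist())
```

The fitted model has an intercept (the A1/B1 reference cell) plus one coefficient for each of A2, B2, and B3.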

If I want to test the interaction between A and B, do I **have** to include two interaction terms (that is, would omitting one of them be a statistical error), like this:

`y ~ A2*B2 + A2*B3`

Or would it also be acceptable to include only one interaction term, for the specific level of B that interests me (B2 or B3), like this:

`y ~ A2*B2`

or

`y ~ A2*B3`
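For reference, here is a small sketch that lists the design-matrix columns each candidate formula produces in statsmodels. It assumes the formulas are written with `*`, which in R-style formulas expands to both main effects plus their interaction (`a*b` is `a + b + a:b`). The toy data and the names `df` and `columns` are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical toy data with dummy columns A2, B2, B3 (A1 and B1 are references).
rng = np.random.default_rng(0)
A = np.tile(["A1", "A2"], 30)
B = np.repeat(["B1", "B2", "B3"], 20)
df = pd.DataFrame({
    "y": rng.normal(size=60),
    "A2": (A == "A2").astype(int),
    "B2": (B == "B2").astype(int),
    "B3": (B == "B3").astype(int),
})

formulas = [
    "y ~ A2*B2 + A2*B3",  # both interaction terms
    "y ~ A2*B2",          # only the A2:B2 interaction
    "y ~ A2*B3",          # only the A2:B3 interaction
]

# Collect the design-matrix column names of each model without fitting it.
columns = {f: smf.ols(f, data=df).exog_names for f in formulas}
for f, names in columns.items():
    print(f, "->", names)
```

One thing the listing makes visible: the single-interaction formulas also drop the main effect of the other B level, so in `y ~ A2*B2` the B3 samples are pooled with the B1 reference group. This is part of what I am unsure about.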

I would appreciate hearing your opinions on this topic!