Hello,
I don't know if this is the right place to ask, since it's more of an R question. I am working on the microbial ecology of skin cancer and am using the adonis2 function from the vegan package. As input, I have a Bray-Curtis distance matrix from qiime2 and a metadata file. The adonis2 command itself works fine, but I am clueless as to why the ANOVA output only shows "Model" instead of my metadata categories tumor and bodysite. This is also the case when I look at the interaction of the two using the * operator.
When you use the syntax swab_tibble$tumor, you are passing the literal vector of values from that column into the formula. R's formula interface interprets the entire expression swab_tibble$tumor * swab_tibble$bodysite as a single, complex predictor variable. Because adonis2 only sees one predictor, it lumps all the explained variance into the "Model" line.
Try this:
adonis2(
  formula = bray_curtis_distance_matrix ~ tumor * bodysite,
  data = swab_tibble
)
Here, these predictor columns are read implicitly from inside the data frame swab_tibble, but now the formula knows to treat them as separate variables. You should see both of them, and their interaction effect, in the output table.
I'm doing the same thing! I read in the Bray-Curtis .qza and extract the distance matrix with bray_dist$data.
I had the exact same issue with adonis2 only giving the significance of the full model, but passing the formula and data explicitly sadly didn't fix things in my case. However, I found that adding by = "terms" to the call fixed the issue for me:
PERMANOVA_terms <- adonis2(formula = bray_dist ~ Sex * Genotype,
                           data = sample_data,
                           permutations = 999,
                           by = "terms")
Just wanted to add to this thread in case someone else has the same issue!
It looks like by = "terms" should be the default option, so maybe we found a bug?
library(vegan)
data(dune)
data(dune.env)
## default test by terms
adonis2(dune ~ Management*A1, data = dune.env)
## overall tests
adonis2(dune ~ Management*A1, data = dune.env, by = NULL)
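If you want to check what your own installation does rather than rely on the comments above, here is a small sketch. It assumes only that vegan is installed. The idea that the default for by can differ between vegan versions is my assumption, not something confirmed in this thread, which is why the last call sets by explicitly.

```r
library(vegan)

# Which vegan version is installed?
print(packageVersion("vegan"))

# What does this installation use as the default for `by`?
# NULL means one overall "Model" row; "terms" means one row per term.
default_by <- formals(adonis2)$by
print(default_by)

data(dune)
data(dune.env)
set.seed(1)  # permutation p-values are random; fix the seed for repeatability

# Setting `by` explicitly makes the output independent of the default.
per_term <- adonis2(dune ~ Management * A1, data = dune.env,
                    by = "terms", permutations = 199)
print(rownames(per_term))
```

Writing by = "terms" (or by = "margin", or by = NULL) in every call keeps a script from changing behaviour if the default is different on another machine.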
Interesting!
May I ask a follow-up question:
When using by = "terms", the order of the terms in the formula matters (because each term is tested after the ones before it, so the first variable gets credit for all the variation it can explain).
But when using by = "margin", the order of the terms matters much less, and in particular I cannot "manipulate" the level of significance by reordering the formula (which is a good thing, I guess).
So why would anyone want to use by = "terms" instead of by = "margin"?
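The order effect described above can be reproduced with the dune data that ships with vegan. This is only a sketch: it uses an additive model (Management + A1) instead of the interaction models used earlier in the thread, so that both terms show up under by = "margin". The helper ss() is a name I made up for this illustration.

```r
library(vegan)

data(dune)
data(dune.env)
set.seed(42)  # only affects the permutation p-values, not the sums of squares

# Sequential tests: each term is assessed after the terms before it.
seq_ma <- adonis2(dune ~ Management + A1, data = dune.env,
                  by = "terms", permutations = 199)
seq_am <- adonis2(dune ~ A1 + Management, data = dune.env,
                  by = "terms", permutations = 199)

# Marginal tests: each term is assessed after all other terms.
mar_ma <- adonis2(dune ~ Management + A1, data = dune.env,
                  by = "margin", permutations = 199)
mar_am <- adonis2(dune ~ A1 + Management, data = dune.env,
                  by = "margin", permutations = 199)

# Hypothetical helper: pull the sum of squares for one term out of a result.
ss <- function(fit, term) fit[term, "SumOfSqs"]

comparison <- data.frame(
  test      = c("terms", "terms", "margin", "margin"),
  formula   = c("Management + A1", "A1 + Management",
                "Management + A1", "A1 + Management"),
  SumSqs_A1 = c(ss(seq_ma, "A1"), ss(seq_am, "A1"),
                ss(mar_ma, "A1"), ss(mar_am, "A1"))
)
print(comparison)
```

The sum of squares for A1 changes with the term order under by = "terms" and stays the same under by = "margin". As far as I understand it, with an interaction model such as Management * A1, by = "margin" only reports terms that can be dropped without breaking marginality (usually just the interaction), which is one reason to still look at sequential tests.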