Emily_Yu
(Emily Yu)
April 7, 2024, 5:07pm
Hi, when I am using "--p-formula" in ANCOM-BC, I was wondering whether the order of the variables matters. For example, if my variable of interest is sex and I want to add body-site and treatment as covariates, is there any difference between:

--p-formula "Sex+Body-site+Treatment"
--p-formula "Sex+Treatment+Body-site"
--p-formula "Treatment+Body-site+Sex"
--p-formula "Body-site+Treatment+Sex"

It's such a good question!

This page describes the formula inputs as 'fixed effects', which suggests the terms are independent of each other, but that page is for ANCOM-BC2:
ANCOM-BC2 Tutorial

The page for ANCOM-BC does not mention fixed effects at all... ANCOM-BC Tutorial

```
# ... (excerpt begins partway through a function call)
               formula = formula, group = group,
               struc_zero = struc_zero, global = global,
               pairwise = FALSE, dunnet = FALSE,
               mdfdr_control = NULL, trend = FALSE, trend_control = NULL)
meta_data = qc$meta_data
global = qc$global

# Add pseudocount (1) and take logarithm.
y = log(feature_table + 1)
options(na.action = "na.pass") # Keep NA's in rows of x
x = model.matrix(formula(paste0("~", formula)), data = meta_data)
options(na.action = "na.omit") # Switch it back
covariates = colnames(x)
n_covariates = length(covariates)

# 2. Identify taxa with structural zeros
if (struc_zero) {
  if (is.null(group)) {
    stop_txt = paste("Please specify the group variable for",
                     "detecting structural zeros.",
                     "Otherwise, set struc_zero = FALSE to proceed")
    # ... (excerpt ends here)
```

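I think `stats::model.matrix()` only uses the term order to decide the order of the design-matrix columns, so reordering the terms should not change the model itself. I can't find documentation on that either!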
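To see why reordering purely additive terms should only move design-matrix columns around, here is a minimal pure-Python sketch. It fits plain ordinary least squares, not ANCOM-BC's bias-corrected estimator, and it assumes R-style treatment coding (an intercept plus one indicator per non-reference level, with the alphabetically first level as the reference), which is what `model.matrix()` produces by default for factors. The helper names (`design_matrix`, `solve`, `ols`) and the data are made up for illustration.

```python
# Hypothetical sketch (plain OLS, NOT ANCOM-BC's bias-corrected estimator):
# reorder the terms of an additive model and compare the fitted coefficients.
# Assumes R-style treatment coding: an intercept, and the alphabetically first
# level of each factor as the reference level.

def design_matrix(rows, terms):
    """Return (column names, rows of X), with columns in term order."""
    names, non_ref = ["(Intercept)"], {}
    for term in terms:
        non_ref[term] = sorted({r[term] for r in rows})[1:]  # drop reference level
        names += [f"{term}{level}" for level in non_ref[term]]
    X = [[1.0] + [1.0 if r[t] == lv else 0.0 for t in terms for lv in non_ref[t]]
         for r in rows]
    return names, X


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * v for a, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def ols(rows, terms, response="y"):
    """Fit y ~ terms by least squares via the normal equations X'X b = X'y."""
    names, X = design_matrix(rows, terms)
    y = [r[response] for r in rows]
    p = len(names)
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    Xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(p)]
    return dict(zip(names, solve(XtX, Xty)))


# Made-up, deliberately unbalanced log-abundances (not the PD-mice data).
cells = {("hc_1", "susceptible"): [2.0, 2.4], ("hc_1", "wild type"): [3.1],
         ("pd_1", "susceptible"): [4.2], ("pd_1", "wild type"): [5.0, 5.3, 4.7]}
samples = [{"donor": d, "genotype": g, "y": y}
           for (d, g), ys in cells.items() for y in ys]

donor_first = ols(samples, ["donor", "genotype"])
genotype_first = ols(samples, ["genotype", "donor"])
for name, coef in donor_first.items():
    # Same column names, same values (up to floating-point rounding).
    print(f"{name:<20} {coef:8.4f} {genotype_first[name]:8.4f}")
```

In an additive model each coefficient is estimated while adjusting for all the other columns at once, so the term order changes where a column sits, not its value, even though these made-up data are unbalanced. This says nothing about interaction terms or about any order-dependent step that might exist inside ANCOM-BC itself.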


I tried running this with data from PD-mice

In this example, donor and genotype are fully blocked and perfectly balanced:

```
$ cut -f 4,6 metadata.tsv | sort | uniq -c
1 categorical categorical
1 genotype donor
12 susceptible hc_1
12 susceptible pd_1
12 wild type hc_1
12 wild type pd_1
```
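The same balance check can be sketched in Python. Unlike `uniq -c`, which only lists combinations that actually occur, this version also catches a donor-by-genotype cell with zero samples. It assumes a tab-separated file whose first line is the header and whose `#q2:types` row (if present) starts with `#`; `cell_counts` and `is_crossed_and_balanced` are hypothetical helpers, and the demo rows are made up.

```python
from collections import Counter
from itertools import product


def cell_counts(lines, col_a, col_b):
    """Count samples per (col_a, col_b) combination in tab-separated metadata.

    Assumes the first line is the header; later lines starting with '#'
    (such as the '#q2:types' row) are skipped.
    """
    header = lines[0].rstrip("\n").split("\t")
    ia, ib = header.index(col_a), header.index(col_b)
    counts = Counter()
    for line in lines[1:]:
        fields = line.rstrip("\n").split("\t")
        if fields[0].startswith("#"):
            continue
        counts[(fields[ia], fields[ib])] += 1
    return counts


def is_crossed_and_balanced(counts):
    """True if every level combination occurs, all with the same sample count."""
    if not counts:
        return False
    a_levels = sorted({a for a, _ in counts})
    b_levels = sorted({b for _, b in counts})
    # Indexing a Counter returns 0 for combinations that never occur.
    sizes = [counts[cell] for cell in product(a_levels, b_levels)]
    return min(sizes) > 0 and len(set(sizes)) == 1


# Made-up demo rows (not the PD-mice metadata).
demo = [
    "sample-id\tgenotype\tdonor\n",
    "#q2:types\tcategorical\tcategorical\n",
    "s1\tsusceptible\thc_1\n",
    "s2\twild type\thc_1\n",
    "s3\tsusceptible\tpd_1\n",
    "s4\twild type\tpd_1\n",
]
counts = cell_counts(demo, "genotype", "donor")
balanced = is_crossed_and_balanced(counts)
print(dict(counts), balanced)
```

For a real file, reading it with `open()` and passing the result of `readlines()` to `cell_counts` would apply the same check.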

```
qiime composition ancombc \
--i-table table_2k_abund.qza \
--m-metadata-file metadata.tsv \
--p-formula 'donor + genotype' \
--o-differentials ancombc_donor_first.qza
qiime composition ancombc \
--i-table table_2k_abund.qza \
--m-metadata-file metadata.tsv \
--p-formula 'genotype + donor' \
--o-differentials ancombc_genotype_first.qza
# Then qiime composition da-barplot to make these:
```

ancombc_genotype_first.qzv (222.0 KB)
ancombc_donor_first.qzv (222.0 KB)

On first inspection, these look the same...

Remember that:

- this study is fully blocked (no confounding factors), and
- all cohorts are balanced (n = 12 per subgroup),

which is cleaner than most real studies!
What I don't have is a citation that says formula order doesn't matter.

Remember that different packages run different tests, so formula order can absolutely matter!
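As one concrete case, here is a minimal pure-Python sketch of sequential (Type I) sums of squares, the kind R's `anova()` reports for an `lm` fit: each term is credited with the drop in residual sum of squares when it is added after the terms listed before it. On made-up unbalanced data, the SS credited to `donor` depends on whether it is entered first or last; on made-up balanced data it does not. This is ordinary least squares on hypothetical data, not ANCOM-BC, and `rss`, `sequential_ss`, and `make_rows` are illustrative helpers.

```python
# Hypothetical sketch: sequential (Type I) sums of squares, computed with plain
# least squares on made-up data. This is NOT what ANCOM-BC computes.


def rss(rows, terms, response="y"):
    """Residual sum of squares for intercept + treatment-coded additive terms."""
    cols = [[1.0] * len(rows)]
    for term in terms:
        for level in sorted({r[term] for r in rows})[1:]:  # first level = reference
            cols.append([1.0 if r[term] == level else 0.0 for r in rows])
    y = [r[response] for r in rows]
    p = len(cols)
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination.
    M = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(p)]
         + [sum(a * b for a, b in zip(cols[i], y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(p):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    beta = [M[i][p] / M[i][i] for i in range(p)]
    fitted = [sum(beta[j] * cols[j][k] for j in range(p)) for k in range(len(rows))]
    return sum((yk - fk) ** 2 for yk, fk in zip(y, fitted))


def sequential_ss(rows, terms):
    """Credit each term with the drop in RSS when it is added after the earlier ones."""
    out, prev = {}, rss(rows, [])
    for i, term in enumerate(terms):
        cur = rss(rows, terms[: i + 1])
        out[term] = prev - cur
        prev = cur
    return out


def make_rows(cells):
    """Expand {(donor, genotype): [y, ...]} into one dict per sample."""
    return [{"donor": d, "genotype": g, "y": y}
            for (d, g), ys in cells.items() for y in ys]


unbalanced_rows = make_rows({("hc_1", "susceptible"): [2.0, 2.4], ("hc_1", "wild type"): [3.1],
                             ("pd_1", "susceptible"): [4.2], ("pd_1", "wild type"): [5.0, 5.3, 4.7]})
balanced_rows = make_rows({("hc_1", "susceptible"): [2.0, 2.4], ("hc_1", "wild type"): [3.1, 3.5],
                           ("pd_1", "susceptible"): [4.2, 4.0], ("pd_1", "wild type"): [5.0, 5.3]})

results = {}
for label, data in [("unbalanced", unbalanced_rows), ("balanced", balanced_rows)]:
    results[label] = (sequential_ss(data, ["donor", "genotype"]),
                      sequential_ss(data, ["genotype", "donor"]))
    first, last = results[label]
    print(f"{label}: donor SS entered first = {first['donor']:.4f}, "
          f"entered last = {last['donor']:.4f}")
```

The fitted coefficients are identical in both orders; only the sequential partition of the explained variation moves. That partition stops depending on order when the factors are balanced (orthogonal), which fits with the two PD-mice runs looking the same.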


system
(system)
Closed
May 13, 2024, 11:22am
This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.