Does the order of variables matter in ANCOM-BC

colinbrislawn · April 12, 2024, 5:22am

I tried running this with data from PD-mice

In this example, donor and genotype are fully blocked and perfectly balanced:

$ cut -f 4,6 metadata.tsv | sort | uniq -c
      1 categorical     categorical
      1 genotype        donor
     12 susceptible     hc_1
     12 susceptible     pd_1
     12 wild type       hc_1
     12 wild type       pd_1

qiime composition ancombc \
  --i-table table_2k_abund.qza \
  --m-metadata-file metadata.tsv \
  --p-formula 'donor + genotype' \
  --o-differentials ancombc_donor_first.qza

qiime composition ancombc \
  --i-table table_2k_abund.qza \
  --m-metadata-file metadata.tsv \
  --p-formula 'genotype + donor' \
  --o-differentials ancombc_genotype_first.qza

# Then qiime composition da-barplot to make these:

ancombc_genotype_first.qzv (222.0 KB)
ancombc_donor_first.qzv (222.0 KB)

On first inspection, these look the same...

Remember that

this study is fully blocked (no confounding factors) and
all cohorts are balanced (n = 12 for subgroups)
which is cleaner than most real studies!

What I don't have is a citation that says formula order doesn't matter.

Remember that different packages run different tests, so formula order can absolutely matter!