I tried running this with data from PD-mice
In this example, donor and genotype are fully blocked and perfectly balanced:
$ cut -f 4,6 metadata.tsv | sort | uniq -c
1 categorical categorical
1 genotype donor
12 susceptible hc_1
12 susceptible pd_1
12 wild type hc_1
12 wild type pd_1
qiime composition ancombc \
--i-table table_2k_abund.qza \
--m-metadata-file metadata.tsv \
--p-formula 'donor + genotype' \
--o-differentials ancombc_donor_first.qza
qiime composition ancombc \
--i-table table_2k_abund.qza \
--m-metadata-file metadata.tsv \
--p-formula 'genotype + donor' \
--o-differentials ancombc_genotype_first.qza
# Then qiime composition da-barplot to make these:
ancombc_genotype_first.qzv (222.0 KB)
ancombc_donor_first.qzv (222.0 KB)
On first inspection, these look the same...
Remember that
- this study is fully blocked (no confounding factors) and
- all cohorts are balanced (n = 12 for subgroups)
- which is cleaner than most real studies!
What I don't have is a citation that says formula order doesn't matter.
Remember that different packages run different tests, so formula order can absolutely matter!