Hi,
I'm trying to figure out which version of ancombc I should use.
I have shotgun metagenomics data, processed with metphlan. The variable I'm interested in is "group", it has 4 levels, one of them is HC, healthy controls, that I want to use as reference. The most important confounding factors seem to be bmi and gender. Number of reads as well, but qiime version of ancombc doesn't take the continuous variable (it is a metadata column with mapped reads), and maybe ancombc already inherently takes care of the difference of sequencing depth by sample..
First I ran on qiime 2, I converted the bmi score to bmi_group with 4 categories
qiime composition ancombc
--i-table filt_tab.qza
--m-metadata-file metadata1507_noMCS.csv
--p-formula 'bmi_group + gender + group'
--p-reference-levels group::HC
--o-differentials ./ancombc_group_conf.qza
qiime composition da-barplot
--i-data ./ancombc_group_conf.qza
--p-significance-threshold 0.05
--o-visualization da_barplot_confound.qzv
Resulting 2 significant species for group A, 6 for group B and 0 for group C.
Then I ran
tse = mia::makeTreeSummarizedExperimentFromPhyloseq(pseq_original)
tse$group = factor(tse$group, levels = c("HC", "A", "B","C"))
print(tse)
set.seed(123)
out = ancombc(data = tse, assay_name = "counts",
tax_level = "species", phyloseq = NULL,
formula = "pm_bmi + gender + group",
p_adj_method = "holm", prv_cut = 0.10, lib_cut = 0,
group = "group", struc_zero = TRUE, neg_lb = FALSE,
tol = 1e-5, max_iter = 100, conserve = TRUE,
alpha = 0.05, global = TRUE, n_cl = 1, verbose = TRUE)
res_prim = out$res
res_global = out$res_global
head(res_global)
In res_global there are 23 species with q_val under 0.05, but I guess they are all the ones that are different between any of the groups.
Interestingly, when subsampling to only include 2 groups in the comparison, I get a warning :
'ancombc' has been fully evolved to 'ancombc2'.
Explore the enhanced capabilities of our refined method!
Obtaining initial estimates ...
Estimating sample-specific biases ...
ANCOM-BC primary results ...
Merge the information of structural zeros ...
Note that taxa with structural zeros will have 0 p/q-values and SEs
Warning message:
**The group variable has < 3 categories **
The multi-group comparisons (global/pairwise/dunnet/trend) will be deactivated
And the res_global is empty.
What confuses me is that in qiime ancombc I couldn't use continuous variable (like the pm_bmi), but I could have more than 2 groups and get results for these groups compared to reference group. However in Rstudio with ancombc I could give continuous pm_bmi without problems, but not really get per group results.
Ancombc2 with Dunnett test could be interesting, but there I got over 300 significant taxa per group, which doesn't make much sense (total 649 taxa after 0.1 prevalence cut off, human gut microbiome). With Maaslin2 I had around 20 signicantly different species between the groups and reference ...
output = ancombc2(data = tse, assay_name = "counts", tax_level = "species",
fix_formula = "pm_bmi+ gender + group", rand_formula = NULL,
p_adj_method = "BH", pseudo_sens = TRUE,
prv_cut = 0.10, lib_cut = 0, s0_perc = 0.05,
group = "group", struc_zero = TRUE, neg_lb = TRUE,
alpha = 0.05, n_cl = 2, verbose = TRUE,
global = TRUE, pairwise = TRUE, dunnet = TRUE,
iter_control = list(tol = 1e-2, max_iter = 20,
verbose = TRUE),
em_control = list(tol = 1e-5, max_iter = 100),
lme_control = lme4::lmerControl(),
mdfdr_control = list(fwer_ctrl_method = "BH", B = 100))
If anyone could enlighten me
- why the ancombc in qiime vs Rstudio run differently (dealing with continuous variables and more than 2 groups),
- whether ancombc takes care of differing sequencing depth by sample, without this variable being in the formula
- why ancombc2 might be giving a very long list of significantly different taxa
Thank you in advance!!!