ancombc in qiime2 vs rstudio vs ancombc2

rahel_park · July 30, 2024, 9:12pm

Hi,
I'm trying to figure out which version of ancombc I should use.
I have shotgun metagenomics data, processed with metphlan. The variable I'm interested in is "group", it has 4 levels, one of them is HC, healthy controls, that I want to use as reference. The most important confounding factors seem to be bmi and gender. Number of reads as well, but qiime version of ancombc doesn't take the continuous variable (it is a metadata column with mapped reads), and maybe ancombc already inherently takes care of the difference of sequencing depth by sample..

First I ran on qiime 2, I converted the bmi score to bmi_group with 4 categories

qiime composition ancombc
--i-table filt_tab.qza
--m-metadata-file metadata1507_noMCS.csv
--p-formula 'bmi_group + gender + group'
--p-reference-levels group::HC
--o-differentials ./ancombc_group_conf.qza

qiime composition da-barplot
--i-data ./ancombc_group_conf.qza
--p-significance-threshold 0.05
--o-visualization da_barplot_confound.qzv

Resulting 2 significant species for group A, 6 for group B and 0 for group C.

Then I ran
tse = mia::makeTreeSummarizedExperimentFromPhyloseq(pseq_original)

tse$group = factor(tse$group, levels = c("HC", "A", "B","C"))
print(tse)
set.seed(123)
out = ancombc(data = tse, assay_name = "counts",
tax_level = "species", phyloseq = NULL,
formula = "pm_bmi + gender + group",
p_adj_method = "holm", prv_cut = 0.10, lib_cut = 0,
group = "group", struc_zero = TRUE, neg_lb = FALSE,
tol = 1e-5, max_iter = 100, conserve = TRUE,
alpha = 0.05, global = TRUE, n_cl = 1, verbose = TRUE)

res_prim = out$res
res_global = out$res_global
head(res_global)

In res_global there are 23 species with q_val under 0.05, but I guess they are all the ones that are different between any of the groups.

Interestingly, when subsampling to only include 2 groups in the comparison, I get a warning :

'ancombc' has been fully evolved to 'ancombc2'.
Explore the enhanced capabilities of our refined method!
Obtaining initial estimates ...
Estimating sample-specific biases ...
ANCOM-BC primary results ...
Merge the information of structural zeros ...
Note that taxa with structural zeros will have 0 p/q-values and SEs
Warning message:
**The group variable has < 3 categories **
The multi-group comparisons (global/pairwise/dunnet/trend) will be deactivated

And the res_global is empty.

What confuses me is that in qiime ancombc I couldn't use continuous variable (like the pm_bmi), but I could have more than 2 groups and get results for these groups compared to reference group. However in Rstudio with ancombc I could give continuous pm_bmi without problems, but not really get per group results.

Ancombc2 with Dunnett test could be interesting, but there I got over 300 significant taxa per group, which doesn't make much sense (total 649 taxa after 0.1 prevalence cut off, human gut microbiome). With Maaslin2 I had around 20 signicantly different species between the groups and reference ...

output = ancombc2(data = tse, assay_name = "counts", tax_level = "species",
fix_formula = "pm_bmi+ gender + group", rand_formula = NULL,
p_adj_method = "BH", pseudo_sens = TRUE,
prv_cut = 0.10, lib_cut = 0, s0_perc = 0.05,
group = "group", struc_zero = TRUE, neg_lb = TRUE,
alpha = 0.05, n_cl = 2, verbose = TRUE,
global = TRUE, pairwise = TRUE, dunnet = TRUE,
iter_control = list(tol = 1e-2, max_iter = 20,
verbose = TRUE),
em_control = list(tol = 1e-5, max_iter = 100),
lme_control = lme4::lmerControl(),
mdfdr_control = list(fwer_ctrl_method = "BH", B = 100))

If anyone could enlighten me

why the ancombc in qiime vs Rstudio run differently (dealing with continuous variables and more than 2 groups),
whether ancombc takes care of differing sequencing depth by sample, without this variable being in the formula
why ancombc2 might be giving a very long list of significantly different taxa

Thank you in advance!!!

lizgehret · August 2, 2024, 7:27pm

Hi @rahel_park,

Apologies for the delayed reply! I have been a bit backlogged this week but will get back to you as soon as possible. Any other mods are welcome to jump in in the meantime.

rahel_park · August 5, 2024, 6:14am

Thank you :), will be looking forward to !

lizgehret · August 13, 2024, 5:20pm

Hi @rahel_park,

Once again, apologies for the late reply - and thanks so much for your patience! Happy to address your questions below:

We chose not to support continuous variables because this can diminish the ability to make meaningful comparisons between groups. Variables with discrete numerical groupings are still appropriate to use, you would just need to identify that within your metadata as a categorical column using the comment directive.

Multiple (3+ groups) can be used within the QIIME 2 implementation of ancombc because the features mentioned in the warning you received (global/pairwise/dunnet/trend) are not implemented. It is interesting though - to my knowledge, those additional tests (pairwise, dunnet, trend) were only available within ancombc2.

For full disclosure, I do not have as deep a familiarity with ancombc2 as I do with ancombc - this is a method that we will be implementing into QIIME 2 this year, but it is a newer method that our core development team are still actively learning about for development.

It looks like @jwdebelius did a great job of addressing this question in your other post here, so I'll leave this as-is to help keep threads tidy

As I mentioned above, I can't speak with as much confidence regarding ancombc2 - however, I do wonder if this is because you've chosen a continuous variable in your formula. This is essentially the reason we've removed the ability to include continuous variables within the QIIME 2 implementation of ancombc. It can be tough to group continuous variables (since essentially everyone is in their own 'group') - which can produce groupings or significance that aren't really meaningful.

Hope this helps! Cheers

rahel_park · August 21, 2024, 8:24am

Thank you so much @lizgehret for the clarifications!
I will then just use the QIIME2 version of ancombc for now.

system · September 21, 2024, 2:25pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.