ANCOM BC steps for anyalsis

Hi, can anyone guide me with differential abundance analysis using ANCOM BC package in R studio. I was able to make the tse object.

tse = mia::makeTreeSummarizedExperimentFromPhyloseq(physeq100rar)
tse$type = factor(tse$type, levels = c("case", "control"))

But when I do the following I get error

out = ancombc2(data = tse, assay_name = "counts",
tax_level = "species",
formula = "type",
p_adj_method = "holm", prv_cut = 0.10, lib_cut = 1000,
group = "type", struc_zero = TRUE, neg_lb = TRUE, tol = 1e-5,
max_iter = 100, conserve = TRUE, alpha = 0.05, global = TRUE,
n_cl = 1, verbose = TRUE)

res = out$res
res_global = out$res_global

Error in ancombc2(data = tse, assay_name = "counts", tax_level = "species", :
unused arguments (phyloseq = NULL, formula = "type", tol = 1e-05, max_iter = 100, conserve = TRUE)

I am following this tutorial (ANCOM-BC Tutorial)
Any suggestions on how to move forward

Hi, Komalk,

the error you got is quite straightforward: some arguments (formula, tol, max_iter, conserve) that you've used in the ancombc2 function are not recognised by it.

The reason comes from the (sometimes confusing) fact, that ancombc (the function referenced in your tutorial) and ancombc2 (the function you're using) are two separate functions.

So there are two ways to proceed:

  • Try using the ancombc function (just remove the "2", from your ancombc2 function).
  • Investigate the ANCOM-BC2 tutorial and use the examples and arguments provided there.

One more thing: the ancombc2 is the newer one of the two (and quite recently peer-reviewed and published). As the authors say:

Although the ANCOM-BC methodology accounted for sample-specific bias, for better control of FDR, ANCOM-BC2 also accounts for taxon-specific bias.

So the Bias Correction part may be more robust in the ANCOM-BC2 method and it should warrant your attention. :sparkles:

Good luck! :crossed_fingers:


Hello Justin, thank you for clearing the confusion. I am now following ANCOM-BC2 tutorial. I have a new error now

output = ancombc2(data = tse, assay_name = "counts", tax_level = NULL,

  •               fix_formula = "type", rand_formula = NULL,
  •               p_adj_method = "holm", pseudo_sens = TRUE,
  •               prv_cut = 0.10, lib_cut = 1000, s0_perc = 0.05,
  •               group = "type", struc_zero = FALSE, neg_lb = FALSE,
  •               alpha = 0.05, n_cl = 2, verbose = TRUE,
  •               global = FALSE, pairwise = TRUE, 
  •               dunnet = FALSE, trend = FALSE,
  •               iter_control = list(tol = 1e-5, max_iter = 20, 
  •                                   verbose = FALSE),
  •               em_control = list(tol = 1e-5, max_iter = 100),
  •               lme_control = NULL, 
  •               mdfdr_control = list(fwer_ctrl_method = "holm", B = 100), 
  •               trend_control = NULL)

Error: Sample size per group should be >= 2
Small sample size detected for the following group(s):

I searched the error message. From what I have understood is there is some issue with the group size. but my group(type 1 and type 2) have 64 and 30 samples respectively.

Kindly help.
Thank you

Hey, Komalk,

I would try the following steps:

  1. Run summary(tse@colData$type) to make sure that the output shows the group counts like you described.
  2. lib_cut = 1000 can discard samples if sample library sizes are small. You can try setting it to a smaller number (although your samples should really have at least that amount of reads, but try it anyway)
  3. Instinctively, I would also try setting pairwise to FALSE, just to see if the ANCOMBC2 runs.

If none of this works (or even if 2. or 3. works), I would inspect the tse object's tse@colData, rowData(tse) with dim, glimpse functions, just to see if the data has imported correctly from phyloseq.

1 Like

Hello JustinK, I really appreciate your time and help. I firstly tried your solution number three, it worked.
I then checked my coldata$type and yes you are right there is no other variable( type-2 samples were not exported), Although my physeq object has both types of sample. I searched about it and found nothing to my understanding. For now I am using another column which has two variable and it worked, got the output. Will see if I can actually get further outputs using the tutorial till end. :sunny:

Thank you for solutions.