ANCOMBC2 in R Error: Estimation failed for the following covariates:

Sihan_Bu · August 6, 2024, 2:03pm

Hello,

Sorry, I know this isn't a QIIME 2 question, but I don't know anywhere else on the internet with such knowledgeable and responsive people.

I ran ANCOMBC2 in R with the following call and got an error:

```r
library(ANCOMBC)

# `phyloseq` is the name of my phyloseq object
kenaioutput <- ancombc2(data = phyloseq,
                        assay_name = "counts",
                        tax_level = "Genus",
                        fix_formula = "Lake_Collected_From + Sex_f_m_NA + fish_type + Mass_g + Std_length_mm + Surface_area_group + Prevalence",
                        rand_formula = NULL,
                        p_adj_method = "BH",
                        prv_cut = 0.10,
                        group = "Lake_Collected_From",
                        struc_zero = TRUE,
                        neg_lb = FALSE,
                        alpha = 0.05,
                        n_cl = 4,
                        verbose = TRUE,
                        global = TRUE,
                        pairwise = TRUE,
                        dunnet = TRUE,
                        trend = FALSE,
                        iter_control = list(tol = 1e-2, max_iter = 20,
                                            verbose = TRUE))
```

Error message:

```
Error: Estimation failed for the following covariates:
fish_typelimnetic, Surface_area_group2, Surface_area_group3, Prevalence0.07, Prevalence0.14, Prevalence0.5
Consider removing these covariates
```

I found a GitHub thread describing the same error. It looks like it is caused by multicollinearity among the covariates, but no solution was posted there. I really need to keep these problematic variables in my model because they are central to my study. I'm considering two solutions:

1. Recode the problematic covariates. My fish_type has two levels, so I could convert it into a dummy variable and use PCA scores instead. However, ancombc2 fits a log-linear regression, and I couldn't find evidence that PCA scores can be used as covariates in a log-linear regression. The same applies to Surface_area_group (3 levels) and Prevalence (4 levels), which I would reduce to 2 levels each (I don't like this).
2. Use a different tool instead, such as MaAsLin, for multivariable analysis.

Any suggestions would be greatly appreciated!

Thank you so much for your help.

colinbrislawn · August 6, 2024, 10:34pm

Hello @Sihan_Bu,

Thank you for bringing your question to the forums! I also appreciate your contribution to this community through excellent questions!

Yes, I concur that this looks like a multicollinearity issue.

I've been there! Covarying metadata is hard to handle statistically, but careful study design should help.

What is your underlying study design?

Are Prevalence0.07 and Prevalence0.14 taxa counts?
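For reference, names such as `fish_typelimnetic` and `Prevalence0.07` match the pattern R's `model.matrix()` uses for factor dummy columns (variable name followed by level), which suggests they are levels of categorical covariates. Assuming `ancombc2` builds its fixed-effects design in a comparable way (an assumption, not checked against the package source), a minimal base-R sketch like this one can show which dummy columns are exact linear combinations of others. The metadata here is hypothetical toy data in which `Surface_area_group` is fully determined by `Lake_Collected_From`; it is not the poster's data.

```r
# Hypothetical toy metadata (NOT the real study data): six fish, where
# Surface_area_group is completely determined by Lake_Collected_From
meta <- data.frame(
  Lake_Collected_From = factor(c("A", "A", "B", "B", "C", "C")),
  fish_type = factor(c("benthic", "limnetic", "benthic", "limnetic", "benthic", "limnetic")),
  Surface_area_group = factor(c("1", "1", "2", "2", "3", "3"))
)

# Expand the factors into dummy columns, e.g. "fish_typelimnetic", "Surface_area_group2"
X <- model.matrix(~ Lake_Collected_From + fish_type + Surface_area_group, data = meta)

# A full-rank design has rank == number of columns; any shortfall means
# some columns are exact linear combinations of others
qr_X <- qr(X)
rank_deficit <- ncol(X) - qr_X$rank

# The default (LINPACK) QR moves dependent columns to the right-hand edge,
# so the pivoted columns beyond the rank are the redundant ones
aliased <- colnames(X)[qr_X$pivot[-seq_len(qr_X$rank)]]

cat("Rank deficit:", rank_deficit, "\n")
print(aliased)
```

This only catches exact dependence. Near-collinearity (two covariates that almost, but not always, coincide) leaves the rank intact and may still cause estimation problems, so a pairwise look at the metadata is a useful complement.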

Here is what a third option might look like:

1. Begin the analysis with an all-vs-all correlation of the metadata to see which factors covary.
2. Based on what you observe to covary, argue to the reviewers that (a) these factors are expected to covary and (b) you can therefore focus on only a select few of these metadata categories for downstream analysis.
3. This justifies dropping the redundant factors from the statistical model, which removes the confounding and, with it, this error!
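The first step above (an all-vs-all look at the metadata) could be sketched in base R as follows. This is a minimal illustration, not a prescribed method: it uses Cramér's V, a chi-squared-based association measure between 0 and 1, for every pair of categorical metadata columns. The helper `cramers_v()` and the toy metadata are hypothetical, not taken from the original study.

```r
# Hypothetical helper: Cramér's V between two categorical vectors
# (0 = no association, 1 = one variable fully determines the other)
cramers_v <- function(x, y) {
  tab <- table(x, y)
  # Small toy tables trigger an "approximation may be incorrect" warning
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  n <- sum(tab)
  k <- min(nrow(tab), ncol(tab))
  unname(sqrt(chi2 / (n * (k - 1))))
}

# Hypothetical toy metadata (NOT the real study data)
meta <- data.frame(
  Lake_Collected_From = factor(c("A", "A", "B", "B", "C", "C")),
  fish_type = factor(c("benthic", "limnetic", "benthic", "limnetic", "benthic", "limnetic")),
  Surface_area_group = factor(c("1", "1", "2", "2", "3", "3"))
)

# All-vs-all association matrix over the categorical columns
vars <- names(meta)
assoc <- outer(seq_along(vars), seq_along(vars),
               Vectorize(function(i, j) cramers_v(meta[[i]], meta[[j]])))
dimnames(assoc) <- list(vars, vars)
print(round(assoc, 2))
```

Pairs with values near 1 (in this toy data, `Lake_Collected_From` and `Surface_area_group`) are candidates for the kind of redundancy described above. For numeric covariates such as `Mass_g` and `Std_length_mm`, `cor()` gives the analogous pairwise view. A factor with only one level makes the denominator zero, so such columns should be dropped before running this.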

Of course, this only works if the simplified model still answers your biological question.
