Does ANCOM-BC on qiime2 corrects for covariates?

Liviacmg · July 16, 2024, 7:34pm

Hey guys,

I wanted to know if ANCOM-BC inside qiime2 corrects for covariable effect the same way as ANCOM-BC 2 (performed in R) does. I'm asking this because I have run ANCOM-BC for a dataset in which, considering q-value < 0.2, only two taxa resulted in the output. In the same pace as performing ANCOM-BC 2 in R, using fix_formula to correct for other covariables resulted in 27 taxa (with q-value < 0.05, FDR-corrected). Can someone help me understand why the results are so divergent?

Thanks in advance.

salias · July 17, 2024, 9:06am

Hi there,

ANCOM-BC does not correct for covariates by default. The --p-formula option in QIIME 2 ANCOM-BC is equivalent to fix_formula in R. So you can use there the same formula you are using in R.

Cheers!

Sergio

Liviacmg · July 17, 2024, 1:53pm

Hi @salias,

Thank you for the quick reply. Although, does --p-formula receives as input numeric covariates (such as age, etc)? Because in ANCOM-BC 2 in R I used the formula "fix_formula = "Age_in_years + Hypertension + Physical_activity + Smoke + Years_of_illness + Treatment", but when trying to apply the same in qiime2 version 2023.5 the following error appears:

(Note 1: How can ANCOM-BC 2 accept numeric values and ANCOM-BC in qiime2 does not?)

(Note 2: Also, after running a test in ANCOM-BC 2 and ANCOM-BC using only the covariate 'Treatment' for example, in ANCOM-BC in qiime2 only 1 taxon returned and in ANCOM-BC 2, 19 taxa returned in output. How can so divergent taxa return?)

#Command used:
qiime composition ancombc
--i-table genus.qza
--m-metadata-file q2-metadata.tsv
--p-formula 'Age_in_years + Hypertension + Physical_activity + Smoke + Years_of_illness + Treatment'
--o-differentials ancombc.qza

Error:

Plugin error from composition:

An error was encountered while running ANCOM-BC in R (return code 1), please inspect stdout and stderr to learn more.

Debug info has been saved to /tmp/qiime2-q2cli-err-0ocql18p.log

Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: run_ancombc.R --inp_abundances_path /tmp/tmphqhumvfv/input.biom.tsv --inp_metadata_path /tmp/tmphqhumvfv/input.map.txt --md_column_types {"Register_number": "numeric", "Sample_name": "categorical", "Diagnosis": "numeric", "sub-diagnosis": "numeric", "Treatment": "categorical", "Gender": "numeric", "Age_in_years": "numeric", "Age": "categorical", "Age_group": "categorical", "Age_at_onset": "numeric", "Years_of_illness": "numeric", "N_of_depressive_episodes": "numeric", "N_of_manic_episodes": "numeric", "N_of_mixed_episodes": "numeric", "total_number_of_episodes": "numeric", "suicide_attempts": "numeric", "Alda_scale": "numeric", "Li_response": "numeric", "VP_response": "numeric", "years_of_lithium_treratment": "numeric", "years_of_valproic_treatment": "numeric", "VP_response_total_score": "numeric", "response_to_lamotrigine": "numeric", "BMI": "categorical", "BMI_group": "categorical", "Physical_activity": "categorical", "Smoke": "categorical", "mood_stabilizers_other_than_lithium": "numeric", "Antipsychotic_treatment": "numeric", "Antidepressants_treatment": "numeric", "Hypertension": "numeric", "Diabetes": "numeric", "Aderenza_scala_predimed": "numeric", "valori_scala_predimed": "numeric"} --formula Age_in_years + Hypertension + Physical_activity + Smoke + Years_of_illness + Treatment --p_adj_method holm --prv_cut 0.1 --lib_cut 0 --reference_levels ['Physical_activity::0', 'Smoke::0', 'Treatment::Li-treated', 'Age_in_years::25.0', 'Hypertension::0.0', 'Years_of_illness::2.0'] --neg_lb False --tol 1e-05 --max_iter 100 --conserve False --alpha 0.05 --output_loaf /tmp/q2-DataLoafPackageDirFmt-a3jkpftj

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
dplyr 1.1.2 readr 2.1.4
forcats 1.0.0 stringr 1.5.0
ggplot2 3.4.2 tibble 3.2.1
lubridate 1.9.2 tidyr 1.3.0
purrr 1.0.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
dplyr::filter() masks stats::filter()
dplyr::lag() masks stats::lag()
Use the conflicted package (http://conflicted.r-lib.org/) to force all conflicts to become errors

Attaching package: ‘jsonlite’

The following object is masked from ‘package:purrr’:

flatten

R version 4.2.3 (2023-03-15)
Error in relevel.factor(md[[column]], ref = intercept_vector) :
'ref' must be an existing level
Calls: relevel -> relevel.factor
3: stop("'ref' must be an existing level")
2: relevel.factor(md[[column]], ref = intercept_vector)
1: relevel(md[[column]], ref = intercept_vector)
Traceback (most recent call last):
File "/root/miniconda3/envs/qiime2-2023.5/lib/python3.8/site-packages/q2_composition/_ancombc.py", line 255, in _ancombc
run_commands([cmd])
File "/root/miniconda3/envs/qiime2-2023.5/lib/python3.8/site-packages/q2_composition/_ancombc.py", line 32, in run_commands
subprocess.run(cmd, check=True)
File "/root/miniconda3/envs/qiime2-2023.5/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['run_ancombc.R', '--inp_abundances_path', '/tmp/tmphqhumvfv/input.biom.tsv', '--inp_metadata_path', '/tmp/tmphqhumvfv/input.map.txt', '--md_column_types', '{"Register_number": "numeric", "Sample_name": "categorical", "Diagnosis": "numeric", "sub-diagnosis": "numeric", "Treatment": "categorical", "Gender": "numeric", "Age_in_years": "numeric", "Age": "categorical", "Age_group": "categorical", "Age_at_onset": "numeric", "Years_of_illness": "numeric", "N_of_depressive_episodes": "numeric", "N_of_manic_episodes": "numeric", "N_of_mixed_episodes": "numeric", "total_number_of_episodes": "numeric", "suicide_attempts": "numeric", "Alda_scale": "numeric", "Li_response": "numeric", "VP_response": "numeric", "years_of_lithium_treratment": "numeric", "years_of_valproic_treatment": "numeric", "VP_response_total_score": "numeric", "response_to_lamotrigine": "numeric", "BMI": "categorical", "BMI_group": "categorical", "Physical_activity": "categorical", "Smoke": "categorical", "mood_stabilizers_other_than_lithium": "numeric", "Antipsychotic_treatment": "numeric", "Antidepressants_treatment": "numeric", "Hypertension": "numeric", "Diabetes": "numeric", "Aderenza_scala_predimed": "numeric", "valori_scala_predimed": "numeric"}', '--formula', 'Age_in_years + Hypertension + Physical_activity + Smoke + Years_of_illness + Treatment', '--p_adj_method', 'holm', '--prv_cut', '0.1', '--lib_cut', '0', '--reference_levels', "['Physical_activity::0', 'Smoke::0', 'Treatment::Li-treated', 'Age_in_years::25.0', 'Hypertension::0.0', 'Years_of_illness::2.0']", '--neg_lb', 'False', '--tol', '1e-05', '--max_iter', '100', '--conserve', 'False', '--alpha', '0.05', '--output_loaf', '/tmp/q2-DataLoafPackageDirFmt-a3jkpftj']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/root/miniconda3/envs/qiime2-2023.5/lib/python3.8/site-packages/q2cli/commands.py", line 468, in call
results = action(**arguments)
File "", line 2, in ancombc
File "/root/miniconda3/envs/qiime2-2023.5/lib/python3.8/site-packages/qiime2/sdk/action.py", line 274, in bound_callable
outputs = self.callable_executor(
File "/root/miniconda3/envs/qiime2-2023.5/lib/python3.8/site-packages/qiime2/sdk/action.py", line 509, in callable_executor
output_views = self._callable(**view_args)
File "/root/miniconda3/envs/qiime2-2023.5/lib/python3.8/site-packages/q2_composition/_ancombc.py", line 41, in ancombc
return _ancombc(
File "/root/miniconda3/envs/qiime2-2023.5/lib/python3.8/site-packages/q2_composition/_ancombc.py", line 257, in _ancombc
raise Exception('An error was encountered while running ANCOM-BC'
Exception: An error was encountered while running ANCOM-BC in R (return code 1), please inspect stdout and stderr to learn more.

So I tried to specify the --p-reference-levels just to see if something would change:

#Command used:
qiime composition ancombc
--i-table genus.qza
--m-metadata-file q2-metadata.tsv
--p-formula 'Age_in_years + Hypertension + Physical_activity + Smoke + Years_of_illness + Treatment'
--p-reference-levels Age_in_years::Elderly Hypertension::0 Physical_activity::0 Smoke::0 Years_of_illness::34 Treatment::Other.stabilizers \
--o-differentials ancombc.qza

Error:
Plugin error from composition:

One of the reference_levels columns is not a categorical Metadata column. Please make sure that all chosen reference level columns are categorical, and not numeric. Non-categorical column selected: Age_in_years

Debug info has been saved to /tmp/qiime2-q2cli-err-73fwgc48.log

Traceback (most recent call last):
File "/root/miniconda3/envs/qiime2-2023.5/lib/python3.8/site-packages/q2cli/commands.py", line 468, in call
results = action(**arguments)
File "", line 2, in ancombc
File "/root/miniconda3/envs/qiime2-2023.5/lib/python3.8/site-packages/qiime2/sdk/action.py", line 274, in bound_callable
outputs = self.callable_executor(
File "/root/miniconda3/envs/qiime2-2023.5/lib/python3.8/site-packages/qiime2/sdk/action.py", line 509, in callable_executor
output_views = self._callable(**view_args)
File "/root/miniconda3/envs/qiime2-2023.5/lib/python3.8/site-packages/q2_composition/_ancombc.py", line 41, in ancombc
return _ancombc(
File "/root/miniconda3/envs/qiime2-2023.5/lib/python3.8/site-packages/q2_composition/_ancombc.py", line 175, in _ancombc
raise TypeError('One of the reference_levels columns is not'
TypeError: One of the reference_levels columns is not a categorical Metadata column. Please make sure that all chosen reference level columns are categorical, and not numeric. Non-categorical column selected: Age_in_years

Thank you again.

salias · July 17, 2024, 2:48pm

Hi again,

I've never used ANCOM-BC for numerical columns, only categorical, so take my words with a grain of salt. Regarding the first error, this is the (most) interesting part:

I'm not sure about what that means. I did a quick Google search and found nothing but the obvious: R is looking for a factor that apparently does not exist. Weird, because you are not setting the reference levels and therefore it is selected directly from data, so it should be there (at least once).

Things I can think of that could cause problems here (again, this is a shot in the dark so take it with a grain of salt):

I can see here columns like Physical_activity or Smoke that are categorical in the metadata file but they seem to be numerical if we look at default reference values assigned:

Another thing. Maybe the hyphen in:

is causing trouble? Maybe a senior member can shed some light on the matter.

Maybe ANCOM-BC in QIIME 2 has different default values for some parameters than the R ANCOM-BC? Here you can find those default values.

Regarding the second error:

Maybe you meant Age or Age_group?

I think you don't need to set the reference level for numerical columns, only for categorical.

lizgehret · July 17, 2024, 4:34pm

Hi @Liviacmg,

Just wanted to jump in here with a few additional details. The ANCOM-BC method that lives in q2-composition is a wrapper around the ANCOM-BC R package - which is not the same type of model as ANCOM-BC2. ANCOM-BC is a gaussian mixture model and only handles fixed effects, and ANCOM-BC2 is a linear mixed effects model that handles both fixed and random effects. Additionally, ANCOM-BC only accounts for sample-specific bias, while ANCOM-BC2 also accounts for taxon-specific bias.

You can check out the different methodologies here as well, if you're interested - this shows the exact calculations for each (directly from the R package):

ANCOM-BC
ANCOM-BC2

All that to say - depending on what type of data you're dealing with (i.e. fixed effects only or mixed effects, continuous covariates, multi-group comparisons, etc), you should choose which method is most appropriate. Direct comparisons between the two models won't be the same, and won't necessarily provide you with insightful results.

ANCOM-BC2 isn't currently implemented in QIIME 2 yet - but we are planning to add this to q2-composition in our upcoming release (2024.10), so that should be available soon.

With regards to the error you're receiving, it is because categorical columns are expected in the reference levels (these categorical factors are dummy coded relative to the reference(s) provided). If the numeric columns you're using are discrete vs. continuous, you could add a comment directive in your metadata file to specify those columns as categorical rather than numeric - but depending on the contents of these columns, you may not receive an output that makes a ton of sense (i.e. if these numeric columns don't have much overlap between samples in your metadata).

Hope this helps! Cheers

Liviacmg · July 18, 2024, 11:39pm

Hi @lizgehret ,

Thank you so much for your insightful response!

Also, I was checking the inputs for both ANCOM-BC and ANCOM-BC2 and in the first one I used qiime taxa collapse (which is also recommended in qiime2 tutorials, in order to look into the genus level), which led me to think that maybe the differences between ANCOM-BC and ANCOM-BC2 could maybe also potentially be because in the first one the ASV/OTU table is collapsed while in the second one it isn't collapsed (it needs to be a phyloseq object, so I used the ASV table, metadata and taxonomy isolated) or do you think this is not responsible to affect the output?

Thank you so much again.

Liviacmg · July 19, 2024, 7:42pm

Hi @lizgehret ,

Sorry for having another question haha

But do you know if in ANCOM-BC the correction for the covariates is carried out only on the significant taxa after the FDR correction, or on all the taxa present in the database (OTU table)?

Thank you in advance.

Mike_Stevenson · July 21, 2024, 5:06pm

Hi @Liviacmg - I found this paper (see below) and a direct quote from it stating that it is for each taxa. I’ve emboldened it in the excerpt below. I hope it helps!

Reference:

Lin, H., & Peddada, S. D. (2020). Analysis of compositions of microbiomes with bias correction. Nature Communications, 11, 3514. Analysis of compositions of microbiomes with bias correction | Nature Communications

Direct Excerpt from the Paper:

"We consider a log-linear regression model with taxon-specific bias terms to account for differences in sequencing depth, compositionality, and other technical effects. Specifically, for each taxon (i), we fit the model (\log(y_{ij}) = \beta_{0i} + \mathbf{X}j\beta_i + \epsilon{ij}), where (y_{ij}) is the observed count for taxon (i) in sample (j), (\mathbf{X}j) are the covariates, (\beta{0i}) and (\beta_i) are the coefficients to be estimated, and (\epsilon_{ij}) is the error term."

lizgehret · July 22, 2024, 10:46am

Hi @Liviacmg,

@Mike_Stevenson is correct with regards to his answer to your second question!

Yes, this will also impact the results - both ANCOM-BC and ANCOM-BC2 will be performed at the lowest taxonomic level available in the input table (unless otherwise specified).

You can read more on how the tax_level parameter is used in both methods below if you're interested:

ANCOM-BC
ANCOM-BC2

Hope this helps! Cheers

Liviacmg · July 23, 2024, 4:02pm

Hi @Mike_Stevenson ,

Thank you so much!!

Liviacmg · July 23, 2024, 4:02pm

Hi @lizgehret ,

Thank you again!!

Liviacmg · July 23, 2024, 6:08pm

Hi @lizgehret and @Mike_Stevenson ,

I know ANCOM-BC 2 is still not implemented in qiime2 but do you happen to know in relation to which group this graphic from ANCOM-BC 2 is plotted? For example, in the following image from the tutorial (ANCOM-BC2 Tutorial):

It does not say in relation to which group the orange taxa are increased and the blue taxa are decreased. I am asking because I performed the same analysis but with treatment - having two groups, control (who use other stabilizers) and the ones who use lithium - . I have setted patients who use other stabilizers manually as reference, as it says below in the tutorial:

(But I don't know which group it is referring to in the increased/decreased taxa).

" ```
Note that by default, levels of a categorical variable in R are sorted
alphabetically. In this case, the reference level for bmi will be
lean. To manually change the reference level, for instance, setting obese
as the reference level, use:
tse$bmi = factor(tse$bmi, levels = c("obese", "overweight", "lean"))


I did:
tse$bmi = factor(tse$Treatment, levels = c("Other.stabilizers", "Li-treated"))

Thank you once again!

lizgehret · July 24, 2024, 5:06pm

Hey @Liviacmg,

That's a great question! The intercept or reference shown in this plot (and what all the LFC groups are with respect to in ANCOM-BC/BC2 results) will be the first ordered level. So if you didn't specify the level ordering, it would default to the first group in alphabetical order (just as you mention for lean in the bmi group). Since you specified your level order as Other.stabilizers and Li-treated, you should see the Other.stabilizers as the reference.

Hope this helps! Cheers

Liviacmg · July 24, 2024, 6:41pm

Thank you so much again @lizgehret ,

So to confirm that I understood correctly: (in my analysis) the orange bars are the taxa increased in patients treated with other stabilizers and the blue bars the decreased taxa in patients treated with other stabilizers?

Best regards

lizgehret · July 24, 2024, 8:00pm

Hi @Liviacmg,

Not quite - what this waterfall plot is showing is the relative enrichment/depletion of each taxon with respect to the abundances found in the chosen reference group (which in your analysis would be Other.stabilizers).

Cheers

lizgehret · July 30, 2024, 4:48pm

An off-topic reply has been split into a new topic: how to correct for read depth when using ancombc

Please keep replies on-topic in the future.