Obtain normalized data - ANCOM

JeremyTournayre · February 1, 2022, 12:54pm

Hello,

Is it possible to get the normalized data after the normalization by ANCOM please?

Thanks in advance,
Jérémy Tournayre

jwdebelius · February 1, 2022, 2:15pm

The data normalization is part of the new ANCOM-BC. Unfortunately, the ANCOM-BC QIIME 2 plugin isn't live yet. You can use the R library directly, and import the data into QIIME 2 if you want to work with it here.

Best,
Justine

JeremyTournayre · February 2, 2022, 10:00am

Thanks!

I would like to make boxplots for each genus/species among the significant ANCOM results.
If I don't have the normalized data, I can't show boxplots, because I only have the "raw/count" data from before ANCOM (table.qza), right? How do people usually display ANCOM results? Maybe nobody shows boxplots?

I have never used ANCOM-BC, so I am following this tutorial: ANCOM-BC Tutorial

Just to be sure, the normalized data seems to be the matrix 'log_obs_abn_adj' obtained in "3.47 Bias-adjusted abundances", right?

They show a boxplot in "3.48 Visualizations for “age”". I think the log fold change is the regression coefficient fitted on the normalized data, right?

I think I have confirmed that by doing a plot of their data:

fit1 <- lm(log_obs_abn_adj[1,]~sample_data(pseq)$age)
plot(log_obs_abn_adj[1,]~sample_data(pseq)$age)
abline(fit1)

print(fit1)
Call:
lm(formula = log_obs_abn_adj[1, ] ~ sample_data(pseq)$age)

Coefficients:
(Intercept)  sample_data(pseq)$age
    6.10764               -0.03473

This value, -0.03473, is the coefficient obtained by ANCOM-BC.

Next, I tried the same logic on the "bmi_group" variable (categorical, not continuous like age):

fit1 <- lm(log_obs_abn_adj[1,]~sample_data(pseq)$bmi_group )
print(fit1)

Call:
lm(formula = log_obs_abn_adj[1, ] ~ sample_data(pseq)$bmi_group)

Coefficients:
(Intercept)  sample_data(pseq)$bmi_groupoverweight  sample_data(pseq)$bmi_groupobese
     4.8029                                -0.1122                           -0.7992

The values -0.1122 and -0.7992 are not the coefficients obtained by ANCOM-BC, so maybe a plain lm fit is not what produces the "0.10" and "-0.25" reported by ANCOM-BC.

Nevertheless, I think I can show the normalized data with:
plot(log_obs_abn_adj[1,]~sample_data(pseq)$bmi_group)

And the unnormalized/raw/count data with:
plot(log(abundances(phylum_data)[1,],2)~sample_data(pseq)$bmi_group)

Can someone confirm the normalized data is the matrix 'log_obs_abn_adj' obtained in "3.47 Bias-adjusted abundances" please?

Have a nice day, Jérémy Tournayre

jwdebelius · February 3, 2022, 3:45pm

Hi @JeremyTournayre,

I misunderstood and I apologize - you're looking for a transform for ANCOM I (in QIIME). There isn't a good exact representative table; I think the closest you can come is a centered log ratio (CLR) transform. As far as I know, there isn't a classic CLR implemented in QIIME (the RCLR in gemelli is slightly different, as is the ANCOM-BC table). So, if I were doing boxplots, I would add a pseudocount and then apply a CLR.
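
A minimal sketch of the pseudocount-plus-CLR idea in base R, for illustration only. The toy matrix `counts`, the `group` labels, the helper name `clr_transform`, and the pseudocount of 1 are all assumptions made up for this example, not part of QIIME 2 or ANCOM:

```r
# Hypothetical toy count table: taxa in rows, samples in columns.
counts <- matrix(
  c(10,  0, 25, 14,
     5,  8,  0,  3,
    40, 12, 30, 22),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("taxonA", "taxonB", "taxonC"),
                  c("s1", "s2", "s3", "s4"))
)

# CLR per sample: log of each value minus the mean log of that sample
# (i.e. the log of the sample's geometric mean).
# The pseudocount (assumed to be 1 here) avoids log(0).
clr_transform <- function(mat, pseudocount = 1) {
  log_mat <- log(mat + pseudocount)
  sweep(log_mat, 2, colMeans(log_mat), "-")
}

clr <- clr_transform(counts)

# Hypothetical sample grouping, then one boxplot for one taxon.
group <- factor(c("lean", "lean", "obese", "obese"))
boxplot(clr["taxonA", ] ~ group, ylab = "CLR abundance (taxonA)")
```

Each column of `clr` sums to zero by construction, which is a quick sanity check that the centering was applied per sample and not per taxon.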

If you want to use the ANCOM-BC table, I would recommend pairing it with the ANCOM-BC test.

I apologize for the confusion.

Best,
Justine

JeremyTournayre · February 4, 2022, 7:49am

Hello,

No, it's OK, I probably didn't give enough information in my first post :).

I'm surprised that "nobody" asks for the data normalized by ANCOM, at least to see what ANCOM does to the data. Especially because it relies on at least 25% of the features or taxa not being differentially abundant between the groups, which I think can sometimes lead to a failure.

For the new ANCOM-BC, they don't mention a normalization based on 25% of features that are not differentially abundant. I tried to find this information in their ANCOM-BC paper, Analysis of compositions of microbiomes with bias correction | Nature Communications, and I think this is it:

"ANCOM-BC accounts for sampling fraction by introducing a sample-specific offset term in a linear regression framework, that is estimated from the observed data. The offset term serves as the bias correction, and the linear regression framework in log scale is analogous to log-ratio transformation to deal with the compositionality of microbiome data. The case of zero counts is also discussed in “Methods” section. This methodology has some conceptual similarities with DR, but is fundamentally different."
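
To picture what a sample-specific offset does, here is a rough base R sketch. It is not the ANCOM-BC estimator: the toy `counts` matrix and the offsets in `log_samp_frac` are made up for illustration, whereas ANCOM-BC estimates the offsets from the observed data.

```r
# Hypothetical toy counts: 3 taxa x 4 samples (matrix() fills column by column).
# Every sample has the same composition, only the sampling depth differs.
counts <- matrix(c(120, 30,  50,
                   240, 60, 100,
                    60, 15,  25,
                   120, 30,  50),
                 nrow = 3,
                 dimnames = list(paste0("taxon", 1:3), paste0("s", 1:4)))

# Made-up log sampling fractions, one per sample (an assumption for this demo).
log_samp_frac <- log(c(s1 = 1, s2 = 2, s3 = 0.5, s4 = 1))

# Work on the log scale and subtract each sample's offset from its column.
log_obs <- log(counts)
log_adj <- sweep(log_obs, 2, log_samp_frac, "-")

# After the correction, the four samples agree taxon by taxon.
print(round(log_adj, 3))
```

With these made-up offsets the depth differences cancel exactly, so any difference remaining after such a correction would reflect the taxa themselves and not the sampling.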

I think I will use ANCOM-BC.
Just to be sure, when you said "the ANCOM-BC table, I would recommend pairing it with the ANCOM-BC test", does that mean I can use the normalized ANCOM-BC table (log_obs_abn_adj) to display the data, and use the p-value of the statistical test done by ANCOM-BC (instead of my own calculation) to decide whether there is a difference?

Have a nice day :qiime2:!

jwdebelius · February 4, 2022, 2:58pm

Hi @JeremyTournayre,

The ANCOM normalization is built into the test; it essentially runs a series of pairwise comparisons between taxa, and the W for a taxon is the number of those pairwise comparisons that are significant. So, the underlying transform is an additive log ratio (ALR) between pairs of taxa. Displaying ALR requires picking a reference group. So, it's slightly more complicated than just asking to see the data transform under the test.
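
A toy base R sketch of both ideas, for illustration only. It assumes the reference is a single taxon used as the denominator, uses a pseudocount of 1, and swaps in a Welch t-test with a Benjamini-Hochberg adjustment as a stand-in for whatever test and correction the real ANCOM applies; the data and the name `toy_w` are made up:

```r
set.seed(1)

# Hypothetical toy data: 5 taxa x 12 samples, two groups of 6.
counts <- matrix(rpois(60, lambda = 50), nrow = 5,
                 dimnames = list(paste0("taxon", 1:5), paste0("s", 1:12)))
group <- factor(rep(c("a", "b"), each = 6))

# Spike one taxon in group b so there is something to detect.
counts["taxon1", group == "b"] <- counts["taxon1", group == "b"] * 4

log_counts <- log(counts + 1)  # pseudocount of 1 is an assumption

# ALR for display: log ratio of every taxon to one chosen reference taxon.
alr <- sweep(log_counts, 2, log_counts["taxon5", ], "-")

# Toy W: for each taxon, count how many of its pairwise log ratios
# against the other taxa differ between the two groups.
toy_w <- sapply(rownames(log_counts), function(i) {
  others <- setdiff(rownames(log_counts), i)
  p <- sapply(others, function(j) {
    lr <- log_counts[i, ] - log_counts[j, ]
    t.test(lr[group == "a"], lr[group == "b"])$p.value
  })
  sum(p.adjust(p, method = "BH") < 0.05)
})

print(toy_w)
```

The `alr` table changes with the choice of reference (here `taxon5`, whose own row becomes all zeros), which is why there is no single "normalized table" to hand back from the test.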

The assumption that fewer than 25% of the taxa are changing I think comes from the significance selection for the W, which I'd argue against in general because I know far too many people whose selected W values don't hold up to scrutiny.

And yes, pairing the ANCOM-BC table with the ANCOM-BC test results would be my recommendation.

Best,
Justine
