ANCOMBC interpretation

Hey guys,

I'd like some help understanding the ANCOMBC output. I am using QIIME 2 2023.5, installed through conda. The command I used:

qiime composition ancombc \
  --i-table genus.qza \
  --m-metadata-file sample-metadata.tsv \
  --p-formula Host_disease \
  --p-reference-levels Host_disease::Control \
  --p-p-adj-method fdr \
  --o-differentials ancombc-fdr-adjusted.qza

Why did it result in these two comparisons: Host_disease (what is it comparing against control?) and Host_diseaseCase? The Host_disease column has only two values: case and control. Shouldn't it compare only case vs control, since I have set Control as the reference level? And why is the title in the plot Host_diseaseCase (or just Host_disease) rather than Host_diseaseControl? Even if the title is Host_diseaseCase, are the enriched and depleted taxa relative to the control or to the case?

I understand that Host_diseaseCase is the comparison between control and case; is that correct? And what is Host_disease comparing to control?

Thank you in advance. ANCOMBC still confuses me sometimes, when it generates these extra comparisons that don't seem to make sense to me.

Hello!
Looks strange to me!
I would expect only one link, "Host_diseaseCase," with the case compared to the control.
In your metadata file, do any samples have an empty value in the Host_disease column? It is really strange, but so far I have no other ideas. If you do have samples with no value in that column, could you filter them out of both the feature table and the metadata file and run it again?
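
For anyone who wants to check for blank values quickly, here is a minimal sketch in plain Python (standard library only, not part of QIIME 2). The helper name `find_blank_samples` is hypothetical, the example rows are made up, and it assumes a tab-separated metadata file whose first column is the sample ID and which has a `Host_disease` column.

```python
import csv
import io


def find_blank_samples(handle, column="Host_disease"):
    """Return IDs of samples whose value in `column` is empty or whitespace."""
    reader = csv.DictReader(handle, delimiter="\t")
    id_field = reader.fieldnames[0]  # assumption: first column holds the sample ID
    blanks = []
    for row in reader:
        if row[id_field].startswith("#"):
            continue  # skip comment/directive rows such as "#q2:types"
        value = row.get(column) or ""
        if value.strip() == "":
            blanks.append(row[id_field])
    return blanks


# Made-up example data, for illustration only.
example = io.StringIO(
    "sample-id\tHost_disease\n"
    "S1\tCase\n"
    "S2\tControl\n"
    "S3\t\n"
    "S4\t  \n"
)
blank_ids = find_blank_samples(example)
print(blank_ids)
```

With a real file, you could pass `open("sample-metadata.tsv", newline="")` instead of the in-memory example. Any sample IDs it reports are the ones to remove from both the feature table and the metadata before re-running.
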
If any of the users/mods reading this post have other ideas, please join the discussion.

Here the Case group is compared to Control: enriched features are more abundant in Case, and depleted features are more abundant in Control.

That is a mystery to me. If some samples have empty values in that column, then they (probably :thinking:) were treated as a separate group and compared to the control.
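
One possible way to picture where a bare "Host_disease" name could come from is sketched below. This is only an illustration under two assumptions that are not confirmed for q2-composition: that blank cells survive as an empty-string category, and that each non-reference category gets a term named variable + level (R-style treatment coding). The function `coefficient_names` is hypothetical.

```python
def coefficient_names(variable, levels, reference):
    """Name one term per non-reference level as variable + level."""
    return [
        variable + level
        for level in sorted(set(levels))
        if level != reference
    ]


# Values as they might appear in the metadata column; "" stands for a blank cell.
values = ["Control", "Case", "", "Case", "Control"]
names = coefficient_names("Host_disease", values, reference="Control")
print(names)  # ['Host_disease', 'Host_diseaseCase']
```

Under those assumptions, the blank category yields a term named just "Host_disease", and there is never a "Host_diseaseControl" term because Control is the baseline every other term is compared against.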

Best,

I checked the metadata again and there were blank spaces as well, my bad! So it is exactly what you said!

So, in the Host_diseaseCase plot, the blue bars (enriched taxa) are more abundant in Case and, equivalently, depleted in Control? And the orange bars (depleted taxa) are depleted in Case and, equivalently, more abundant in Control?

Yes, that is right. Features with positive x-axis values (blue bars) are more abundant in Case compared to Control, or equivalently less abundant in Control; orange bars (negative x-axis values) are less abundant in Case compared to Control, or more abundant in Control.
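
The reading rule above can be written down as a tiny sketch. It assumes the x-axis value is the log-fold change for the Host_diseaseCase term with Control as the reference; the function name `describe_feature` and the example values are made up for illustration.

```python
def describe_feature(lfc, group="Case", reference="Control"):
    """Translate the sign of a log-fold change into plain words."""
    if lfc > 0:
        return f"enriched: more abundant in {group} than in {reference}"
    if lfc < 0:
        return f"depleted: less abundant in {group} than in {reference}"
    return f"no difference between {group} and {reference}"


# Made-up values, for illustration only.
for name, lfc in [("taxon_A", 1.8), ("taxon_B", -0.7)]:
    print(name, "->", describe_feature(lfc))
```

The direction is always stated for the non-reference group, so "depleted in Case" and "more abundant in Control" describe the same bar.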
