understanding ancombc in qiime2

Hello,
I am running ancombc in version 2024.2 of qiime2 in a conda environment. I want to know which taxa are differentially abundant between body site "A" and body sites "B", "C", "D", and "E". I made a separate column in my metadata so that I can compare body site A to all other body sites, which are called "other" in the new column. So, this new column only has "A" or "other". I have 2 specific questions about the output of this:

  1. The DA plot shows taxa as enriched or depleted. I want to know specifically what is more enriched in group A vs. other (because group A is the body site of interest). Are the taxa shown as enriched in the plot enriched for "other" or for "A"?

  2. I wanted to create the DA plot in R, so I exported the ancombc results using:

qiime tools export \
  --input-path ancombc-body-site-BP-tax-glom.qza \
  --output-path ancombc-body-site-BP-tax-glom

This export gives me 5 csv files (lfc_slice, p_val_slice, q_val_slice, se_slice, and w_slice). When I look at the lfc csv, I see many more taxa with a positive lfc value than were plotted as enriched on the DA plot (qiime composition da-barplot). How does qiime/ancombc choose which taxa to put on the plot? Are the taxa with positive lfc values still "enriched" if they are not in the DA plot?

Sorry if these questions were already answered elsewhere. I was not able to find any posts similar to my questions, so I decided to post here. Thank you for your time.


Hi @Bark9299,

Great questions! Happy to discuss below.

If I'm understanding your goal correctly, it sounds like you'd like to compare the relative abundance of body sites B, C, D, and E with respect to body site A. Instead of creating a new column in your metadata, you can also just set body site A as your reference level when running ancombc, and your results will show the LFC/etc for each body site with respect to A.

With that being said, in terms of interpreting the da-barplot results for your configuration - the enriched/depleted taxa you're seeing would be with respect to A (assuming that's what you set as your reference level initially).

You'll see more enriched/depleted taxa in the ancombc results than in the da-barplot results because da-barplot has a significance threshold parameter (`--p-significance-threshold`) that filters out all taxa that don't meet that threshold. You can change the threshold if you'd like to see more or fewer taxa in the resulting visualization.
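To make the filtering idea concrete, here is a minimal Python sketch of how a significance threshold can shrink the set of plotted taxa. It is an illustration, not da-barplot's actual code: the inline tables stand in for the exported lfc_slice.csv and q_val_slice.csv, and the taxa, the values, the `body_site_other` column name, the 0.05 threshold, and the `<=` comparison are all assumptions made for the example.

```python
import csv
import io

# Hypothetical stand-ins for the exported lfc_slice.csv and q_val_slice.csv.
# The taxa, values, and the "body_site_other" column name are made up.
LFC_CSV = """id,(Intercept),body_site_other
taxon_1,0.10,2.40
taxon_2,-0.20,1.10
taxon_3,0.05,-1.80
taxon_4,0.30,0.40
"""
Q_VAL_CSV = """id,(Intercept),body_site_other
taxon_1,0.50,0.001
taxon_2,0.40,0.200
taxon_3,0.60,0.010
taxon_4,0.70,0.900
"""


def read_column(text, column):
    """Map each feature id to the float value in the given column."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["id"]: float(row[column]) for row in reader}


def plotted_taxa(lfc, q_val, threshold):
    """Keep taxa whose q-value passes the threshold; label them by LFC sign."""
    kept = {}
    for taxon, effect in lfc.items():
        if q_val[taxon] <= threshold:  # assumed comparison, for illustration
            kept[taxon] = "enriched" if effect > 0 else "depleted"
    return kept


lfc = read_column(LFC_CSV, "body_site_other")
q_val = read_column(Q_VAL_CSV, "body_site_other")
result = plotted_taxa(lfc, q_val, threshold=0.05)
positive_lfc = [taxon for taxon, effect in lfc.items() if effect > 0]
print(result)
print(positive_lfc)
```

In this made-up data, three taxa have a positive LFC but only one of them passes the threshold, which mirrors the mismatch between the csv and the plot described above.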

Hope this helps! Cheers :lizard:


Hi @lizgehret,

Thank you for your helpful reply! The reason I wanted to make a separate column with just body site "A" and "other" is that I don't want comparisons of body site A to each individual body site. I want to compare body site A's relative abundance to all other body sites together, so that I have one plot and not 4. I did not specify a reference level in the code, so because "A" comes alphabetically before "other", the results would be with respect to A. Just to clarify, does this mean that when I'm looking at the DA plot and body site "A" is alphabetically first, the "enriched" taxa would be enriched in A but not other? Sorry, this part still confuses me :upside_down_face:


Hi @Bark9299,

Gotcha! In that case, what you've done makes sense - and you're correct that because A is alphabetically the first group within the body site column, you can just leave it as-is as it will be set as the default reference level. When you're looking at the da-barplot results for this configuration, you'll be seeing taxa that are enriched or depleted in 'other' (relative to their presence in A).
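As a small illustration of that reading, here is a Python sketch that turns an LFC for 'other' (with A as the reference level) into a statement about which group the taxon is higher in. The function names are hypothetical, and the sign flip assumes a simple two-group comparison, where swapping the reference level should just negate the LFC.

```python
def higher_in(lfc_other_vs_a):
    """Return the group in which the taxon is estimated to be more abundant."""
    if lfc_other_vs_a > 0:
        return "other"  # drawn as enriched when A is the reference level
    if lfc_other_vs_a < 0:
        return "A"  # drawn as depleted when A is the reference level
    return "neither"


def lfc_from_a_perspective(lfc_other_vs_a):
    """Re-express the LFC as A relative to 'other' (two-group assumption)."""
    return -lfc_other_vs_a


print(higher_in(2.4))
print(higher_in(-1.8))
print(lfc_from_a_perspective(-1.8))
```

So if body site A is the group of interest, the taxa to look at are the ones plotted as depleted, since those are the ones that are lower in 'other' and therefore higher in A.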

Cheers :lizard:
