ANCOM-BC: subsetting input tables and specifying the formula

Hi everybody,

I have a fairly general question about which table to use for ANCOM-BC, given the results of my previous analyses.
My dataset (16S) contains samples from 2 rodent species, and from each animal I analysed a gut and a spleen sample. I want to check whether the microbial composition changes with land-use intensity (low, medium, high).
My first thought was to run the analysis on the whole dataset (both species and both organs), not in QIIME but in R, since the global test is only available there, and my Landuseintensity.cat variable has more than 2 groups and I don't want to set one of them as the reference. I would use these settings: formula = "Species + Organ + Landuseintensity.cat", group = "Landuseintensity.cat", global = TRUE

Then I read this thread, which made me think I should probably use a subsetted table instead.
I've run alpha and beta diversity analyses on my whole dataset and on these subsetted datasets:

  • only species A (but including both organs)
  • only species B (but including both organs)
  • only gut samples (but including both species)
  • only spleen samples (but including both species)
  • speciesA_gut, speciesA_spleen, speciesB_gut and speciesB_spleen
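
A minimal Python sketch of what this subsetting does to sample size, using purely hypothetical, balanced metadata (2 species × 3 land-use levels × 4 animals per combination, each animal contributing one gut and one spleen sample; these numbers are invented for illustration, not taken from the dataset described here). The subset names mirror the list above; nothing in it calls QIIME or ANCOM-BC.

```python
from collections import Counter
from itertools import product

# Hypothetical, balanced toy metadata (NOT real data): 2 species x 3 land-use
# levels x 4 animals per combination; every animal gives a gut and a spleen sample.
SPECIES = ["A", "B"]
ORGANS = ["gut", "spleen"]
LANDUSE = ["low", "medium", "high"]
ANIMALS_PER_COMBINATION = 4

samples = [
    {"Species": sp, "Organ": org, "Landuseintensity.cat": lu}
    for sp, lu, _animal in product(SPECIES, LANDUSE, range(ANIMALS_PER_COMBINATION))
    for org in ORGANS
]


def subset(rows, **criteria):
    """Keep only the samples whose metadata match every key=value pair given."""
    return [r for r in rows if all(r[key] == value for key, value in criteria.items())]


subsets = {
    "whole dataset": samples,
    "speciesA": subset(samples, Species="A"),
    "gut": subset(samples, Organ="gut"),
    "speciesA_gut": subset(samples, Species="A", Organ="gut"),
}

for name, rows in subsets.items():
    # Number of samples available per land-use level in this subset
    per_level = Counter(r["Landuseintensity.cat"] for r in rows)
    print(f"{name}: n = {len(rows)}, per level = {dict(per_level)}")
```

With this toy layout the whole dataset has 48 samples (16 per land-use level), while each species × organ subset such as speciesA_gut keeps 12 (4 per level), which is where a loss of statistical power from subsetting would come from.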

In all the alpha and beta diversity analyses, it turns out that the microbial composition is significantly different between organs and between species (both alpha and beta diversity). In the speciesA_gut, ... datasets, alpha and beta diversity are sometimes significantly different between land-use intensities and sometimes not.
Based on that, I'm wondering which subsetted dataset to use, since organ and species apparently have a huge influence on the microbial community. If I understood the thread mentioned above correctly, I would run ANCOM-BC four times, on the datasets speciesA_gut, speciesA_spleen, speciesB_gut and speciesB_spleen, each with formula = "Landuseintensity.cat", global = TRUE, to detect the taxa that are differentially abundant across the land-use intensity categories. Is that correct?

Or should I use a "higher-level" dataset, say the gut dataset, with formula = "Species + Landuseintensity.cat", group = "Landuseintensity.cat", global = TRUE? I think this would do roughly the same thing, but since that larger dataset has a different microbial community, would it give different estimates and different differentially abundant taxa? Is that reasoning correct?

Which settings and datasets should I use to find out how land-use intensity influences the microbial abundances of species A and species B in gut and spleen samples?

I'm sorry if these questions are redundant, but being confused by the properties of my dataset and unfamiliar with ANCOM-BC has me :face_with_spiral_eyes:

Thanks in advance
Best,
Lea

Hi Lea,

If I understand your question correctly, you want to identify which microbes exhibit global differences across various levels of land use intensity while also accounting for differences related to species and organs, correct? In that case, I would recommend using the entire dataset, including both species and organs, as this will allow you to treat "Species" and "Organ" as adjusting covariates. This approach aligns with your observation that "organ and species have a significant influence on the microbial community," and the primary results of ancombc will reflect differential abundance across "Species," "Organ," and "Landuseintensity.cat."
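To make "adjusting covariates" concrete: in a regression-style formula, Species and Organ simply add columns to the design matrix while every sample stays in the model. The Python sketch below builds treatment-coded (dummy) columns for Species + Organ + Landuseintensity.cat the way R's model.matrix() does by default, with the first listed level of each factor as the reference. That ANCOM-BC expands its formula string this way is an assumption here (it is the usual R convention), and the level order, with low as the reference, is chosen for illustration; R sorts character levels alphabetically unless you set them explicitly, which would make "high" the reference.

```python
# Sketch of default R-style treatment ("dummy") coding for the formula
# Species + Organ + Landuseintensity.cat. Level order is an assumption:
# the first level of each factor is the reference and gets no column.
LEVELS = {
    "Species": ["A", "B"],
    "Organ": ["gut", "spleen"],
    "Landuseintensity.cat": ["low", "medium", "high"],
}
TERMS = ["Species", "Organ", "Landuseintensity.cat"]


def column_names(terms, levels):
    """Intercept plus one indicator column per non-reference level (R-style names)."""
    names = ["(Intercept)"]
    for term in terms:
        names += [f"{term}{level}" for level in levels[term][1:]]
    return names


def design_row(sample, terms, levels):
    """Encode one sample's metadata as a row of the design matrix."""
    row = [1]  # intercept
    for term in terms:
        row += [1 if sample[term] == level else 0 for level in levels[term][1:]]
    return row


print(column_names(TERMS, LEVELS))
# A species B gut sample from a high-intensity site
print(design_row({"Species": "B", "Organ": "gut", "Landuseintensity.cat": "high"}, TERMS, LEVELS))
```

In this coding Landuseintensity.cat contributes two coefficients (medium vs. low and high vs. low), while Species and Organ each contribute one column that absorbs their effect, so their influence is accounted for without splitting the data into subsets.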

Unless your specific interest lies in identifying differential abundance of taxa across land use intensity within a particular species or organ category, using a subset of the data may reduce your statistical power as the sample size decreases.

Best regards,

Huang


Hi Huang,
thanks for your quick reply and sorry for my late response!

That's exactly what I want to achieve! But I don't know which parameter I need to set so that Species and Organ are treated as adjusting covariates, as you said.
Is that the correct way to go?

  • formula = Species + Organ + Landuseintensity.cat (Species and Organ being the variables I know differ in alpha and beta diversity)
  • group = Landuseintensity.cat (the variable whose levels I want to compare, accounting for all the other metadata)
  • global = TRUE

Or do I need to specify another parameter to make sure Species and Organ are treated as adjusting covariates?
I also know that, for example, Sex does not have a significant effect on alpha or beta diversity. Is there a way to incorporate that into ANCOM-BC?

Thanks again,
Lea

Hi Lea,

The formula you've specified is correct. By setting formula = Species + Organ + Landuseintensity.cat, all variables are considered as covariates in the bias correction process, allowing for hypothesis testing for differential abundance with respect to each covariate. Although Landuseintensity.cat is your main variable of interest and both Species and Organ are adjusting covariates, the algorithm treats all variables equally as covariates.

Additionally, if you specify group = Landuseintensity.cat, the algorithm will also examine that categorical variable as a whole: with global = TRUE, it performs a global test of whether each taxon is differentially abundant between at least two of the levels of Landuseintensity.cat.
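For a single taxon i, the null hypothesis of the global test over Landuseintensity.cat can be written as below, assuming treatment coding with low as the reference level and writing β_{i,medium} and β_{i,high} for that taxon's land-use coefficients after adjusting for Species and Organ (notation introduced here for illustration, not taken from the ANCOM-BC documentation):

```latex
H_0^{(i)}:\ \beta_{i,\mathrm{medium}} = \beta_{i,\mathrm{high}} = 0
\qquad \text{vs.} \qquad
H_1^{(i)}:\ \beta_{i,\mathrm{medium}} \neq 0 \ \text{or}\ \beta_{i,\mathrm{high}} \neq 0
```

Under this null hypothesis the taxon's mean log abundance is the same at low, medium and high intensity; rejecting it means at least two levels differ. The hypothesis itself is the same whichever level is used as the reference, which is why the global test does not require singling out one land-use level.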

If you're uncertain about the impact of Sex on microbial abundances, you can include it in the formula: formula = Species + Organ + Sex + Landuseintensity.cat. This will show you whether any taxa are differentially abundant with respect to Sex. If there are few or no such taxa, it may be best to exclude Sex from the formula to keep the model parsimonious.
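As a rough illustration of checking for "few or no" taxa: ANCOM-BC reports multiple-testing-adjusted p-values per taxon (Holm by default, via p_adj_method = "holm"), so the decision about Sex amounts to counting how many taxa remain significant for the Sex term. The Python sketch below re-implements the Holm step-down adjustment (the same formula as R's p.adjust(method = "holm")) on made-up p-values; the taxon names, p-values and the 0.05 cutoff are all hypothetical. In practice you would count the adjusted p-values ANCOM-BC already returns rather than recompute them.

```python
# Made-up, hypothetical raw p-values for the Sex term, one per taxon.
p_sex = {"taxon1": 0.004, "taxon2": 0.03, "taxon3": 0.20, "taxon4": 0.45, "taxon5": 0.012}
ALPHA = 0.05  # hypothetical significance cutoff


def holm_adjust(pvalues):
    """Holm step-down adjusted p-values, matching R's p.adjust(method = "holm")."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is multiplied by m - k + 1, capped at 1.
        value = min(1.0, (m - rank) * pvalues[idx])
        # Adjusted values must not decrease along the sorted order.
        running_max = max(running_max, value)
        adjusted[idx] = running_max
    return adjusted


taxa = list(p_sex)
q_sex = dict(zip(taxa, holm_adjust([p_sex[t] for t in taxa])))
sex_da_taxa = [t for t in taxa if q_sex[t] < ALPHA]
print(f"{len(sex_da_taxa)} of {len(taxa)} taxa differentially abundant for Sex: {sex_da_taxa}")
```

Here 2 of the 5 made-up taxa survive the adjustment; where exactly to draw the line between keeping and dropping Sex remains a judgment call.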

Best regards,
Huang


Hi Huang,
thanks a million for the detailed reply; it has helped me a lot!

Best,
Lea
