How to select differential abundant ASVs/ Keggs from ancombc2 for enrichment analysis

mu_ab · May 20, 2023, 9:49am

Hello all,
I have been working on my 16S amplcon data for a while now and I have gotten to the last of the downstream analysis where I am stuck and I dont know hwo to move forward. I have data set that I woud say loks like a full factorial; Genotype (4 levels; G1, G2, G3, & G4), Day (3 levels; D1, D2 & D3), Treatment (6 levels; Control, Atrazin, PFOS, Diclo, Arsenic, wastewater) and Replicates (3 biolgical replicates of the genotypes across the time points and treatment).
I have run a differential abundance analysis using the function "ancombc2" that uses the lmerTest in its model. This i think suites my kind of data because it will allow me look for interaction among the variabels and I will also have a nested model with replicates as random effect. Please see below my code:

set.seed(123)
output2 = ancombc2(data = ps, assay_name = "counts", tax_level = "Genus",
                  fix_formula = "Treatment * Genotype * Day", rand_formula = "(1|Replicates)",p_adj_method = "holm", pseudo = 0, pseudo_sens = FALSE,prv_cut = 0.10, lib_cut = 0, s0_perc = 0.05,group = "Treatment", struc_zero = FALSE, neg_lb = FALSE,alpha = 0.05, n_cl = 2, verbose = FALSE,global = TRUE, pairwise = TRUE, dunnet = TRUE, trend = FALSE,iter_control = list(tol = 1e-2, max_iter = 20, verbose = FALSE),
                  em_control = list(tol = 1e-5, max_iter = 100),lme_control = lme4::lmerControl(),
                  mdfdr_control = list(fwer_ctrl_method = "holm", B = 100),
                  trend_control = list(contrast = NULL, node = NULL, solver = "ECOS", B = 100))
# ps = phyloseq object

I assume that the pairwise comparison will be agaisnt the base "Treatment", am not too famiiar with the meaning of the ancombc output.
The "output" has several files: global, prim pairs, and Dunn test. I can see in the 'prim' output interactions but most are false in terms of p-val but the 'global' has a different table structure with diff_abun column, W, adj_pval and the taxon. I other to move forward with this analysis, my aim is to identify ASVs,/ kegg genes that are enriched and then visualise this. but at this point I dont know how to selct the diff_adun ASVs to create a list that will be use for enrichement analysis. To clarify, I am using the amcombe package to run differential abundance analysis on both picurst2 kegg output and phyloseq object for ASVs
I would be grateful if anyone could share their thoughts on this. Thank you
res_global2.csv (5.4 KB)

mu_ab · May 22, 2023, 6:42pm

Hello all.
I would highly appreciate it if anyone can offer suggestions on how to go about this.
Thank you

BOYANG_HUANG · June 17, 2024, 8:00am

Have you figured it out? i am having the same question right now

Mike_Stevenson · July 28, 2024, 5:59pm

Hi @mu_ab

In order to do this, you'll need to use your 'global' output file that you mentioned as one of the outputs from your code.

Assuming you already have this loaded into R, the following code(s) should help you extract the DA features and produce a bar plot for your data.

Remember to change the path to your own .csv file (in the 'output' directory you mentioned) in this first step.

Extract Diff. Abun. Features

# Load the global output
global_output <- read.csv("path_to_global_output.csv")

# Filter for differentially abundant features
diff_abun_features <- subset(global_output, diff_abun == 1 & adj_pval < 0.05)

# Extract the list of differentially abundant ASVs/KEGG genes
diff_abun_list <- diff_abun_features$taxon

In the next command, ggplot is used; bar plot is selected here but you can choose other visualisations like heatmaps, if you prefer!

Visualise Diff. Abun.

library(ggplot2)

# Example: Bar plot of differentially abundant features
ggplot(diff_abun_features, aes(x = reorder(taxon, W), y = W)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  labs(title = "Differentially Abundant Features", x = "Taxon/KEGG Gene", y = "W Statistic")

In the final stage, use the 'diff_abun_list' you got from the first step, and use it as an input. Make sure you have the 'clusterProfiler' package installed before running this command (should be available via 'BiocManager').

Enrichment Analysis

library(clusterProfiler)


# Example: KEGG enrichment analysis
kegg_enrichment <- enrichKEGG(gene = diff_abun_list, organism = 'hsa', keyType = 'kegg')
dotplot(kegg_enrichment, showCategory = 20)

Hope that helps you get a nice DA output for all of your datasets!