ggplot2: Relative Abundances Whisker Plot for Individual Phyla across samples

Hi all,

I am currently using QZA files and R to produce Relative Abundances plots. Following is the code for a sample:

library(ggplot2)
library(reshape2)
library(dplyr)
library(cowplot)
library(grid)
taxa_data <- read.csv("phylum.data.csv")
head(taxa_data)
colnames(taxa_data)
source <- c("Normal", "Normal", "Normal", "LDLrKO", "LDLrKO", "LDLrKO", "LDLr HD 0", "LDLr HD 0", "LDLr HD 0")
if (length(source) != nrow(taxa_data)) {
stop("The length of the source column does not match the number of rows in taxa_data.")
}
taxa_data$source <- source
taxa_long <- melt(taxa_data, id.vars = c("index", "source"))
taxa_long$value <- as.numeric(as.character(taxa_long$value))
taxa_long <- taxa_long %>% filter(!is.na(value))
taxa_long <- taxa_long %>%
group_by(source) %>%
mutate(total = sum(value)) %>%
ungroup() %>%
mutate(percentage = (value / total) * 100)
head(taxa_long)
taxa_long$variable <- gsub(" .*", "", taxa_long$variable)
p <- ggplot(taxa_long, aes(x = source, y = percentage, fill = variable)) +
geom_bar(stat = "identity", show.legend = FALSE) +
theme_minimal() +
labs(x = "Treatment", y = "Relative Abundance (%)") +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
print(p)
get_legend <- function(myggplot) {
tmp <- ggplot_gtable(ggplot_build(myggplot))
leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
legend <- tmp$grobs[[leg]]
return(legend)
}
[image]

I want to produce something more like Figure B shown below: [image]

Any suggestions?

Mine:

Want:

Hi @Gordon_Zheng and welcome to the :qiime2_square: Forum!

What do you want exactly from Figure B? The theme, separating the conditions, or something else?

For the theme, you can check all ggplot pre-built themes and see if any of them fit your needs. You can also build one yourself.

For the grouped plots, you can generate one plot per condition and then use patchwork.

Best,

Sergio

I am looking for separate plots, like Figure B and the +-SE. I combined all the samples in my figure, but I am not sure how to separate them based on their phylum and calculate +-SE. I attempted an R-code to calculate standard error but my relative abundance goes up to 3000%

Hi @Gordon_Zheng ,

As far as I understood, this CSV file contains relative abundance of features annotated at the phylum level. Each row is a sample and each column is a phylum. I understand you want a plot for each phylum where the X-axis contains your "source" variable (i.e., "Normal", "LDLrKO" and "LDLr HD 0").

Figure B (as well as your figure) are boxplots, that show the distribution of data. You don't need to calculate SE in order to generate them.

What I would do is:

  1. Convert your source column to factor using as.factor(taxa_data$source)

  2. Then, for each phylum you want to plot:

    2.1. Keep only the column of that phylum and the source column. I see you use tidyverse packages, so some tidy code could be:

    plot_data <- taxa_data %>%
      select(matches("Firmicutes"), source) # IDK exactly the column names, feel free to edit
    

    2.2. Plot it. This is a basic backbone:

    plot_firmicutes <- ggplot(df, aes(x = source, y = .data[[1]])) +  # Again, IDK the exact col name so I assume is the first
                         geom_boxplot() +
                         theme_minimal()
    
  3. Once you generate all the plots you want, use patchwork:

combined_plot <- (plot_firmicutes | plot_actinobacteria) / 
                 (plot_proteobacteria | plot_bacteroidetes)

Sergio

Welcome to the forums!

For separate panels, you can use the ggplot function facet_wrap()

For the stat tests in the graph itself, SE, comparison of groups, etc., I use this package.

Making a complex graph requires more complex code, but it can be done! Let us know if you have questions along the way.

1 Like