I have a question about a relative frequency plot. I am trying to make a stacked bar plot for my microbiome data, but the y axis (relative frequency) goes well above 100 and I don't know why. I have read other posts but couldn't figure it out. I would really appreciate your help.
Hi!
I am not very good at R, so I cannot comment on your code, but I noticed that your dataset looks strange for relative abundances in %. The sum of all features within a sample is not equal to 100%, whereas the sum across all samples for each feature is. I think you made an error when converting absolute abundances to relative abundances: the features within each sample should sum to 100, not the samples within each feature.
You may want to check your data and see if it is correct. The data doesn't look like count data: all numbers have seven digits and they're smaller than ten. If the data had been subjected to total sum scaling (TSS), the row sums would be 1 or 100%, but they are not.
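A quick way to check which direction the data were scaled in is to compare row sums and column sums. The sketch below uses a small made-up count table (the sample and phylum names are hypothetical and not taken from your file) and assumes samples are in rows and taxa in columns:

```r
# Hypothetical toy table: 3 samples (rows) x 3 phyla (columns), raw counts
counts <- matrix(
  c(10, 30, 60,   # Firmicutes
    20, 30, 50,   # Proteobacteria
    70, 40, 40),  # Actinobacteria
  nrow = 3,
  dimnames = list(
    c("sample1", "sample2", "sample3"),
    c("Firmicutes", "Proteobacteria", "Actinobacteria")
  )
)

# Scaling each taxon (column) to 100 across samples: the suspected mistake
wrong <- prop.table(counts, margin = 2) * 100
colSums(wrong)  # every column sums to 100
rowSums(wrong)  # rows do not, so a stacked bar per sample can exceed 100

# Scaling each sample (row) to 100 across taxa: total sum scaling
right <- prop.table(counts, margin = 1) * 100
rowSums(right)  # every row sums to 100
```

If `rowSums()` on your own table is not constant while `colSums()` is, the normalization was most likely applied along the wrong margin.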
Anyway, assuming numbers in the csv file are sequence counts, we can plot the data by running the following code:
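# load packages: dplyr for the pipeline, ggplot2 for plotting,
# ggtext for element_markdown(); melt() is assumed to come from reshape2
library(dplyr)
library(reshape2)
library(ggplot2)
library(ggtext)

# import data
pc <- read.csv("L2_16S_R2.csv", header = TRUE)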
# tidy data
pcm <- pc %>%
  # add unique sample ids so that we can normalize data later
  mutate(sample_id = seq_len(nrow(pc)), .after = Vineyard) %>%
  melt(id.vars = c("Vineyard", "sample_id")) %>%
  group_by(sample_id) %>%
  # normalize data: convert counts to percentage (total sum scaling, tss)
  mutate(value_tss = 100 * value / sum(value)) %>%
  ungroup() %>%
  # the following 2 lines of code calculate the mean relative abundance of each phylum
  group_by(Vineyard, variable) %>%
  summarize(value_tss_mean = mean(value_tss), .groups = "drop") %>%
  # I'm not familiar with stringr, so I use gsub for the string manipulation
  # wrapping names in asterisks makes them italic in the markdown legend
  mutate(
    variable = paste0("*", variable, "*"),
    variable = gsub("\\*Bacteria_unclassified\\*", "Unclassified *Bacteria*", variable)
  )
# make plot
ggplot(pcm, aes(x = Vineyard, y = value_tss_mean, fill = variable)) +
  geom_col() +
  labs(
    x = NULL,
    y = "Mean relative abundance (%)",
    fill = "Phylum"
  ) +
  theme_classic() +
  theme(
    legend.text = element_markdown()
  )
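To see why the stacked bars from this approach top out at exactly 100, here is a minimal base R sketch of the same logic (normalize per sample, then average per vineyard). The toy data frame and its values are made up for illustration, and this is not the pipeline above, only the same arithmetic in a form that is easy to inspect:

```r
# Hypothetical toy data: 4 samples from 2 vineyards, 3 phyla, raw counts
toy <- data.frame(
  Vineyard       = c("A", "A", "B", "B"),
  Firmicutes     = c(10, 30, 5, 15),
  Proteobacteria = c(20, 30, 5, 25),
  Actinobacteria = c(70, 40, 10, 60)
)
taxa <- setdiff(names(toy), "Vineyard")

# Total sum scaling per sample: each row becomes percentages that sum to 100
tss <- 100 * toy[taxa] / rowSums(toy[taxa])

# Mean relative abundance of each phylum within each vineyard
means <- aggregate(tss, by = list(Vineyard = toy$Vineyard), FUN = mean)

# Height of each stacked bar: the per-vineyard means add up to 100
bar_height <- rowSums(means[taxa])
bar_height
```

Because every sample sums to 100 after scaling, the average of those samples within a vineyard also sums to 100. Bars taller than 100 therefore usually mean that values were summed across samples instead of averaged, or that the scaling was done per taxon rather than per sample.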