I have a question regarding ANCOMBC and visualizing differences in specific taxa over time. In short, I would appreciate any opinions on if I'm going in the right direction or not in terms of presenting my data in a paper.
Samples were taken from steers on day 1. The steers were then given antibiotics for 16 days. Samples were collected on day 7, 14, 15, and 16. I want to determine the effects of the antibiotics on the microbiome over time.
I have run ANCOM-BC with the following command and then created barplots from that using da-barplot command from the composition plugin. I set the reference levels to be relative to day 1 (pre-antibiotics).
From this I have found a couple of taxa that I am interested in that were depleted during days 7 and 14 relative to day 1.
I want to be able to show the magnitude of this change in one graph instead of several da-barplots (which I also have as figures in my paper). To that end, I used the qiime2R and phyloseq packages in R to import my data and create jitter plots of these taxa (see example below). Please ignore the data here, I used a dummy set for visualization purposes.
I first used relative abundance for the y-axis, but thought that it wouldn't make sense with how ANCOMBC is calculated, so i switched to absolute abundances. I would like to be able to say this taxa significantly decreased over time by # from day 1 to day 7. Is there a way to do that? Am I on the right track or way off?
Also, is it appropriate to use ANCOMBC like this in this case? I do have a column in my metadata for pre and post antibiotics that I've also run on ANCOM, but I wanted to see the changes in taxa over each of the days.
You are right, relative abundance is not the best option for Ancombc output. However, absolute counts are not normalised, while Ancombc normalises the data for the analyses. So absolute counts are also misleading. Log fold changes are more reliable for that.
But I understand that you don't want to show series of barplots with all time points versus one control.
In similar situations, I extract LFC and P-adj (or q) values from differentials.qza file (you can extract or export tables), then I apply filtering based on both metrics (by your choice) and create one table with all differentially abundant features (from all treatments / time-points) as rows, and treatments / time-points as columns, and LFC values as values (N/А or nans for missing values), and then plot a heatmap. Negative values for more abundant in the control, positive for treatments / time-points.
In that way I don't need to show several barplots, plus it allows to track the features that demonstrate consistent changes.
I greatly appreciate the detailed reply and guidance!
I can extract the LFC and q-value files from the qza zipped folder separately, but is there a way to combine that information into a big table? Would metadata tabulate do the trick?
Also, just for my own sanity check, if ANCOMBC reports a taxa as enriched is it correct to say that the abundance is higher in that group? So for example, if Campylobacter is enriched in the experimental group relative to the control group, could I report that the abundance of campylobacter is higher in the experimental group? Just making sure I've got the terms correct.