I'm struggling to understand the balance summary bar plots when using more than two groups. Our study looks at drug treatment at different time points giving us 5 groups (D0, D15_control, D15_treatment, D36_control, and D36_treatment). I (think I) understand that when comparing two groups the group with the higher log ratio is depleted in taxa that are contained in the numerator compared to the other group. I'm not sure what the comparison group is when you have multiple groups, or if I can even compare multiple groups with this method.

I'm experimenting with a balance that doesn't have many taxa (7 in the numerator, 3 in the denominator). Below is the balance box plot I get when I run a balance summary. "D0" appears to be at a log ratio of 0, presumably meaning that there is no difference between the numerator and denominator taxa in this group (log(1)=0), but compared to which other group?

Also, when I change my metadata category I get different box plots (like above), but the balance taxonomy bars (below) don't change. Why would this be?

That looks interesting. Yes, you can compare multiple groups with this method, since the linear regression boils down to an ANOVA for categorical variables.
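To make the ANOVA connection concrete, here is a minimal pure-Python sketch of the one-way ANOVA F statistic for a single balance grouped by one categorical column. The group names and balance values are hypothetical, and it only computes F (no p-value). For a regression with a single categorical predictor, the overall F-test evaluates this same quantity, so all groups are compared at once rather than against one fixed "comparison group".

```python
from statistics import mean

def one_way_anova_f(groups):
    """One-way ANOVA F statistic for a dict mapping group name -> list of balance values."""
    all_values = [v for vals in groups.values() for v in vals]
    grand_mean = mean(all_values)
    k = len(groups)       # number of groups
    n = len(all_values)   # number of samples
    # Between-group sum of squares: how far each group mean sits from the grand mean.
    ss_between = sum(len(vals) * (mean(vals) - grand_mean) ** 2 for vals in groups.values())
    # Within-group sum of squares: scatter of samples around their own group mean.
    ss_within = sum((v - mean(vals)) ** 2 for vals in groups.values() for v in vals)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical balance values for three of the groups.
groups = {
    "D0": [0.0, 0.1, -0.1],
    "D15_Cntrl": [-1.0, -1.2, -0.8],
    "D36_Cntrl": [1.0, 1.1, 0.9],
}
f_stat = one_way_anova_f(groups)
print(f_stat)
```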

As noted at the end of the gneiss tutorial, there are a few possibilities that could explain the shift in the log ratios: notably, the numerator could be increased in D36_Cntrl and D36_Risp compared to D15_Cntrl, or the denominator could be decreased in D36_Cntrl and D36_Risp compared to D15_Cntrl.
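The ambiguity is easy to see numerically. This sketch assumes the standard isometric log-ratio balance, sqrt(r*s/(r+s)) * ln(g(numerator)/g(denominator)), where g is the geometric mean and r, s are the numbers of numerator and denominator taxa; the counts are made up. Doubling every numerator taxon and halving every denominator taxon give the same balance, so the balance alone cannot tell you which of the two happened.

```python
import math

def gmean(values):
    """Geometric mean computed as exp(mean of logs); all values must be positive."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def balance(numerator, denominator):
    """ILR balance: sqrt(r*s/(r+s)) * ln(gmean(numerator) / gmean(denominator))."""
    r, s = len(numerator), len(denominator)
    return math.sqrt(r * s / (r + s)) * math.log(gmean(numerator) / gmean(denominator))

# Hypothetical counts for 2 numerator taxa and 3 denominator taxa.
baseline = balance([10, 10], [10, 10, 10])          # numerator and denominator equal
numerator_up = balance([20, 20], [10, 10, 10])      # every numerator taxon doubled
denominator_down = balance([10, 10], [5, 5, 5])     # every denominator taxon halved
print(baseline, numerator_up, denominator_down)
```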

Concerning D0 and D15_Risp, it would appear that these taxa are not commonly observed in these samples (since, as you noted above, log(1)=0: the numerator and denominator come out roughly equal there). Of course, the raw counts of these taxa within D15_Risp and D0 will need to be sanity checked.
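One reason a balance can sit at 0 is worth checking in the raw counts. Assuming a pseudocount (here 1) is added to every count before taking logs, which is a common way to handle zeros, a sample in which all 7 numerator and all 3 denominator taxa are absent gets a balance of exactly 0. A sample in which those taxa are present at equal abundance also gets a balance of 0. The helper below is hypothetical and only illustrates why the box plot alone cannot distinguish these cases.

```python
import math

def balance_with_pseudocount(numerator_counts, denominator_counts, pseudocount=1.0):
    """Hypothetical helper: ILR-style balance after adding a pseudocount to every count."""
    num = [c + pseudocount for c in numerator_counts]
    den = [c + pseudocount for c in denominator_counts]
    r, s = len(num), len(den)
    # The mean of the logs is the log of the geometric mean.
    log_gmean_num = sum(math.log(x) for x in num) / r
    log_gmean_den = sum(math.log(x) for x in den) / s
    return math.sqrt(r * s / (r + s)) * (log_gmean_num - log_gmean_den)

absent = balance_with_pseudocount([0] * 7, [0] * 3)     # taxa never observed
balanced = balance_with_pseudocount([9] * 7, [9] * 3)   # taxa observed, equal abundance
skewed = balance_with_pseudocount([50] * 7, [5] * 3)    # numerator taxa dominate
print(absent, balanced, skewed)
```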

Concerning the taxonomy summaries: those don’t change, because you have already defined the balances before running the regression (I’m assuming that you either ran correlation-clustering or gradient-clustering).
Once the balances are constructed, the taxonomies for each balance are set in stone – if you have defined 2 Firmicutes in the numerator and 5 Bacteroides in the denominator, then that log ratio is fixed.
The boxplots can change when you specify a different category, because the samples are grouped differently; you can think of it like coloring by different categories in a PCoA plot in Emperor.
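The idea that the balance is fixed while only the grouping changes can be sketched as follows. The per-sample balance values never change; only the metadata column used to sort them into boxes does. Sample IDs, metadata columns and values here are hypothetical.

```python
from collections import defaultdict

# Hypothetical per-sample values of one balance (fixed once the balance is defined).
balance_values = {"s1": 0.1, "s2": -1.0, "s3": 1.2, "s4": 0.9, "s5": -0.8, "s6": 0.0}

# Hypothetical metadata with two different categories.
metadata = {
    "s1": {"Day": "D0", "Treatment": "none"},
    "s2": {"Day": "D15", "Treatment": "control"},
    "s3": {"Day": "D36", "Treatment": "drug"},
    "s4": {"Day": "D36", "Treatment": "control"},
    "s5": {"Day": "D15", "Treatment": "drug"},
    "s6": {"Day": "D0", "Treatment": "none"},
}

def group_values(values, metadata, column):
    """Bin the same per-sample balance values by one metadata column (one box per level)."""
    groups = defaultdict(list)
    for sample_id, value in values.items():
        groups[metadata[sample_id][column]].append(value)
    return dict(groups)

by_day = group_values(balance_values, metadata, "Day")
by_treatment = group_values(balance_values, metadata, "Treatment")
print(by_day)
print(by_treatment)
```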

I have two clarification questions (sorry if they’re redundant or silly; balances are a new concept to me).

Do you always use the group with the smallest log ratio as the comparison group? In my case you compared to D15_Cntrl, which has the smallest log ratio. Is that the standard? For example, if D36_Cntrl had a negative log ratio, would that be the group I compare everything to instead of D15_Cntrl?

In the tutorial you say:

This seems to be the opposite of what you told me about my D15_Cntrl and D36_Cntrl/D36_Risp samples: that the higher log ratio implies more abundant taxa in the numerator than in the denominator for the D36 samples. The tutorial says the opposite (that the numerator taxa are more abundant in the group with the lower log ratio). Is this a typo in the tutorial, or am I missing a major concept?

I think that depends on the question that you want to answer. Personally, I find that focusing on the groups with the smallest or the largest log ratios can ease interpretation, but that depends on the context of the experiment. From what I can tell in your example, the balance really describes the difference between D15_Cntrl and D36_Risp/D36_Cntrl.
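On the "comparison group" question: in a regression with a categorical predictor, the coefficients are usually contrasts against a reference level chosen by the coding scheme (formula interfaces commonly default to treatment, i.e. dummy, coding), not by which group has the smallest log ratio. This pure-Python sketch fits ordinary least squares with dummy coding. It shows that the intercept is the reference group's mean and each coefficient is that group's mean minus the reference mean. Switching the reference changes the coefficients but not the group means. Group names and values are hypothetical, and which reference level a particular tool picks by default is an assumption to check in its summary output.

```python
def dummy_design(labels, reference):
    """Intercept column plus one 0/1 indicator column per non-reference level."""
    levels = sorted(set(labels) - {reference})
    rows = [[1.0] + [1.0 if lab == lev else 0.0 for lev in levels] for lab in labels]
    return levels, rows

def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(rows, y):
    """Least-squares coefficients from the normal equations (X^T X) beta = X^T y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)

def fit_with_reference(labels, y, reference):
    levels, rows = dummy_design(labels, reference)
    coefs = ols(rows, y)
    return {"Intercept": coefs[0], **dict(zip(levels, coefs[1:]))}

# Hypothetical balance values; group means are D0 = 0.1, D15_Cntrl = -1.1, D36_Cntrl = 1.1.
labels = ["D0", "D0", "D15_Cntrl", "D15_Cntrl", "D36_Cntrl", "D36_Cntrl"]
y = [0.0, 0.2, -1.0, -1.2, 1.0, 1.2]

ref_d0 = fit_with_reference(labels, y, reference="D0")
ref_d15 = fit_with_reference(labels, y, reference="D15_Cntrl")
print(ref_d0)
print(ref_d15)
```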

Yeah … that is a typo in the tutorial. The control group in that example is larger, so y0_{numerator} > y0_{denominator} for the controls, but y0_{numerator} < y0_{denominator} for the patient group. Thanks for catching that!!

Thanks for your help interpreting the balance graphs. I’m now trying to assign significance to the groups. I’m unsure how to do this, but I remember that when I ran the ols regression there was a summary file with the heatmap that gives a p-value for each coefficient for each balance. I understand how to interpret normal linear regressions, and I did read the article linked in the tutorial (ANOVA to regression), but I’m unsure how to apply that to multiple variables with multiple levels each. Can those p-values in the regression summary be associated with the output graph of the balance_taxonomy command (and if so, how)? Or should I look at the balances with the longest branches, go back to the abundance tables, and calculate p-values for those taxa myself (using Kruskal-Wallis, for example)?
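If you do go the nonparametric route, here is a minimal pure-Python sketch of the Kruskal-Wallis H statistic (average ranks for ties, with the standard tie correction), which could be applied to one balance or one taxon across the five groups. It stops at H; the p-value is usually taken from a chi-square approximation with k - 1 degrees of freedom, which is what scipy.stats.kruskal reports. Because many balances or taxa would be tested, those p-values would also need a multiple-testing correction. The example values are hypothetical.

```python
from collections import defaultdict

def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H for a dict mapping group name -> list of values."""
    labels = [g for g, vals in groups.items() for _ in vals]
    values = [v for vals in groups.values() for v in vals]
    n = len(values)
    rank_sums = defaultdict(float)
    for g, r in zip(labels, average_ranks(values)):
        rank_sums[g] += r
    h = 12 / (n * (n + 1)) * sum(rank_sums[g] ** 2 / len(vals) for g, vals in groups.items())
    h -= 3 * (n + 1)
    # Tie correction: 1 - sum(t^3 - t) / (n^3 - n) over groups of tied values.
    tie_counts = defaultdict(int)
    for v in values:
        tie_counts[v] += 1
    correction = 1 - sum(t ** 3 - t for t in tie_counts.values()) / (n ** 3 - n)
    return h / correction

h_no_ties = kruskal_wallis_h({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})
h_ties = kruskal_wallis_h({"A": [1, 1, 2], "B": [2, 3, 3]})
print(h_no_ties, h_ties)
```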

I don’t know if it’s helpful, but the formula I used in the ols regression was:
“Treatment+Day+Day_treat”

I would attach my ols-summary-file but it’s too large for the forum (3.2MB).

Could you clarify what you mean by multiple levels? Do you mean that you have multiple levels within a variable (e.g. 1 ml of antibiotics, 2 grams, …)? Or are you trying to visualize the effects of a treatment within a particular day?

For time series, I would take a look at qiime gneiss lme-regression. A tutorial for this can be found here. As for visualizing this, there is currently no out-of-the-box way to visualize within-treatment effects (i.e. antibiotics over time).

Note that you can identify meaningful balances from the lme-regression or ols-regression, and the balances can be directly unpacked from the FeatureTable[Balance] object output by qiime gneiss ilr-transform (see qiime tools export). So you can plug these balances into your favorite plotting library.

If there is growing demand for this functionality, we can raise an issue on q2-gneiss to enable more comprehensive plotting capabilities.

Yes, I mean that I have multiple levels within a variable, I have treatment (drug 1 vs drug 2) and Day (day 0 vs day 14). I would like to visualize the effects of the different treatments across the different time points.

Thanks, I will try the lme-regression, as treatment over time is a question we are curious about.

I saw that I could look at the balance-taxonomy, but I will definitely look into pulling out the best balances and plotting them remotely.