The best statistical test to analyze the relationship between inflammation markers and microbial abundance

Hey all,

I read a forum post saying that correlation analysis is not the right way to test whether there is a relationship between microbial abundance and other variables (e.g. TNF-alpha). Is running a correlation analysis with adjustment for multiple comparisons not a good option either? What would be the best approach to relate bacterial abundance to inflammation and glucose markers?

I would like to hear from those who are more familiar with bioinformatics and biostatistics how you would approach it.

Thank you for the help!!!



Hi @cortegas,

A couple of general ideas. First, there can be some utility in converting your continuous data into something categorical using your favorite data transformation. It becomes several times easier to handle these relationships with standard microbiome testing. A statistician or someone familiar with your field and measurements could probably best help you work with this.

Analysis wise, I always start with diversity analyses because I assume my feature-based (ASV/OTU/genus/gene/etc) analysis will be underpowered.

So, there, I would look at correlation with alpha diversity, or even an alpha diversity regression (this is in q2-longitudinal or your favorite regression package).
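As a rough illustration of the alpha diversity correlation idea, here is a minimal sketch using only the Python standard library. It computes a Spearman correlation between an alpha diversity metric and each marker, gets a permutation p-value, and applies a Benjamini-Hochberg adjustment across markers. All sample values and function names are made up for the example; in practice you would more likely use an established routine such as `scipy.stats.spearmanr` or the plugins mentioned above.

```python
import random


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_permutation(x, y, n_perm=999, seed=0):
    """Spearman rho with a two-sided permutation p-value."""
    rx, ry = ranks(x), ranks(y)
    observed = pearson(rx, ry)
    rng = random.Random(seed)
    shuffled = ry[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(rx, shuffled)) >= abs(observed) - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical toy data: one Shannon value and three markers per sample.
shannon = [2.1, 2.8, 3.0, 3.4, 3.9, 4.2, 4.4, 4.9]
markers = {
    "IL-6": [9.5, 8.7, 8.9, 7.2, 6.1, 5.5, 5.8, 4.0],
    "glucose": [110, 95, 120, 101, 99, 105, 98, 102],
    "TNF-alpha": [3.1, 2.9, 3.5, 2.2, 2.6, 1.9, 2.0, 1.7],
}

names = list(markers)
results = [spearman_permutation(shannon, markers[name]) for name in names]
q_values = benjamini_hochberg([p for _, p in results])
for name, (rho, p), q in zip(names, results, q_values):
    print(f"{name}: rho={rho:.2f}, p={p:.3f}, q={q:.3f}")
```

The permutation p-value avoids distributional assumptions, which may matter with small sample sizes, and the adjustment is applied across the markers tested rather than across individual features.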

In beta diversity, I would tend toward an adonis test for continuous data, which addresses the amount of variation explained; a Mantel test, which shows a univariate correlation; or the PERMANOVA implementation here. These live in the q2-diversity plugin.
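To make the Mantel idea concrete, here is a small standard-library sketch, not the q2-diversity implementation. It correlates a Bray-Curtis distance matrix with a matrix of pairwise differences in one marker, and assesses significance by permuting sample labels on one matrix. The count table, IL-6 values, and helper names are all hypothetical.

```python
import random


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two count vectors."""
    return sum(abs(a - b) for a, b in zip(u, v)) / sum(a + b for a, b in zip(u, v))


def marker_distances(values):
    """Pairwise absolute differences of a continuous marker."""
    return [[abs(a - b) for b in values] for a in values]


def upper_triangle(matrix, order=None):
    """Flatten the upper triangle, optionally after relabelling samples."""
    n = len(matrix)
    idx = order if order is not None else list(range(n))
    return [matrix[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(dm_a, dm_b, n_perm=999, seed=0):
    """Mantel r with a two-sided permutation p-value.

    Rows and columns of dm_b are permuted together so that each
    permuted matrix is still a valid distance matrix.
    """
    flat_a = upper_triangle(dm_a)
    observed = pearson(flat_a, upper_triangle(dm_b))
    rng = random.Random(seed)
    order = list(range(len(dm_b)))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)
        permuted = pearson(flat_a, upper_triangle(dm_b, order))
        if abs(permuted) >= abs(observed) - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy data: 6 samples x 4 features, plus IL-6 per sample.
counts = [
    [30, 5, 2, 1],
    [28, 8, 3, 0],
    [20, 15, 6, 2],
    [12, 20, 10, 5],
    [6, 22, 15, 9],
    [2, 25, 18, 14],
]
il6 = [4.0, 4.6, 5.9, 7.1, 8.8, 9.7]

beta_dm = [[bray_curtis(a, b) for b in counts] for a in counts]
r, p = mantel(beta_dm, marker_distances(il6))
print(f"Mantel r={r:.2f}, p={p:.3f}")
```

Permuting whole samples, rather than shuffling individual distances, is what distinguishes a Mantel test from a plain correlation test on the flattened matrices, since the pairwise distances are not independent observations.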

Once you’ve established relationships on a whole community level, then you could go to feature-based analysis. There, I might again start with your categorical data and explore some of the common techniques here. But you could also try something like Gneiss or Songbird, both of which take continuous data.



@jwdebelius thank you for the detailed answer :grinning: . I have already run the diversity analyses as well as LEfSe. What I am trying to do now is see whether we can add an analysis relating the inflammatory markers to the gut microbiota (GM) results. We do not have longitudinal data, unfortunately. We measured IL-6, glucose, and GM after a high-fat diet, without baseline measurements.


Adding new clinical data like that does not/should not change your analysis approach unless there’s something special about that data in particular that makes the analysis challenging. So, for example, untargeted metabolomics or RNAseq might be very different creatures than what you’re wanting to do here. But for 3 to 10 quantitative markers in a cross-sectional population, I would look at their relationship with diversity using a statistical test, and then maybe consider a multivariate feature-based model if you have something that’s significant.


Hi, it sounds like you want to associate specific gut microbes with host phenotypes. For that purpose, you can use techniques for running multivariate multi-table analyses, such as sparse partial least squares (sPLS) regression and canonical correspondence analysis (CCpnA). Check out this great paper by Ingham et al., which conducted an analysis very similar to the one you described. The paper also published the code as RMarkdown files.

Alternatively, you could use a tool specifically developed for this purpose called MaAsLin2.


Thanks. So you suggest that we look at the relationship between diversity and the biomarkers and, if that analysis is significant, look further into a multivariate feature-based model.

@yanxianl Thanks for sharing! I will take a look and try the different options that you guys suggested.

Another method you could try is random forest regression with the q2-sample-classifier plugin.

This may not be what you are looking for — it will not provide P values for individual features.

Instead, it will tell you:
(a) how accurately you can predict some continuous value (or categorical state in the case of classifiers) as a function of microbiome composition.
(b) what features best explain those predictions, i.e., which are most closely correlated with that value.

You can use this to, e.g., predict the abundance of your inflammation markers (and it may help to log-transform those target values before prediction). It does not assume a linear relationship between the predictors and targets, so it can be quite powerful for pulling out associations in complex microbial datasets.