Association of microbial abundance with continuous metadata

yipinto · March 11, 2020, 9:43pm

Hey,
I want to associate bacterial abundance with continuous clinical metadata. I’m thinking about the different options and their pros and cons and hope this wonderful community will help me with the decision.

The first issue is whether to use relative or absolute counts. I think relative abundance fits this task better than absolute counts, because it deals with the differences in coverage between samples.
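As a minimal sketch of what "relative abundance" means here, each sample's counts can be divided by that sample's total, so that samples sequenced to different depths end up on the same scale. The count table and the function name `to_relative_abundance` are made up for illustration.

```python
import numpy as np

# Hypothetical count table: rows are samples, columns are features (ASVs).
# The three samples have very different coverage (100, 1000 and 40 reads).
counts = np.array([
    [10, 0, 90],
    [500, 250, 250],
    [0, 0, 40],
], dtype=float)


def to_relative_abundance(counts):
    """Divide each sample (row) by its total so every row sums to 1."""
    totals = counts.sum(axis=1, keepdims=True)
    if np.any(totals == 0):
        raise ValueError("at least one sample has zero total counts")
    return counts / totals


rel = to_relative_abundance(counts)
print(rel)
```

Note that after this step the values within a sample are no longer independent of each other, since they are forced to sum to 1.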

The next issue is how to do it.
The simplest option is to correlate each abundance vector with each metadata vector. It seems that Kendall correlation is slightly better than Spearman since it accounts for ties (and we will have many ties, especially of zeros, in such data). As always, the simplest option has its advantages and disadvantages…
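To make the tie handling concrete, here is a small sketch of Kendall's tau-b written out by hand, applied to one made-up abundance vector with several zeros and one made-up continuous variable. The function name `kendall_tau_b` and the data are illustrative only. In practice `scipy.stats.kendalltau` computes the same tau-b statistic by default and also returns a p-value.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b: pairs tied in only one variable enter the denominator."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue          # tied in both variables: counted nowhere
        elif dx == 0:
            ties_x += 1       # tied only in x (e.g. two zero abundances)
        elif dy == 0:
            ties_y += 1       # tied only in y
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denom = sqrt((untied + ties_x) * (untied + ties_y))
    return (concordant - discordant) / denom if denom else float("nan")


# Made-up example: a sparse feature (three zeros) and a continuous variable.
abundance = [0, 0, 0, 5, 12, 30]
metadata = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
tau = kendall_tau_b(abundance, metadata)
print(tau)
```

Every untied pair here is concordant, yet tau stays below 1 because the pairs of zeros carry no ordering information.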

Another option is an LMM. I thought of doing it with MaAsLin2, which can handle several effects, including random effects such as batch effects. The main drawbacks I find are that it's not sensitive in cases of sparse data with many zeros and that it is not easy to plot (which is a secondary consideration).

The last issue is whether to associate ASVs or data collapsed to a higher level, for example the species level. On one hand, the data is less sparse at the species level, but I feel that species assignment is too noisy for 16S…
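A toy sketch of the sparsity trade-off: collapsing sums the counts of all ASVs that share a taxonomic label, which tends to reduce the fraction of zeros. The ASV IDs, species labels and helper names below are all hypothetical.

```python
# Hypothetical per-sample counts for three ASVs across three samples.
asv_counts = {
    "asv1": [10, 0, 3],
    "asv2": [5, 0, 0],
    "asv3": [0, 7, 1],
}
# Hypothetical species-level assignments for those ASVs.
asv_to_species = {"asv1": "Species A", "asv2": "Species A", "asv3": "Species B"}


def collapse_by_taxon(asv_counts, asv_to_taxon):
    """Sum the counts of all ASVs that share the same taxon label."""
    collapsed = {}
    for asv, counts in asv_counts.items():
        taxon = asv_to_taxon.get(asv, "Unassigned")
        previous = collapsed.get(taxon, [0] * len(counts))
        collapsed[taxon] = [a + b for a, b in zip(previous, counts)]
    return collapsed


def zero_fraction(table):
    """Fraction of entries in the table that are exactly zero."""
    values = [v for row in table.values() for v in row]
    return sum(v == 0 for v in values) / len(values)


species_counts = collapse_by_taxon(asv_counts, asv_to_species)
print(zero_fraction(asv_counts), zero_fraction(species_counts))
```

The collapsed table is only as good as the labels: if an assignment is wrong, counts get merged into the wrong species.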

Any thoughts? Any other suggestions for methods/tools for such a task?

Thanks!

jwdebelius · March 12, 2020, 1:49pm

Hi @yipinto,

I think in this case I would approach it a couple of ways. The first option is to break it into groups and see if you see an association that way, using something like a PERMANOVA. It's suboptimal, but it's a starting place. Adonis can also handle continuous data if you want to test beta diversity.

I think newer tools (Songbird, ALDEx2, ANCOM2... which is in R but not ) will let you do a multivariate regression analysis, which would let you work with continuous data as well. So, that might be an option.

Best,
Justine

yipinto · March 12, 2020, 8:26pm

Thanks Justine,

What do you think is the main drawback of a simple correlation test like Kendall?

So if I want to keep it continuous, you suggest either adonis (but then I’ll miss the per-feature association) or one of Songbird, ALDEx2, or ANCOM2. Specifically for ALDEx2, q2-aldex2 works only for two categorical groups, right?

jwdebelius · March 12, 2020, 8:59pm

Hi @yipinto,

If you haven't read it, I highly recommend "Microbiome Datasets Are Compositional: And This Is Not Optional". If I were teaching a formal course on microbiome stats, this would be required reading. So, the main drawback of a Kendall correlation for features is that it's not compositionally aware... and I'll let the article unpack that for you.
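For a rough idea of what "compositionally aware" can look like in practice, one common starting point is the centered log-ratio (CLR) transform, which expresses each feature relative to the geometric mean of its sample. This is only a sketch: the function name, the toy table and the pseudocount used to handle zeros are assumptions, and it is not the procedure any of the tools mentioned in this thread necessarily use.

```python
import numpy as np


def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform of a samples x features count table.

    The pseudocount is a crude, assumed way to avoid log(0).
    """
    logged = np.log(np.asarray(counts, dtype=float) + pseudocount)
    # Subtracting the row mean of the logs divides by the geometric mean.
    return logged - logged.mean(axis=1, keepdims=True)


# Made-up table with zeros: rows are samples, columns are features.
counts = np.array([[10, 0, 90], [500, 250, 250], [0, 0, 40]])
transformed = clr(counts)
print(transformed)

# Without zeros (and so without a pseudocount) the result does not depend on
# sequencing depth: multiplying a sample by 10 gives the same CLR values.
positive = np.array([[1.0, 2.0, 4.0]])
same = np.allclose(clr(positive, pseudocount=0.0), clr(10 * positive, pseudocount=0.0))
```

Each transformed sample sums to zero, so the values describe ratios between features rather than absolute amounts.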

I like to complement things: adonis tells you if there's a community-level shift, and then an individual feature-level analysis tells you what's changing. So, I think a complementary approach is needed: adonis and a continuous test. I'm not a big ALDEx2 user, so you may need to double-check the information. Someone suggested earlier that it could do multivariate modeling. I know for sure ANCOM2 and Songbird can use a linear regression (so continuous data).

Best,
Justine