I want to associate bacterial abundance with continuous clinical metadata. I’m thinking about the different options and their pros and cons and hope this wonderful community will help me with the decision.
The first issue is whether to use relative or absolute counts. I think relative abundance fits this task better than absolute counts, since it accounts for differences in sequencing coverage between samples.
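To make clear what I mean by relative abundance, here is a minimal sketch in plain Python. The sample names, ASV IDs, and counts are made up for illustration, and `to_relative_abundance` is just a hypothetical helper, not a function from any package.

```python
def to_relative_abundance(counts):
    """Convert one sample's raw counts to proportions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {feature: n / total for feature, n in counts.items()}


# Two hypothetical samples with the same composition but a 100x
# difference in sequencing depth (made-up numbers).
sample_a = {"ASV1": 50, "ASV2": 30, "ASV3": 20}
sample_b = {"ASV1": 5000, "ASV2": 3000, "ASV3": 2000}

rel_a = to_relative_abundance(sample_a)
rel_b = to_relative_abundance(sample_b)
```

After the conversion both samples give the same proportions, which is the sense in which relative abundance removes the coverage difference.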
The next issue is how to do it:
The simplest option is to correlate each abundance vector with each metadata vector. Kendall correlation seems slightly better than Spearman here, since it accounts for ties (and we will have many ties, especially zeros, in such data). As always, the simplest option has its advantages and disadvantages…
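To show how I understand the tie handling, here is a rough sketch of Kendall's tau-b written out by hand. The abundance and clinical values are invented, and `kendall_tau_b` is my own illustrative function (it returns only the coefficient, no p-value).

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equal-length numeric sequences.

    tau_b = (C - D) / sqrt((C + D + Tx) * (C + D + Ty))
    C, D : numbers of concordant / discordant pairs
    Tx   : pairs tied only in x;  Ty : pairs tied only in y
    Pairs tied in both x and y are ignored.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = tied_x = tied_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both variables: counted nowhere
        if dx == 0:
            tied_x += 1
        elif dy == 0:
            tied_y += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    n_pairs = concordant + discordant
    denom = sqrt((n_pairs + tied_x) * (n_pairs + tied_y))
    return (concordant - discordant) / denom if denom else float("nan")


# Hypothetical data: one taxon's relative abundance (note the two zeros)
# and one continuous clinical variable across four samples.
abundance = [0.0, 0.0, 0.1, 0.2]
clinical = [1.0, 2.0, 3.0, 4.0]
tau = kendall_tau_b(abundance, clinical)
```

The pair of zero-abundance samples only enters the denominator, so it pulls tau slightly below 1 instead of being counted as agreement or disagreement. In practice I would use `scipy.stats.kendalltau`, which as far as I know computes the tau-b variant by default.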
Another option is a linear mixed model (LMM). I thought of doing it with MaAsLin2, which can handle several effects, including random effects such as batch effects. The main drawbacks I see are that it is not sensitive when the data are sparse with many zeros, and that the results are not easy to plot (which is a secondary consideration).
The last issue is whether to associate ASVs or data collapsed to a higher level, for example species. On the one hand, species-level data is less sparse, but I feel that species assignment is too noisy for 16S…
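By collapsing I mean summing the counts of all ASVs that share a species label, roughly as in this sketch. The ASV IDs, counts, and taxonomy assignments are hypothetical, and `collapse_to_species` is an illustrative helper rather than a function from any tool.

```python
from collections import defaultdict


def collapse_to_species(asv_counts, asv_to_species):
    """Sum ASV counts per species; ASVs without a label go to 'Unassigned'."""
    collapsed = defaultdict(int)
    for asv, count in asv_counts.items():
        species = asv_to_species.get(asv, "Unassigned")
        collapsed[species] += count
    return dict(collapsed)


# One hypothetical sample: ASV1 is absent (zero), ASV4 has no species label.
asv_counts = {"ASV1": 0, "ASV2": 12, "ASV3": 7, "ASV4": 3}
asv_to_species = {
    "ASV1": "Bacteroides fragilis",
    "ASV2": "Bacteroides fragilis",
    "ASV3": "Escherichia coli",
}

species_counts = collapse_to_species(asv_counts, asv_to_species)
```

The zero for ASV1 disappears into the species total, which is why the collapsed table is less sparse, but any misassigned ASV gets added to the wrong species, which is the noise I am worried about.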
Any thoughts? Any other suggestions for methods/tools for such a task?