Correlation analysis in QIIME2


In the gneiss tutorial, Ward’s hierarchical clustering was used to identify co-occurring OTUs. Is it possible to use this function to search for OTUs that correlate with a specific taxon, say, a pathogenic bacterium, so that we may be able to find candidate probiotics as substitutes for antibiotics?

Additionally, are there any plans to introduce methods, such as SparCC, for discovering associations between microbial taxa and metadata? For example, taxa that may correlate with the expression level of certain genes, or with whole-body lipid content.


Hi @yanxianl. That is a good question!

Correct, there are many, many ways to perform clustering. It would be difficult to expose all of these possibilities through the QIIME 2 interface.

However, in Gneiss, there is a Python API that helps make this a little easier.
Specifically, there is a method called rank_linkage that allows you to feed in arbitrary scores for each OTU, which are then used for the clustering. For your problem, you could compute these scores as the correlation of each taxon with the pathogenic microbe you are interested in. However, I would still recommend using the proportionality metric

d(x, y) = \operatorname{Var}\left[\ln \frac{x}{y}\right]

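where the variance is taken across samples. It is a more stable measure of association than correlation, which can be misleading on compositional (relative abundance) data.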
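A minimal sketch of how the proportionality score d(x, y) could be computed for every OTU against one target OTU (e.g. the pathogen), assuming a plain NumPy count table with samples as rows and OTUs as columns. The function name `vlr_scores`, the pseudocount used to handle zeros, and the toy table are all hypothetical illustrations, not part of the gneiss API. The intuition: if y is exactly proportional to x (y = c·x), then ln(x/y) = -ln c is the same in every sample, so its variance, and therefore d(x, y), is 0.

```python
import numpy as np


def vlr_scores(counts, target, pseudocount=1.0):
    """Variance of log-ratios between every feature and one target feature.

    counts: 2-D array, rows = samples, columns = features (OTUs).
    target: column index of the feature of interest (e.g. the pathogen).
    pseudocount: added to all counts so zeros do not break the log
        (an assumption; other zero-handling strategies exist).
    A score near 0 means the feature is close to proportional to the target.
    """
    logs = np.log(np.asarray(counts, dtype=float) + pseudocount)
    # ln(x / y) = ln x - ln y, for every column against the target column
    log_ratios = logs - logs[:, [target]]
    # Sample variance across samples (ddof=1), one score per feature
    return log_ratios.var(axis=0, ddof=1)


# Hypothetical toy table: 4 samples x 3 OTUs. OTU 1 is exactly 2x OTU 0,
# while OTU 2 varies independently of OTU 0. No zeros, so no pseudocount.
counts = np.array([
    [10, 20, 5],
    [30, 60, 50],
    [5, 10, 40],
    [50, 100, 8],
])
scores = vlr_scores(counts, target=0, pseudocount=0.0)
print(scores)  # OTU 0 and OTU 1 score ~0; OTU 2 scores much higher
```

The resulting scores could then be indexed by OTU ID (for example as a `pandas.Series`) and passed to gneiss's rank_linkage to build the tree; the exact call should be checked against the gneiss API documentation. Note that d(x, y) only becomes small for features that rise and fall together with the target, so a taxon that moves opposite to the pathogen will show up with a large score, just like an unrelated taxon.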

I can’t comment on adding SparCC. I don’t think these methods can handle metadata out of the box.