Gneiss: Correlation vs. gradient clustering

Hi all,

I am currently analyzing Illumina sequence data with gneiss and was wondering about the differences between the output of qiime gneiss correlation-clustering, which is used in the Moving Pictures tutorial, and that of qiime gneiss gradient-clustering.

In both cases, a hierarchy and balances can be derived. However, which covariate should I choose in the OLS regression? With gradient clustering, do I have to use the parameter that was passed as --m-gradient-category?

I also found it interesting that, in the linear regression summary, the model MSE is identical for both clustering methods, but the raw/predicted data comparison plots differ (see below).

Output from correlation clustering is on the top, gradient clustering on the bottom.

What happens if I use a covariate other than the parameter passed as --m-gradient-category when creating the gradient-clustering-based hierarchy.qza file?

Thanks so much for any insight here!



Right - the gradient-clustering is mainly for interpretability. The actual hierarchy that is fed into the regression won’t affect the overall fit. So whether you feed in a gradient clustering, a phylogenetic tree, or even a random tree, you’ll get the same R^2. The main thing that will change is the coefficient heatmap.

The high-level reasoning behind this is that the overall fit is happening in high-dimensional space. The trees themselves are just different rotations of the data, and the actual regression is being performed in the full space.
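To make the rotation argument concrete, here is a minimal NumPy sketch. It is not gneiss code: the data are synthetic, the covariate is hypothetical, and it assumes that switching trees amounts to multiplying the balance matrix by an orthogonal matrix. Under that assumption, the overall MSE and R^2 stay the same while the per-balance coefficients change.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_balances = 40, 5

# Design matrix: intercept plus one hypothetical covariate (synthetic values).
X = np.column_stack([np.ones(n_samples), rng.normal(size=n_samples)])

# Synthetic "balances" as they might look under one tree.
true_beta = rng.normal(size=(2, n_balances))
Y = X @ true_beta + rng.normal(scale=0.5, size=(n_samples, n_balances))

# Assumption: a different tree corresponds to an orthogonal rotation of Y.
# QR decomposition of a random matrix gives a random orthogonal Q.
Q, _ = np.linalg.qr(rng.normal(size=(n_balances, n_balances)))
Y_rot = Y @ Q


def fit_ols(X, Y):
    """Fit one OLS model per column of Y; return coefficients, overall MSE, overall R^2."""
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((Y - Y.mean(axis=0)) ** 2)
    return beta, ss_res / Y.size, 1.0 - ss_res / ss_tot


beta_a, mse_a, r2_a = fit_ols(X, Y)
beta_b, mse_b, r2_b = fit_ols(X, Y_rot)

print("same overall MSE:", np.isclose(mse_a, mse_b))
print("same overall R^2:", np.isclose(r2_a, r2_b))
print("same coefficients:", np.allclose(beta_a, beta_b))
```

The coefficients under the rotated basis are the original ones multiplied by Q, which is why the coefficient heatmap changes with the tree even though the overall fit does not.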

So the takeaway is that the choice of tree depends on the question you are trying to ask. If you are interested in the succession of species over a particular gradient, gradient-clustering should do the trick. If you just want to see the largest effects, correlation-clustering will probably be appropriate. If you want to look for specific evolutionary divergences, then a phylogenetic tree may be the better choice.


Alright, I am getting the idea now. What about the choice of covariates, though? Do I have to use the same parameter in the OLS regression (--p-formula) that I passed as --m-gradient-category to create the gradient-clustering-based hierarchy file?

Right now the way that gradient-clustering is structured, the clustering can only be done on 1 variable, which is specified by --m-gradient-category. I haven’t figured out a way to do hierarchical clustering on more variables – but it could be interesting!
