Hey guys, I am reading about how Pearson’s correlation can produce false negative results on relative abundances here. The documentation of the R package “compositions” describes four different methods for multivariate analysis of compositional data. It may be outside the scope of this forum, but I think the description of the data under the aplus method matches our data, that is, “the total amount is meaningful” and “the data should be analyzed in relative geometry”.
But I checked the source code of q2-longitudinal and found that linear-mixed-effects uses Kendall correlation (line 251 in the code. May the qiime2 developers kindly confirm this?), which according to Wikipedia is “a statistic used to measure the ordinal association between two measured quantities”. I’m not sure how Kendall differs from aplus, or whether the two are even comparable (excuse my shallow knowledge of statistics)… I think Kendall is a univariate test while aplus is multivariate?
And then there’s also q2-gneiss which uses balances that can “correlate metadata with all features”…
Any discussion/clarification is appreciated. Thanks.
EDIT: So I read in the gneiss tutorial that “Running these models has multiple advantages over standard univariate regression, as it avoids many of the issues associated with overfitting, and can gain perspective about community-wide perturbations based on environmental parameters.” This leads me to another question: are gneiss and linear mixed effects a type of regression or correlation? For gneiss, I think it’s correlation, as suggested by the name correlation-clustering. Oh wait, no, it’s regression, as suggested by ols-regression…
Thanks for the in-depth research and suggestions @jjmmii! The short answer is no, QIIME 2 does not have any methods similar to what you are describing. A new plugin or plugin action wrapping the R compositions package would be really useful to the community, and a good way for an experienced R user to contribute code to QIIME 2.
I have raised an issue to track that feature request.
That line of code is not part of linear-mixed-effects, which uses LME. That code is for NMIT, which is really using Kendall correlation in an entirely different way (it is correlating subjects' longitudinal compositions to calculate distance, not correlating species abundance with a user-defined variable).
gneiss is a regression method, but maybe @mortonjt would like to elaborate more.
Thanks @Nicholas_Bokulich for your reply!
I further read here that “You can use correlation when the proportion data are from different domains”, so this means if I am correlating abundance of a taxon to an independent variable like blood pressure, Pearson’s correlation is still valid.
The aplus method might be useful when I’m correlating the abundances of two taxa from the same sample. I’ll try to contribute if I need to use this in the future.
Changes in relative abundances are never going to tell you which microbes are actually changing. If you correlate a taxon against blood pressure and you see that the taxon's relative abundance increased, it could very well be that the taxon itself stayed constant and all of the remaining taxa decreased. I did go into a few details in my paper here: http://msystems.asm.org/content/2/1/e00162-16
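As a toy illustration of the point above (the counts are made up for this sketch and do not come from any dataset), here is how a taxon's relative abundance can go up even though its absolute abundance does not change at all:

```python
def closure(counts):
    """Convert absolute counts to relative abundances that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]

# Hypothetical absolute abundances of three taxa at two time points.
before = [100.0, 100.0, 100.0]
after = [100.0, 50.0, 50.0]  # taxon 0 is unchanged; the other two halve

rel_before = closure(before)
rel_after = closure(after)

# Taxon 0 looks like it "increased" once the data are converted to proportions.
print("taxon 0 absolute:", before[0], "->", after[0])
print("taxon 0 relative:", round(rel_before[0], 3), "->", round(rel_after[0], 3))
```

Sequencing data only give you the second line of output, so the two scenarios (taxon 0 grew, or everything else shrank) cannot be told apart from proportions alone.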
Which leads to your question: yes, gneiss is a regression method. In fact, I think we already have the aplus method you mentioned coded up in skbio as the perturb operation here. However, this functionality is equivalent to addition after performing the ilr, clr or alr transform. Gneiss at the moment exclusively uses the ilr transform, but that’s going to change in the very near future.
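To make the "perturbation is just addition after a log-ratio transform" statement concrete, here is a minimal sketch using the clr transform. The `closure`, `clr` and `perturb` helpers below are hand-rolled stand-ins written for this example so it runs without dependencies (skbio.stats.composition ships functions with the same names), and the two compositions are arbitrary:

```python
import math

def closure(x):
    """Rescale a vector of positive values so it sums to 1."""
    total = sum(x)
    return [v / total for v in x]

def clr(x):
    """Centered log-ratio: log of each part minus the mean of the logs."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [l - mean_log for l in logs]

def perturb(x, y):
    """Perturbation: element-wise product, re-closed to sum to 1."""
    return closure([a * b for a, b in zip(x, y)])

# Two arbitrary compositions (illustrative values only).
x = closure([1.0, 2.0, 7.0])
y = closure([3.0, 3.0, 4.0])

lhs = clr(perturb(x, y))                      # transform after perturbing
rhs = [a + b for a, b in zip(clr(x), clr(y))]  # add in clr coordinates

same = all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(lhs, rhs))
print("clr(perturb(x, y)) == clr(x) + clr(y):", same)
```

The equality holds because clr ignores the overall scale, so the re-closing step inside `perturb` drops out and the log of a product becomes a sum of logs.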
Thanks a lot @mortonjt. If I stick to using Pearson’s for the above case, I can use a “safer” interpretation and say that the relative abundance increases when blood pressure increases. As long as I don’t say the taxon’s absolute abundance increased, the interpretation is still valid, right?
I don’t fully understand perturb at the moment, but I will take a look.