Unexpected Projected Prediction and Log Ratio Plots from Gneiss

Zachary_Bendiks · December 20, 2019, 12:03am

Hello QIIME2 community,

I have microbiome data from a diet intervention study using a 3x3-factorial design - fat, grain, and starch. I'm using gneiss to associate groups of ASVs with host metadata using qiime 2018.4. I used correlation clustering to group ASVs and performed ols-regression with the formula "Cecal_pH+Abdominal_fat+Acetate+Butyrate+Propionate". I have some dataset-specific questions I'd like feedback on before I explore these results in-depth.

1. My understanding is that gneiss performs its own under-the-hood normalization of the feature counts. I therefore added pseudocounts and performed correlation clustering with the unrarefied ASV table as done here: QIIME2 Workflow. Is this thinking correct, or should I repeat the analysis with rarefied data?
2. I expect the measurements in my model to show at least some collinearity (ex: a drop in Cecal_pH is probably driven by increased acetate, propionate, and/or butyrate in this dataset). Will this affect how I build my regression formula?
3. In the attached regression summary, raw values in the "projected prediction" plot form three distinct clusters which most of the predicted values fall within. However, a minority of them fall outside the bottom-right cluster. The distribution of raw values looks strange compared to what I've seen elsewhere, and the presence of predicted values outside the "range" of raw values is troubling me a bit. Does this mean that the model has poor predictive power? Can I 'trust' any results I get?
   regression_summary.qzv (7.4 MB)
4. I explored the relationship between balance 'y4' and butyrate. In the log-ratio:butyrate plot produced by balance-taxonomy, I expected the relationship to be linear but it actually looks logarithmic. Is this a 'reasonable' result, or does it indicate some underlying issues with the model?
   y4_taxa_summary_butyrate_L4.qzv (126.4 KB)

Thanks for any help you can offer! If anything needs clarification please let me know!

mortonjt · December 20, 2019, 8:34pm

Hi @Zachary_Bendiks, note that we are currently deprecating this functionality in gneiss and recommend looking at songbird and aldex2 instead.

There have been a few other related posts on this, so feel free to read those.

Regarding your questions:

1. Yes, log ratios largely negate the need for rarefaction, but the issue with zeros remains. This is better handled in songbird and aldex2 due to their multinomial models.
2. Definitely, collinearity is still a problem in differential abundance. The standard statistical recommendations still apply here. I'd start by fitting multiple models and comparing their differences.
3. We're deprecating this functionality and recommend songbird/aldex2 instead. So no, please don't trust those results.
4. Nature is often non-linear, so maybe this isn't completely unexpected. Linear methods can give some insight here, but they certainly won't be the best model for this relationship.
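
On the collinearity point, one common first check before fitting and comparing several models is the variance inflation factor (VIF) of each covariate. The sketch below is not something gneiss, songbird, or aldex2 provides, and it was not suggested in this thread. It is a plain NumPy illustration on made-up data in which Cecal_pH is deliberately simulated as a function of the three short-chain fatty acids. The `vif` helper, the simulated values, and the coefficients are all assumptions for illustration only.

```python
import numpy as np


def vif(X):
    """Return the variance inflation factor of each column of X.

    Each column is regressed (with an intercept) on all other columns;
    VIF = 1 / (1 - R^2) of that regression.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - resid.var() / y.var()
        out[j] = 1.0 / (1.0 - r2)
    return out


# Hypothetical metadata: NOT the poster's data, simulated for illustration.
rng = np.random.default_rng(0)
n = 60
acetate = rng.normal(50, 10, n)
butyrate = rng.normal(10, 3, n)
propionate = rng.normal(15, 4, n)
abdominal_fat = rng.normal(5, 1, n)
# Assumed relationship: pH drops as the acids increase, plus small noise.
cecal_ph = (8.0 - 0.02 * acetate - 0.03 * butyrate
            - 0.02 * propionate + rng.normal(0, 0.05, n))

names = ["Cecal_pH", "Abdominal_fat", "Acetate", "Butyrate", "Propionate"]
X = np.column_stack([cecal_ph, abdominal_fat, acetate, butyrate, propionate])

vifs = dict(zip(names, vif(X)))
for name, value in vifs.items():
    print(f"{name:14s} VIF = {value:.1f}")
```

In this simulated setup, Cecal_pH gets a large VIF because it is nearly a linear combination of the acids, while Abdominal_fat stays close to 1. A covariate with a large VIF is a natural candidate to drop in one of the alternative models, for example fitting one formula with Cecal_pH and another with the individual acids, and then comparing how the coefficients change.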

system · January 21, 2020, 2:44am

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.