Unexpected Projected Prediction and Log Ratio Plots from Gneiss

Hello QIIME2 community,

I have microbiome data from a diet intervention study using a 3x3-factorial design (fat, grain, and starch). I'm using gneiss in QIIME 2 (version 2018.4) to associate groups of ASVs with host metadata. I used correlation clustering to group ASVs and ran ols-regression with the formula "Cecal_pH+Abdominal_fat+Acetate+Butyrate+Propionate". I have some dataset-specific questions I'd like feedback on before I explore these results in depth.

  1. My understanding is that gneiss performs its own under-the-hood normalization of the feature counts. I therefore added pseudocounts and performed correlation clustering with the unrarefied ASV table as done here: QIIME2 Workflow. Is this thinking correct, or should I repeat the analysis with rarefied data?

  2. I expect the measurements in my model to show at least some collinearity (e.g., a drop in Cecal_pH is probably driven by increased acetate, propionate, and/or butyrate in this dataset). Will this affect how I should build my regression formula?

  3. In the attached regression summary, raw values in the "projected prediction" plot form three distinct clusters which most of the predicted values fall within. However, a minority of them fall outside the bottom-right cluster. The distribution of raw values looks strange compared to what I've seen elsewhere, and the presence of predicted values outside the "range" of raw values is troubling me a bit. Does this mean that the model has poor predictive power? Can I 'trust' any results I get?
    regression_summary.qzv (7.4 MB)

  4. I explored the relationship between balance 'y4' and butyrate. In the log-ratio:butyrate plot produced by balance-taxonomy, I expected the relationship to be linear but it actually looks logarithmic. Is this a 'reasonable' result, or does it indicate some underlying issues with the model?
    y4_taxa_summary_butyrate_L4.qzv (126.4 KB)
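
For question 2, one quick way to gauge how collinear the covariates are is to look at their pairwise correlations before building the formula. Below is a minimal sketch, assuming the metadata columns are available as plain lists. The values are made up for illustration and are not from this study, and `pearson` and `correlated_pairs` are hypothetical helper names, not part of gneiss or QIIME 2.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Made-up illustrative values, NOT data from this study.
metadata = {
    "Cecal_pH": [7.1, 6.8, 6.5, 6.2, 6.0, 5.8],
    "Acetate": [20.0, 26.0, 31.0, 38.0, 41.0, 47.0],
    "Butyrate": [4.0, 5.5, 5.0, 8.0, 9.5, 9.0],
    "Abdominal_fat": [3.2, 2.9, 3.5, 3.0, 3.4, 2.8],
}


def correlated_pairs(table, threshold=0.8):
    """Return (name_a, name_b, r) for covariate pairs with |r| >= threshold."""
    pairs = []
    for a, b in combinations(table, 2):
        r = pearson(table[a], table[b])
        if abs(r) >= threshold:
            pairs.append((a, b, round(r, 3)))
    return pairs


for a, b, r in correlated_pairs(metadata):
    print(f"{a} vs {b}: r = {r}")
```

The 0.8 threshold is an arbitrary choice for the sketch. Pairs flagged this way are candidates for being fitted in separate models rather than together in one formula.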

Thanks for any help you can offer! If anything needs clarification please let me know!

Hi @Zachary_Bendiks, note that we are currently deprecating this functionality in gneiss and recommend looking at songbird and aldex2 instead.

There have been a few other related posts on this, so feel free to read those.

Regarding your questions:

  1. Yes, log ratios largely negate the need for rarefaction, but the issue with zeros remains. Zeros are better handled in songbird and aldex2 because of their multinomial models.
  2. Definitely, collinearity is still a problem in differential abundance. The standard statistical recommendations still apply here. I’d start by fitting multiple models and comparing how they differ.
  3. We’re deprecating this functionality and recommend songbird/aldex2 instead. So no, please don’t trust those results.
  4. Nature is often non-linear, so maybe this isn’t completely unexpected. Linear methods can still give some insight, but they certainly won’t be the best model for a relationship like this.
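
To illustrate point 1, here is a minimal sketch of a balance (the isometric log ratio between two groups of features) computed by hand. It is not gneiss's internal code: the `balance` function, the ASV names, and the counts are all hypothetical and chosen only for illustration. It shows that multiplying every count by a constant (a deeper sequencing run) leaves the balance unchanged, while a single zero forces a pseudocount whose value changes the result.

```python
import math


def balance(counts, numerator, denominator, pseudocount=0.0):
    """Balance between two groups of features for a single sample.

    Computed as sqrt(r*s / (r+s)) * (mean log of numerator counts
    minus mean log of denominator counts), where r and s are the
    group sizes.
    """
    num = [math.log(counts[f] + pseudocount) for f in numerator]
    den = [math.log(counts[f] + pseudocount) for f in denominator]
    r, s = len(num), len(den)
    return math.sqrt(r * s / (r + s)) * (sum(num) / r - sum(den) / s)


# Hypothetical counts for one sample, and the same sample at 10x depth.
shallow = {"ASV1": 120, "ASV2": 30, "ASV3": 8, "ASV4": 2}
deep = {name: 10 * count for name, count in shallow.items()}
top, bottom = ["ASV1", "ASV2"], ["ASV3", "ASV4"]

# The common scaling factor cancels inside the log ratio.
same = math.isclose(balance(shallow, top, bottom), balance(deep, top, bottom))
print("depth-invariant:", same)

# With a zero count, log(0) is undefined, so a pseudocount is required,
# and the balance now depends on which pseudocount is chosen.
with_zero = dict(shallow, ASV4=0)
for pc in (0.5, 1.0):
    print("pseudocount", pc, "->", balance(with_zero, top, bottom, pseudocount=pc))
```

The dependence on the pseudocount in the last loop is the "issue with zeros" that remains even when rarefaction is unnecessary.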
