Hello QIIME2 community,
I have microbiome data from a diet intervention study with a 3x3 factorial design (fat, grain, and starch). I'm using gneiss in QIIME 2 2018.4 to associate groups of ASVs with host metadata. I grouped ASVs with correlation clustering and ran ols-regression with the formula "Cecal_pH+Abdominal_fat+Acetate+Butyrate+Propionate". I have some dataset-specific questions I'd like feedback on before I explore these results in depth.
-
My understanding is that gneiss performs its own under-the-hood normalization of the feature counts. I therefore added pseudocounts and performed correlation clustering with the unrarefied ASV table as done here: QIIME2 Workflow. Is this thinking correct, or should I repeat the analysis with rarefied data?
-
I expect the measurements in my model to show at least some collinearity (e.g., a drop in Cecal_pH is probably driven by increased acetate, propionate, and/or butyrate in this dataset). Should this affect how I build my regression formula?
-
In the attached regression summary, the raw values in the "projected prediction" plot form three distinct clusters, and most of the predicted values fall within them. However, a minority of the predicted values fall outside the bottom-right cluster. The distribution of raw values looks strange compared to what I've seen elsewhere, and the presence of predicted values outside the "range" of raw values is troubling me a bit. Does this mean that the model has poor predictive power? Can I 'trust' any results I get?
regression_summary.qzv (7.4 MB)
-
I explored the relationship between balance 'y4' and butyrate. In the log-ratio:butyrate plot produced by balance-taxonomy, I expected the relationship to be linear but it actually looks logarithmic. Is this a 'reasonable' result, or does it indicate some underlying issues with the model?
y4_taxa_summary_butyrate_L4.qzv (126.4 KB)
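To make the collinearity question above more concrete, here is a minimal sketch of the kind of check I have in mind: pairwise Pearson correlations between the covariates in my formula. This is plain Python and not part of gneiss. The numbers are made-up placeholders (not my data), the helper names `pearson` and `collinear_pairs` are hypothetical, and the 0.8 cutoff is an arbitrary assumption on my part.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant column")
    return sxy / math.sqrt(sxx * syy)


def collinear_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for covariate pairs with |r| >= threshold.

    The threshold is an arbitrary illustrative cutoff, not a recommendation.
    """
    names = list(columns)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                flagged.append((a, b, r))
    return flagged


# Placeholder values for illustration only; these are NOT measurements
# from my study.
covariates = {
    "Cecal_pH": [6.8, 6.5, 6.1, 5.9, 5.6, 5.4],
    "Acetate": [20.0, 25.0, 31.0, 36.0, 41.0, 45.0],
    "Butyrate": [4.0, 9.0, 5.0, 12.0, 7.0, 10.0],
}

pairs = collinear_pairs(covariates)
for a, b, r in pairs:
    print(f"{a} vs {b}: r = {r:.3f}")
```

If strongly correlated pairs like this show up in my real metadata, I'm unsure whether one of them should be dropped from the formula or whether both can stay.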
Thanks for any help you can offer! If anything needs clarification please let me know!