Interpreting the results of Gneiss ols-regression analysis

Hi,

(I'm extremely new to microbiome analysis so be warned that the following question might be fairly naive.)

I'm currently analysing 20 samples of duodenal microbiota in mice. I performed a gneiss regression analysis using qiime gneiss ols-regression.

I followed the QIIME 2 tutorial on "Differential abundance analysis with gneiss" and read the original publication on the use of balance trees in microbial niche differentiation, but I still can't confidently interpret the k-fold cross-validation. I know that the pred_mse must be lower than the model_mse to suggest that no over-fitting is occurring within the model, but I don't know at which point this decrease can be considered significant.

Here my model's cross-validation shows, on average, a 3-fold decrease from the model_mse to the pred_mse (see screenshot). That looks encouraging to me, but sadly "looks" is obviously not enough for a rigorous analysis, and I still can't find a tangible way to exclude the hypothesis of over-fitting.

Are there general guidelines to follow when comparing these two values? Any insight / documentation regarding k-fold cross validation would be of great help. Thanks in advance!

Hi @MLefeuvre, it doesn’t look like there is overfitting, given that the predicted MSE is consistently smaller than the model MSE from the training dataset. I'm not sure how to turn this into a hypothesis test, but it is definitely worth thinking about. The predicted proportions are given in the summary, so they are available to analyze using R/Python.
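
For anyone who wants to see what the two numbers mean outside of QIIME 2, here is a minimal sketch of k-fold cross-validation for an ordinary least squares fit in plain NumPy. It is not gneiss's implementation: the function name `kfold_ols_mse`, the synthetic 20-sample dataset, and the choice to average squared errors per sample are all assumptions made for illustration. The columns are labelled model_mse (error on the samples used for fitting) and pred_mse (error on the held-out samples) only to mirror the names in the summary, and how gneiss itself normalises them should be checked against its documentation.

```python
import numpy as np


def kfold_ols_mse(X, y, k=5, seed=0):
    """Hypothetical helper: per-fold (model_mse, pred_mse) for OLS with an intercept."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    X1 = np.column_stack([np.ones(len(y)), X])  # prepend intercept column
    results = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        # Fit on the training folds only
        beta, *_ = np.linalg.lstsq(X1[train], y[train], rcond=None)
        # Mean squared error on the data used for fitting
        model_mse = np.mean((y[train] - X1[train] @ beta) ** 2)
        # Mean squared error on the held-out fold
        pred_mse = np.mean((y[test] - X1[test] @ beta) ** 2)
        results.append((model_mse, pred_mse))
    return np.array(results)


# Synthetic stand-in for one balance regressed on one covariate (assumed data)
rng = np.random.default_rng(42)
x = rng.normal(size=20)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=20)

scores = kfold_ols_mse(x, y, k=5)
for fold, (model_mse, pred_mse) in enumerate(scores):
    print(f"fold {fold}: model_mse={model_mse:.3f}  pred_mse={pred_mse:.3f}")
print("mean pred_mse / mean model_mse:", scores[:, 1].mean() / scores[:, 0].mean())
```

With only a handful of folds and 20 samples the per-fold values are noisy, so repeating the split with different seeds and looking at the spread of the ratio is probably more informative than any single comparison.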