(I'm extremely new to microbiome analysis so be warned that the following question might be fairly naive.)
I'm currently analysing 20 samples of duodenal microbiota in mice. I performed a gneiss regression analysis using qiime gneiss ols-regression.
I followed the Qiime2 tutorial on "Differential abundance analysis with gneiss" and read the original publication on the use of balance trees in microbial niche differentiation, but I still struggle to interpret the k-fold cross-validation. I know that the pred_mse must be lower than the model_mse to suggest that no over-fitting is occurring within the model, but I don't know at which point this decrease can be considered significant.
Here my model's cross-validation shows a 3-fold decrease from the model_mse to the pred_mse on average (see screenshot), which looks encouraging to me, but sadly "looks" is obviously not enough for a rigorous analysis, and I still can't find a tangible way to exclude the hypothesis of over-fitting.
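To make concrete what I understand the two quantities to be, here is a minimal sketch of k-fold cross-validation for an ordinary least squares fit, written with NumPy only. It is not gneiss's implementation: the function name kfold_mse, the synthetic data, and the per-sample averaging of the errors are all my own assumptions, and gneiss may normalise model_mse and pred_mse differently.

```python
import numpy as np

def kfold_mse(X, y, k=5, seed=0):
    """Return one (train_mse, test_mse) pair per fold (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))          # shuffle sample indices
    folds = np.array_split(idx, k)         # k roughly equal folds
    results = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        # Fit OLS on the training samples only
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        # Within-model error: residuals on the samples used for fitting
        train_mse = np.mean((y[train] - X[train] @ beta) ** 2)
        # Prediction error: residuals on the held-out samples
        test_mse = np.mean((y[test] - X[test] @ beta) ** 2)
        results.append((train_mse, test_mse))
    return results

# Synthetic stand-in data: 20 samples, an intercept and one covariate
rng = np.random.default_rng(42)
x = rng.normal(size=20)
X = np.column_stack([np.ones(20), x])
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=20)

for fold, (train_mse, test_mse) in enumerate(kfold_mse(X, y, k=5)):
    print(f"fold {fold}: train_mse={train_mse:.3f}  test_mse={test_mse:.3f}")
```

With only 20 samples each held-out fold contains 4 samples, so the per-fold prediction error is expected to be noisy, which is part of why I am unsure how to judge the size of the difference.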
Are there general guidelines to follow when comparing these two values? Any insight / documentation regarding k-fold cross validation would be of great help. Thanks in advance!