Dear Qiimers,
I am taking my first steps with Gneiss and trying to get more familiar with the results. I could use some help interpreting the output:
- Looking at the model summary, mse vs pred_mse: as I understand it, we are comparing the mse of the model on the training dataset (a random 90% of the data) with its mse on the test dataset (the leftover 10% of the data). If the model is overfit, the error in the predictions on the test data will be larger than the mse on the training data. My question is: how good is good enough? I.e., should the pred_mse be on the order of 1/10th of the mse? What if the two values are about the same? Is there a ratio between the two that can be used as a rule of thumb to judge over/underfitting?
- Comparison between the two plots, "projected predictions" and "projected residuals": in the first, I can eyeball whether the predicted values are a reasonable representation of the real data; in the second, by comparison with the first, I can check whether the residuals are of the same order of magnitude as the predictions. That would not be good: it would mean large random error and little predictive value for my model, as seems to be the case for the tutorial dataset, where Rsquared is ~0.11. Is my interpretation correct?
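
To make both questions concrete, here is a minimal sketch of how I understand the quantities involved. It is not Gneiss's implementation: the data are synthetic, the model is a plain least-squares fit with NumPy, and all variable names (`x`, `y`, `mse`, `pred_mse`, `r2`) are my own assumptions chosen to mirror the summary labels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data, made up for illustration only:
# one covariate and one noisy response (standing in for a single balance).
n = 200
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(scale=1.0, size=n)

# Random 90% / 10% split into training and test samples.
idx = rng.permutation(n)
n_train = int(0.9 * n)
train, test = idx[:n_train], idx[n_train:]

# Ordinary least squares with an intercept, fitted on the training samples only.
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)

pred_train = X[train] @ beta
pred_test = X[test] @ beta

# mse: error on the data the model was fitted to.
mse = np.mean((y[train] - pred_train) ** 2)
# pred_mse: error on the held-out data the model never saw.
pred_mse = np.mean((y[test] - pred_test) ** 2)

# R-squared on the training data: share of variance explained by the fit.
ss_res = np.sum((y[train] - pred_train) ** 2)
ss_tot = np.sum((y[train] - y[train].mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(f"mse={mse:.3f}  pred_mse={pred_mse:.3f}  ratio={pred_mse / mse:.2f}  R2={r2:.3f}")
```

In this toy setting I would read a ratio of pred_mse to mse close to 1 as "no obvious overfitting", and a low R2 as "residuals about as large as the spread of the data", but I do not know whether the same reading carries over to the Gneiss summary.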
Thank you for your kind attention,
Max