Linear Regression Summary in Gneiss

yitkheng · October 30, 2019, 8:54am

Hi QIIME 2 folks,

I am very new to the QIIME 2 microbiome bioinformatics platform; a friend introduced me to it a few months ago. I really like QIIME 2, and thank you for developing it. Please forgive me if some of these questions are too basic for QIIME 2 experts.
I have a few quick questions related to gneiss:

a) In the current study we have 20 soil parameters (from chemical and physical analyses) and 50 observations in total. Is it possible to run correlation-clustering followed by ols-regression to identify the soil parameters (covariates) with the highest R2 diff (and corrected p-values < 0.05), and only then apply gradient-clustering to the covariates (for instance, pH) that contribute to the variation in the soil microbiomes?
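A minimal numpy sketch of the ranking quantity, assuming R2 diff is the drop in R² when a single covariate is left out of the full OLS model for one balance. This is an illustration of that idea, not gneiss's actual implementation; `r_squared` and `r2_diff` are hypothetical helper names.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of an ordinary least squares fit of y on X (X includes an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def r2_diff(covariates, y):
    """For each covariate column, the full-model R^2 minus the R^2 without that column.

    Assumption: y is a single balance (one response per sample) and
    covariates is an (n_samples, n_covariates) array.
    """
    n, p = covariates.shape
    intercept = np.ones((n, 1))
    full = r_squared(np.hstack([intercept, covariates]), y)
    diffs = []
    for j in range(p):
        # Refit with covariate j removed; a larger drop means j explains more variance.
        reduced = np.delete(covariates, j, axis=1)
        diffs.append(full - r_squared(np.hstack([intercept, reduced]), y))
    return np.array(diffs)


# Hypothetical data: 50 samples, 3 covariates, only the first one drives the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=50)
print(r2_diff(X, y))
```

Because the reduced model is nested in the full one, each difference is non-negative (up to rounding), so sorting covariates by it gives a simple ranking before choosing one for gradient-clustering.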

b) In one trial run with gneiss, I used all the soil parameters (both chemical and physical) to compute the linear regression summary (ols-regression after correlation-clustering) and obtained an Rsquared above 0.600. Unfortunately, all the pred_mse values (folds 0 to 9) were higher than model_mse. I then re-ran it with only the soil chemical parameters (16 covariates, 50 observations). The pred_mse values (7.5 to 22) are now lower than the model_mse values (22 to 26). I am still considering whether to reduce the number of soil chemical parameters (covariates) further and re-run ols-regression. Is there a minimum ratio of pred_mse to model_mse that should be achieved before running balance-taxonomy? And is there a maximum number of covariates allowed in ols-regression?
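To make the model_mse vs. pred_mse comparison concrete, here is a minimal k-fold sketch, assuming model_mse is the in-sample error on the training folds and pred_mse the error on the held-out fold (an assumption about gneiss's definitions, not its code). `kfold_mse` is a hypothetical helper, and the data below are random noise chosen to mimic many covariates relative to 50 samples.

```python
import numpy as np


def kfold_mse(covariates, y, k=10, seed=0):
    """Return a (k, 2) array of [model_mse, pred_mse] for each fold of an OLS fit.

    model_mse: mean squared error on the training samples of that fold.
    pred_mse:  mean squared error on the held-out samples of that fold.
    """
    n = len(y)
    X = np.hstack([np.ones((n, 1)), covariates])  # add an intercept column
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, k)
    rows = []
    for test in folds:
        train = np.setdiff1d(idx, test)
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        model_mse = float(np.mean((y[train] - X[train] @ beta) ** 2))
        pred_mse = float(np.mean((y[test] - X[test] @ beta) ** 2))
        rows.append((model_mse, pred_mse))
    return np.array(rows)


# Hypothetical data: 50 samples, 40 pure-noise covariates, pure-noise response.
rng = np.random.default_rng(1)
X_noise = rng.normal(size=(50, 40))
y_noise = rng.normal(size=50)
scores = kfold_mse(X_noise, y_noise)
print(scores.mean(axis=0))  # [mean model_mse, mean pred_mse]
```

With nearly as many parameters as training samples, the training error shrinks while the held-out error grows; that gap between pred_mse and model_mse is the overfitting pattern the per-fold comparison is meant to expose.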

Thank you in advance.

Looking forward to learning more.

Cheers