Okay, it turns out that if I run just one of the covariates that has the wonky p-values, I get a singular matrix error. See the attached log file: qiime2-q2cli-err-qm34eqp6_log.txt (114.2 KB)
The command I ran was: qiime gneiss lme-regression --p-formula "MilkType2" --p-groups "Subject" --i-table FTS_balances.qza --i-tree FTS_hierarchy.qza --m-metadata-file subsetted_meta_for_gneiss2.txt --o-visualization FTS_regression_6.qzv
I suspect this is because I'm looking at different segments of feeding tubes, and each segment from a tube would have the same exposure to feeding type as the other segments. I know I can't use OLS because I have repeated measures from each tube, and I don't necessarily expect a compound symmetric correlation structure.
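To check that suspicion before blaming anything else, here is a minimal sketch of how I would test whether a covariate is constant within each subject. The function names and the toy rows (including the Section values) are made up for illustration; the real input would be the rows of the metadata file.

```python
from collections import defaultdict


def levels_per_group(rows, group_col, covariate_col):
    """Map each group (e.g. Subject) to the set of covariate values seen in it."""
    levels = defaultdict(set)
    for row in rows:
        levels[row[group_col]].add(row[covariate_col])
    return dict(levels)


def constant_within_groups(rows, group_col, covariate_col):
    """Return True if no group shows more than one value of the covariate."""
    per_group = levels_per_group(rows, group_col, covariate_col)
    return all(len(values) == 1 for values in per_group.values())


# Toy metadata (hypothetical values): two tubes, two segments each.
rows = [
    {"Subject": "S1", "Section": "proximal", "MilkType2": "formula"},
    {"Subject": "S1", "Section": "distal", "MilkType2": "formula"},
    {"Subject": "S2", "Section": "proximal", "MilkType2": "breast"},
    {"Subject": "S2", "Section": "distal", "MilkType2": "breast"},
]

print(constant_within_groups(rows, "Subject", "MilkType2"))  # True
print(constant_within_groups(rows, "Subject", "Section"))    # False
```

A covariate that comes back True here only varies between subjects, which is the situation I described above. This check does not prove that it is the cause of the singular matrix error.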
Also, having updated to QIIME2-2018.6, there still doesn't appear to be any goodness-of-fit statistic associated with the lme output. Do you have any recommendations on which to use, and how to calculate it? I've been reading through this paper, but am not certain if there is a better way.
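One candidate I am considering is an information criterion such as AIC or BIC, computed per balance from the model's log-likelihood. The sketch below only shows the arithmetic. The log-likelihood values and parameter counts are invented, and I am assuming the log-likelihood can be obtained from the fitted model somehow, which I have not confirmed for the gneiss output. As far as I understand, the fits being compared should use maximum likelihood rather than REML when the fixed effects differ.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2*lnL (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k*ln(n) - 2*lnL (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical numbers for one balance under two candidate models.
small_model = {"llf": -120.0, "k": 4}   # e.g. Section only
larger_model = {"llf": -118.5, "k": 6}  # e.g. Section + one covariate

print(aic(small_model["llf"], small_model["k"]))    # 248.0
print(aic(larger_model["llf"], larger_model["k"]))  # 249.0
```

In this made-up case the larger model fits slightly better in raw likelihood, but its AIC is higher, so the extra parameters would not be judged worth it.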
The QIIME2-2018.6 version of the full (possibly overloaded) model, FTS_regression_7.qzv (3.3 MB), still has the problem of the corrected p-value being much lower than the original p-value. That is plausible for some variables (e.g. antibiotics might have a huge impact on the community), but less plausible for others. I am also still seeing a large number of NaNs in the p-value file, including for variables that don't seem to have the corrected p-value drop issue.
In an effort to find a simplified version of the model that would run, I included only two variables (Section and MilkFeeding2). Section did not seem to have the p-value issue, and has the advantage of varying enough within subject to dodge the singularity issue, while MilkFeeding2 is a variable I would like to understand and one that is definitely causing problems. When I run this model, the p-value problem in MilkFeeding2 disappears, but the problem seems to remain in the grouping variable. I'm guessing this means that including all variables overfits the model, but I am a bit concerned that the number of balances with NaN for the p-value remains rather high. All of which brings me back to the question of how best to test the model fit: FTS_regression_8.qzv (1.8 MB)
Once I decide how to test the model fit, my plan is to fit Section plus each covariate separately, then try a larger model that includes only the variables that most improve fit. That still leaves the NaN mystery, which may still be a filtering issue. I'd rather not filter arbitrarily if I can help it. Do you have any recommendations on how to decide what is too low in abundance to include in a model?
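In the meantime, one way I could make the filtering choice less arbitrary is to tabulate how many features survive at several candidate cutoffs and look for where the count levels off. This is only a sketch under my own assumptions: the cutoff here is a minimum total count across samples, the function name is hypothetical, and the toy table is invented rather than taken from my data.

```python
def features_retained(table, min_totals):
    """For each cutoff, count the features whose total count across samples meets it."""
    totals = {feature: sum(counts) for feature, counts in table.items()}
    return {
        cutoff: sum(1 for total in totals.values() if total >= cutoff)
        for cutoff in min_totals
    }


# Toy feature table (hypothetical): feature -> counts in four samples.
table = {
    "featA": [10, 12, 0, 7],  # total 29
    "featB": [0, 0, 0, 1],    # total 1
    "featC": [3, 0, 0, 5],    # total 8
    "featD": [1, 2, 3, 4],    # total 10
}

print(features_retained(table, [1, 5, 10]))
# {1: 4, 5: 3, 10: 2}
```

The same sweep could be repeated while recording how many balances end up with NaN p-values at each cutoff, to see whether filtering is really what drives the NaNs.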
Thanks again for all your help.