Corrected p-values using gneiss lme-regression smaller than uncorrected p-values


I am using a slightly outdated version of QIIME2, 2018.4, but was hoping to finish the analysis in the same version of QIIME2 I began the analysis in, if at all possible.

I am analyzing a dataset with multiple samples of different types collected from a group of infants, and I want to compare the abundance of taxa between the sample types, as well as test for the influence of a few other variables such as gestational age at delivery. I am running lme-regression with subject ID set as the grouping variable to account for repeated measures. The model runs fine, but I was surprised by some of the results: the corrected p-value is frequently smaller than the uncorrected p-value. For example, for the influence of antibiotic exposure on one balance, I have an uncorrected p-value of 0.889 and a corrected p-value of 0. I find this difficult to believe, but I am uncertain what might have caused it.

I know I included a few too many variables in my model for the sample size, but I’d planned to use a backwards-elimination approach to the analysis. Unfortunately, with the corrected p-values, everything appears significant. Is this an issue of an overloaded model? Does a later version of QIIME2 provide any goodness-of-fit testing for regression models? Or is there a chance that QIIME2-2018.4 is reporting corrected p-values that should be 1 as 0?
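For context: standard step-up FDR corrections such as Benjamini-Hochberg can never return an adjusted p-value below the raw one, so a drop from 0.889 to 0 would be impossible under BH (methods that estimate the proportion of true null hypotheses can behave differently). A minimal pure-Python sketch of BH, not the actual gneiss implementation:

```python
def benjamini_hochberg(pvals):
    """Benjamini-Hochberg FDR correction; returns adjusted p-values in the
    original order. Adjusted values are always >= the raw p-values."""
    n = len(pvals)
    # indices of p-values sorted ascending
    order = sorted(range(n), key=lambda i: pvals[i])
    corrected = [0.0] * n
    prev = 1.0
    # walk from the largest p-value down, enforcing monotonicity
    for rank in range(n - 1, -1, -1):
        i = order[rank]
        val = min(prev, pvals[i] * n / (rank + 1))
        corrected[i] = val
        prev = val
    return corrected

raw = [0.01, 0.04, 0.889, 0.30]
adj = benjamini_hochberg(raw)
assert all(a >= r for a, r in zip(adj, raw))  # a raw 0.889 can never adjust to 0
```

If the corrected values in the .qzv came from a BH-style procedure, this invariant is a quick way to check whether they are even internally consistent.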

I’m just trying to sort out what might have gone wrong.


Hello Diana,

This is a great question. Yes, this would surprise me too!

To help the QIIME devs figure out if this is a problem, would you be willing to post the full command you ran? Ideally, it would be great to see the full .qzv of your results, but I understand if that includes private patient information.



Sure, I’m running a version now that should remove anything identifiable. If that presents the same problem, I will post the .qzv file here.

The command I ran was: qiime gneiss lme-regression --p-formula "MilkType2+FeedingTubeType+GestationalAge+AcidSupressors+Sepsis+NEC+IntestinalPerf+AntibioticsAtBirth+TubeDuration" --p-groups "Subject" --i-table FTS_balances.qza --i-tree FTS_hierarchy.qza --m-metadata-file subsetted_meta_for_gneiss2.txt --o-visualization FTS_regression_1.qzv

Got this: Saved Visualization to: FTS_regression_1.qzv
and the FTS_regression_1.qzv file as output. I also tried a version with one fewer predictor variable and saw the same corrected p-value of 0 problem. In the meantime, I will see about updating to the newest QIIME2 in case the issue has already been resolved.


Hi @willowblade, yes that is very weird. Could you post your qzv / qza files so that we can sanity check?


I’ve just gotten a .qzv file run without any possible PHI. It’s not as bad as the version with a few more variables included, but I do still see the problem when looking at milk type. I will also attach the .qza files used for the last command (balances and hierarchy). Do you need more than just those?


FTS_regression_3.qzv (2.5 MB)
FTS_hierarchy.qza (53.7 KB)
FTS_balances.qza (783.1 KB)


Hi @willowblade

A couple of things:

  1. The corrected p-values download was enabled in a recent release, so you should be able to download those with the newest release (it looks like you are using 2018.4, which is a couple of versions out of date).

  2. I'm noticing quite a few NaNs when I open up the p-values.


I'm not entirely sure what is causing this, but it may be worthwhile to revisit the filtering criteria to remove low-abundance species. You may also want to experiment with simplifying the model (for the sake of debugging).


Just to chime in here: it sounds like this might not be unusual behavior.


Thanks for the responses. My filtering criteria were as follows:

  1. No samples with missing metadata
  2. RSVs present in fewer than 5 samples were excluded
  3. RSVs with fewer than 10 total reads were excluded

I thought this would be enough to take care of the low-abundance issue - what would you recommend for more stringent filtering of low-abundance features?
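For anyone reading along, the prevalence and total-count filters above can be expressed as a short sketch over a toy feature-by-sample count table (not the actual q2-feature-table API; the dict-of-lists layout here is just a stand-in):

```python
def filter_features(table, min_samples=5, min_reads=10):
    """Drop RSVs present in fewer than `min_samples` samples or with
    fewer than `min_reads` total reads across all samples.
    `table` maps feature id -> list of per-sample counts."""
    kept = {}
    for feature, counts in table.items():
        prevalence = sum(1 for c in counts if c > 0)
        if prevalence >= min_samples and sum(counts) >= min_reads:
            kept[feature] = counts
    return kept
```

A stricter variant would raise `min_samples` toward the model's degrees of freedom, as discussed later in the thread.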

I still think this is likely to be unusual behavior, as I would not anticipate the proportion of null hypotheses to be small; I am more surprised to see any differences by Tube Type than I would be to see no differences at all.

Thanks again


We'll need a little more information about your data.

Have you looked at single covariates in your model? And do you see the same behavior for simpler models?
We have also observed NaNs popping up when there is low variance, which is one possible characteristic of low-abundance microbes.


I’ve tried models with just two covariates, with the same behaviour. I will run single covariates next. I’ve just had the update to qiime2-2018.6 come through, so I am testing to see if I have the same behaviour in the newer version.

It’ll be slow, but I will keep you posted.

Thank you for all your help


Okay, it turns out that if I run just one of the covariates that has the wonky p-values, I get a singular matrix error (see attached log file): qiime2-q2cli-err-qm34eqp6_log.txt (114.2 KB)

The command I ran was: qiime gneiss lme-regression --p-formula "MilkType2" --p-groups "Subject" --i-table FTS_balances.qza --i-tree FTS_hierarchy.qza --m-metadata-file subsetted_meta_for_gneiss2.txt --o-visualization FTS_regression_6.qzv

I suspect this is because I'm looking at different segments of feeding tubes, and each segment from a tube would have the same exposure to feeding type as the other segments. I know I can't use OLS because I have repeated measures from each tube, and I don't necessarily expect a compound symmetric correlation structure.
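If the hypothesis above is right, it's easy to confirm directly from the metadata: a covariate that never varies within the grouping variable is perfectly confounded with the group effect, which produces exactly this kind of singular design matrix. A small pure-Python check (column names like "Subject" and "MilkType2" are just examples from this thread; adapt to your mapping file):

```python
import csv
from collections import defaultdict

def constant_within_group(metadata_path, group_col, covariate):
    """Return the groups (e.g. subjects/tubes) whose covariate value never
    varies across their samples -- a common cause of singular-matrix errors
    in mixed models. Assumes a tab-separated metadata file with a header."""
    values = defaultdict(set)
    with open(metadata_path) as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            values[row[group_col]].add(row[covariate])
    return [group for group, vals in values.items() if len(vals) == 1]
```

If this returns every tube for MilkType2, the covariate carries no within-group information and the mixed model cannot separate it from the random intercept.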

Also, having updated to QIIME2-2018.6, there still doesn't appear to be any goodness-of-fit statistic associated with the lme output. Do you have any recommendations on which to use, and how to calculate it? I've been reading through this paper, but am not certain if there is a better way.

The QIIME2-2018.6 version of the full (possibly overloaded) model, FTS_regression_7.qzv (3.3 MB), still has the problem of the corrected p-value being much lower than the original p-value. This is plausible for some variables, e.g. that antibiotics might have a huge impact on the community, but less plausible for others. I am continuing to see a large number of NaNs in the p-value file, including for variables that don't seem to have the corrected p-value drop issue.

In an effort to find a simplified version of the model that would run, I included only two variables (Section and MilkFeeding2). Section did not seem to have the p-value issue, and had the advantage of having enough variation within subjects to dodge the singularity issue, while MilkFeeding2 is a variable I would like to understand that is definitely causing problems. When I run this model, the p-value problem in MilkFeeding2 disappears, but the problem seems to remain in the grouping variable. I'm guessing this means that including all the variables is overfitting the model, but I am a bit concerned because the number of balances with NaN p-values remains high. All of which brings me back to questions on the best way to test the model... FTS_regression_8.qzv (1.8 MB)

Once I decide how to test the model fit, my plan is to use Section + each covariate separately, then try a larger model including only the variables that most improve fit. That still leaves the NaN mystery, which may still be a filtering issue. I'd rather not filter arbitrarily if I can help it; do you have any recommendations on how to decide what is too low abundance to include in a model?

Thanks again for all your help.


Yikes! Another post brought this up here: Gneiss Singular matrix error - #5 by ajaybabu27

This almost surely points to some sort of ill-defined input. Does this throw an error on just one of the inputs? And have you tried running OLS on it as a sanity check? I know it is not "technically" correct, but if it also fails with OLS, that is a sign of a more sinister problem.

Nice article! Right, an interpretable goodness-of-fit is not too straightforward. Another nice blog post breaks this problem down here. Gneiss does provide the residuals and fits, so it should be possible to rig up your own goodness-of-fit. But this is definitely something that we really should have in gneiss in the future.
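Since gneiss exports the fits and residuals, a rough per-balance fit statistic can be rigged up from just those two vectors. A sketch of a simple pseudo-R² (1 - SS_res/SS_tot); note this is not a principled mixed-model R², just a quick diagnostic:

```python
def pseudo_r_squared(fitted, residuals):
    """Rough goodness-of-fit for one balance from fitted values and
    residuals: 1 - SS_res / SS_tot. The observed values are recovered
    as fitted + residual. Assumes the observations are not all equal."""
    observed = [f + r for f, r in zip(fitted, residuals)]
    mean_obs = sum(observed) / len(observed)
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot
```

For mixed models, a marginal/conditional R² that separates fixed-effect and random-effect variance would be more defensible, but this gives a first-pass way to compare candidate models on the same balances.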

Right, this is tricky, and unfortunately there isn't really a right answer that I'm aware of at the moment. If I had to guess where the NaNs are coming from, it is likely because there aren't enough samples within the subgroups -- you need at least 3 samples for each cross-section. That means for a given milk type and a given section for a given individual, you need at least 3 values for your particular balance to have non-zero variance. If you don't have many samples, this can actually be quite a stringent criterion.

The rule of thumb I use when it comes to filtering species is to measure your degrees of freedom. If you have 2 variables, you have 2 degrees of freedom. If you have a bunch of microbes that are only observed in 2 samples, you can get a near-perfect fit for those microbes, so whatever inference you perform on them is not useful (because your model will not have the resolution to measure them). At a bare minimum, I would recommend counting all of the variables in your formula and using that as the baseline for filtering. But be careful: a categorical variable with D categories actually has D-1 degrees of freedom and therefore counts as D-1 variables.
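That rule of thumb is easy to mechanize. A sketch that counts model degrees of freedom from the covariates in a formula (the metadata layout and column names here are illustrative, not the QIIME2 metadata API):

```python
def model_degrees_of_freedom(metadata, formula_terms):
    """Count model degrees of freedom for the filtering rule of thumb:
    a numeric covariate counts 1; a categorical covariate with D levels
    counts D - 1. `metadata` maps column name -> list of values."""
    df = 0
    for term in formula_terms:
        values = metadata[term]
        if all(isinstance(v, (int, float)) for v in values):
            df += 1
        else:
            df += len(set(values)) - 1
    return df

meta = {
    "GestationalAge": [28.0, 30.5, 32.0, 29.0],          # numeric -> 1
    "MilkType2": ["breast", "formula", "mixed", "breast"], # 3 levels -> 2
}
# baseline: filter out features observed in <= 3 samples
assert model_degrees_of_freedom(meta, ["GestationalAge", "MilkType2"]) == 3
```

The resulting number is a floor for the `min_samples` prevalence filter, not a guarantee that every remaining feature is well resolved.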


Thanks! To be thorough for anyone reading this later, I did run the MilkType2-only OLS model, and it ran fine. See here: FTS_gneiss_OLS_milk_check.qzv (1.3 MB)

If I manage to write some R code to go from QIIME2 output to goodness-of-fit value using the method in the paper mentioned above, would that be worth posting here for others to use?

I have 456 samples in total, so well over 3 per tube section per feeding type. Or do you mean three samples per subgroup with all taxa in a given balance present?


Most definitely -- it would also make it easier to tack on in gneiss (it's currently not clear to me what the best measure of goodness-of-fit is either).

The latter: for a given balance, that balance needs to have nonzero variance within each subgroup -- if you can't calculate a variance because all of the values are the same, you can't calculate proper test statistics for that balance for that covariate.
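That subgroup condition can be checked mechanically before fitting anything. A sketch for one balance (the subgroup keys, e.g. (subject, milk type), are illustrative):

```python
from collections import defaultdict
from statistics import variance

def zero_variance_subgroups(balance_values, sample_groups, min_n=3):
    """For a single balance, list the subgroups that cannot yield a test
    statistic: fewer than `min_n` samples, or all values identical
    (zero variance). `balance_values` maps sample id -> balance value;
    `sample_groups` maps sample id -> subgroup key."""
    by_group = defaultdict(list)
    for sample, value in balance_values.items():
        by_group[sample_groups[sample]].append(value)
    bad = []
    for group, vals in by_group.items():
        # short-circuit keeps variance() from seeing a 1-element list
        if len(vals) < min_n or variance(vals) == 0.0:
            bad.append(group)
    return bad
```

Running this per balance over the subgroups in the formula would point directly at the cross-sections producing NaN p-values.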
