Gneiss NaN Plugin Error

Hello,

I have been following the general workflow in the tutorials "Differential Abundance Analysis with gneiss" and "Linear mixed effects models on balances in a CF study", and performed a gneiss lme-regression on my data, which worked just fine.

However, after changing the method of filtering and performing the gneiss lme-regression on my newly filtered data, I get:

qiime gneiss lme-regression \
 --p-formula "Size+Al+Cr+Cu+Zn+As" \
 --i-table sv_balances_v2.qza \
 --i-tree sv_correlated_hierarchy_v2.nwk.qza \
 --m-metadata-file sv_metals.tsv \
 --p-groups Site \
 --o-visualization sv_lme_Size_Al_Cr_Cu_Zn_As_v2.qzv

Plugin error from gneiss:

  cannot convert float NaN to integer.

I tried to follow your suggestions in the topics "Gneiss Plugin Error" and "Gneiss OLS NaN Error", but so far I have not been able to track down where, or whether, any NaN or missing values were generated in my data. I inspected my metadata using "qiime metadata tabulate" and found no missing values there. I also tried:

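import numpy as np
import pandas as pd
import qiime2

# Load the balances and count those with exactly zero variance
art = qiime2.Artifact.load('sv_balances.qza')
balances = art.view(pd.DataFrame)
np.sum(balances.var(axis=0) == 0)
# 0
balances.shape
# (120, 2538)
# Read the sample metadata, keeping every column as strings
metadata = pd.read_table('sv_metals.tsv', dtype=object)
metadata.shape
# (120, 14)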

I am using qiime2-2017.12-py35-linux-conda. Any suggestions would be highly appreciated!

Lena

Hi @VivyanCyril,

Those are the right posts to follow up on to start debugging.
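
A quick way to rule out bad inputs before looking at the model itself is to count non-finite balance values, missing metadata cells, and sample IDs that appear in only one of the two tables. The sketch below is only an illustration on made-up data: `summarize_inputs` is a hypothetical helper (not part of gneiss or QIIME 2), and it assumes both tables are pandas DataFrames indexed by sample ID.

```python
import numpy as np
import pandas as pd


def summarize_inputs(balances, metadata, columns):
    """Count common input problems (hypothetical helper, not a gneiss function)."""
    values = balances.to_numpy(dtype=float)
    return {
        # NaN or +/-inf anywhere in the balance table
        "non_finite_balances": int((~np.isfinite(values)).sum()),
        # empty cells in the metadata columns used by the formula
        "missing_metadata": int(metadata[columns].isna().sum().sum()),
        # sample IDs present in one table but not the other
        "only_in_balances": len(balances.index.difference(metadata.index)),
        "only_in_metadata": len(metadata.index.difference(balances.index)),
        "shared_samples": len(balances.index.intersection(metadata.index)),
    }


# Made-up toy tables, only to show the shape of the output
toy_balances = pd.DataFrame(
    {"y0": [0.1, np.nan, 0.3], "y1": [1.0, 2.0, np.inf]},
    index=["s1", "s2", "s3"],
)
toy_metadata = pd.DataFrame(
    {"Size": ["a", None, "b"], "Al": [1.0, 2.0, 3.0]},
    index=["s1", "s2", "s4"],
)
summary = summarize_inputs(toy_balances, toy_metadata, ["Size", "Al"])
print(summary)
```

With real data, the two tables would come from `art.view(pd.DataFrame)` and from the metadata file read with the sample ID column as the index.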

It may also be worth mentioning that a linear mixed effects model is much harder to estimate than a linear regression. There are scenarios where the formula cannot actually be evaluated.

Could you try the following

  1. Rerun that command with the verbose flag.
qiime gneiss lme-regression \
 --p-formula "Size+Al+Cr+Cu+Zn+As" \
 --i-table sv_balances_v2.qza \
 --i-tree sv_correlated_hierarchy_v2.nwk.qza \
 --m-metadata-file sv_metals.tsv \
 --p-groups Site \
 --o-visualization sv_lme_Size_Al_Cr_Cu_Zn_As_v2.qzv \
 --verbose
  2. Could you try a simpler formula? For instance, could you try just the Size variable? It would also help to post how many samples there are. In addition, counting the number of continuous variables plus the total number of categories across the categorical variables could give further insight into this problem. One possibility is that the model is overfitting and the regression is misbehaving.
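
The count suggested in the second point can be sketched as follows. This is only an illustration on made-up values: `count_model_terms` is a hypothetical helper, and which columns are continuous or categorical has to be stated by hand, since metadata read with `dtype=object` carries no numeric type information.

```python
import pandas as pd


def count_model_terms(metadata, continuous, categorical):
    """Continuous variables plus the number of categories of each categorical one."""
    n_levels = {col: int(metadata[col].nunique()) for col in categorical}
    return len(continuous) + sum(n_levels.values()), n_levels


# Made-up values; only the column names follow the formula in this thread
toy = pd.DataFrame({
    "Size": ["small", "large", "small", "large"],
    "Al": [0.1, -0.2, 0.3, 0.0],
    "Cr": [0.5, 0.4, -0.1, 0.2],
    "Cu": [-0.3, 0.1, 0.0, 0.2],
    "Zn": [0.2, 0.2, -0.4, 0.1],
    "As": [0.0, -0.1, 0.3, -0.2],
})
n_terms, n_levels = count_model_terms(
    toy, continuous=["Al", "Cr", "Cu", "Zn", "As"], categorical=["Size"]
)
print(n_terms, "terms for", len(toy), "samples;", n_levels)
```

Comparing that count with the number of samples (and with the number of samples per group) gives a rough sense of whether the formula is asking too much of the data.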

Hi Jamie,

Thank you so much for the fast reply!

Running the lme-regression simplified with --p-formula "Size" worked, though the coefficients summary doesn’t look too promising.

Running the lme-regression with the verbose flag gives the following errors - I scanned the whole report and shortened it here, since these lines are repeating:

/home/qiime/anaconda3/envs/qiime2/lib/python3.5/site-packages/statsmodels/base/model.py:496: ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals
"Check mle_retvals", ConvergenceWarning)
/home/qiime/anaconda3/envs/qiime2/lib/python3.5/site-packages/statsmodels/regression/mixed_linear_model.py:2001: ConvergenceWarning: Gradient optimization failed.
warnings.warn(msg, ConvergenceWarning)
/home/qiime/anaconda3/envs/qiime2/lib/python3.5/site-packages/statsmodels/regression/mixed_linear_model.py:2019: ConvergenceWarning: The MLE may be on the boundary of the parameter space.
/home/qiime/anaconda3/envs/qiime2/lib/python3.5/site-packages/statsmodels/regression/mixed_linear_model.py:2039: ConvergenceWarning: The Hessian matrix at the estimated parameter values is not positive definite.

warnings.warn(msg, ConvergenceWarning)
/home/qiime/anaconda3/envs/qiime2/lib/python3.5/site-packages/statsmodels/base/model.py:1029: RuntimeWarning: invalid value encountered in sqrt
return np.sqrt(np.diag(self.cov_params()))
/home/qiime/anaconda3/envs/qiime2/lib/python3.5/site-packages/scipy/stats/_distn_infrastructure.py:879: RuntimeWarning: invalid value encountered in greater
return (self.a < x) & (x < self.b)
/home/qiime/anaconda3/envs/qiime2/lib/python3.5/site-packages/scipy/stats/_distn_infrastructure.py:879: RuntimeWarning: invalid value encountered in less
return (self.a < x) & (x < self.b)
/home/qiime/anaconda3/envs/qiime2/lib/python3.5/site-packages/scipy/stats/_distn_infrastructure.py:1818: RuntimeWarning: invalid value encountered in less_equal
cond2 = cond0 & (x <= self.a)
/home/qiime/anaconda3/envs/qiime2/lib/python3.5/site-packages/statsmodels/stats/multitest.py:320: RuntimeWarning: invalid value encountered in less_equal
reject = pvals_sorted <= ecdffactor*alpha
Traceback (most recent call last):
  File "/home/qiime/anaconda3/envs/qiime2/lib/python3.5/site-packages/q2cli/commands.py", line 224, in __call__
    results = action(**arguments)
  File "", line 2, in lme_regression
  File "/home/qiime/anaconda3/envs/qiime2/lib/python3.5/site-packages/qiime2/sdk/action.py", line 228, in bound_callable
    output_types, provenance)
  File "/home/qiime/anaconda3/envs/qiime2/lib/python3.5/site-packages/qiime2/sdk/action.py", line 424, in callable_executor
    ret_val = self._callable(output_dir=temp_dir, **view_args)
  File "/home/qiime/anaconda3/envs/qiime2/lib/python3.5/site-packages/q2_gneiss/regression/_regression.py", line 73, in lme_regression
    lme_summary(output_dir, res, tree)
  File "/home/qiime/anaconda3/envs/qiime2/lib/python3.5/site-packages/gneiss-0.4.2-py3.5.egg/gneiss/plot/_regression_plot.py", line 357, in lme_summary
    plot_width=900, plot_height=400)
  File "/home/qiime/anaconda3/envs/qiime2/lib/python3.5/site-packages/gneiss-0.4.2-py3.5.egg/gneiss/plot/_regression_plot.py", line 186, in _heatmap_summary
    ind = int(np.floor((x - _min) / (_max - _min) * (N - 1)))
ValueError: cannot convert float NaN to integer

Am I interpreting this correctly: does it mean there just isn’t enough variation in the data apart from the variation caused by my random effect (Site)?

Some more information on my data:
2539 taxa and 120 samples

Fixed effects:
Size - categorical variable with two levels
Al, Cr, Cu, Zn, As - continuous variables, values were clr-transformed

Random effects:
Site - categorical variable with ten levels.

Thank you for your help!
Lena

@VivyanCyril this supports some of my suspicions. We are seeing that the linear mixed effects model is failing for some of the balances. With the OLS command, we have seen this when the variance of a balance is close to zero.
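
For reference, the final ValueError can be reproduced in isolation. The sketch below copies the expression from the last frame of the traceback into a stand-alone function (`color_index` is a made-up name, and the input values are illustrative): a finite value maps to an index, while a NaN, such as a p-value from a fit that did not converge, cannot be converted to an integer.

```python
import numpy as np


def color_index(x, _min, _max, N):
    # Same expression as the line shown in the traceback
    return int(np.floor((x - _min) / (_max - _min) * (N - 1)))


ok_index = color_index(0.5, 0.0, 1.0, 256)  # finite input works

try:
    color_index(float("nan"), 0.0, 1.0, 256)
    message = ""
except ValueError as exc:
    message = str(exc)

print(ok_index, "|", message)
```

So the plotting step is only where the problem surfaces; the NaN itself is produced earlier, during model fitting.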

A more surefire way to debug this sort of problem is to dig into the balance values (as shown in the post here) to see exactly which balance the linear mixed effects model is failing on (here we would use MixedLM instead of OLS).
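
A rough sketch of that per-balance check, using the statsmodels formula interface directly rather than gneiss, might look like the following. `check_balances` and the `balance` column name are hypothetical, the data are synthetic, and this is not how gneiss fits its models internally; it only flags balances whose MixedLM fit raises an error or returns non-finite p-values. Continuous metadata columns need a numeric dtype here, otherwise the formula treats them as categorical.

```python
import warnings

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


def check_balances(balances, metadata, formula, groups):
    """Fit one MixedLM per balance and record errors and non-finite p-values."""
    rows = []
    for name in balances.columns:
        data = metadata.copy()
        data["balance"] = balances[name]  # hypothetical response column name
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")
            try:
                model = smf.mixedlm("balance ~ " + formula, data, groups=data[groups])
                res = model.fit()
                finite = bool(np.isfinite(res.pvalues).all())
                error = ""
            except Exception as exc:  # e.g. singular matrices
                finite, error = False, type(exc).__name__
        rows.append({"balance": name, "finite_pvalues": finite,
                     "n_warnings": len(caught), "error": error})
    return pd.DataFrame(rows).set_index("balance")


# Synthetic data: y0 has a clear signal, y1 is constant (zero variance)
rng = np.random.default_rng(0)
n_groups, per_group = 6, 20
n = n_groups * per_group
site = np.repeat(["site%d" % i for i in range(n_groups)], per_group)
site_effect = np.repeat(rng.normal(0.0, 2.0, n_groups), per_group)
al = rng.normal(size=n)
metadata = pd.DataFrame({"Site": site, "Al": al},
                        index=["s%d" % i for i in range(n)])
balances = pd.DataFrame({"y0": 1.5 * al + site_effect + rng.normal(0.0, 0.5, n),
                         "y1": np.zeros(n)}, index=metadata.index)

report = check_balances(balances, metadata, "Al", "Site")
print(report)
```

On real data, `balances` would be the DataFrame view of the balances artifact, and any balance with `finite_pvalues` set to False or a non-empty `error` is a candidate for closer inspection. How a degenerate balance such as `y1` fails may differ between statsmodels versions.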

Forgot to ask in the last question – did you sanity check this against OLS? If it is also failing on OLS, that would narrow down the potential problems.

Hi Jamie,

Thank you for your help. That definitely shed some light on the problem. I will follow your suggestions and look at the balance values in more detail! Looks like there are quite a number of balances with variance close to zero:

# Count balances whose variance across samples is at most 0.1
art = qiime2.Artifact.load('sv_balances.qza')
balances = art.view(pd.DataFrame)
np.sum(balances.var(axis=0) <= 0.1)
# 311

P.S.: OLS worked fine, yes.
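
To see which balances those are, and how much of their variation remains within each Site, something along these lines may help. It is only a heuristic sketch on made-up data: `low_variance_report` is a hypothetical helper, the 0.1 threshold is the one used above, and `groups` is assumed to be a Series of Site labels indexed by sample ID.

```python
import pandas as pd


def low_variance_report(balances, groups, threshold=0.1):
    """List balances with low overall variance and their mean within-group variance."""
    overall = balances.var(axis=0)
    # Variance of each balance inside each group, averaged over the groups
    within = balances.groupby(groups).var().mean(axis=0)
    report = pd.DataFrame({"overall_var": overall,
                           "mean_within_group_var": within})
    return report[report["overall_var"] <= threshold].sort_values("overall_var")


# Made-up toy data: six samples from two sites
samples = ["s1", "s2", "s3", "s4", "s5", "s6"]
toy_site = pd.Series(["A", "A", "A", "B", "B", "B"], index=samples, name="Site")
toy_balances = pd.DataFrame({
    "y0": [0.0, 0.01, 0.02, 0.0, 0.01, 0.02],  # almost constant everywhere
    "y1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],      # plenty of variation
    "y2": [0.0, 0.0, 0.0, 0.5, 0.5, 0.5],      # varies only between sites
}, index=samples)

low = low_variance_report(toy_balances, toy_site)
print(low)
```

A balance like `y2` in the toy data, which varies between sites but not within them, leaves nothing for the fixed effects to explain once Site is modeled as a random effect.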
