Gneiss Plugin Error

Hello,

I am attempting to follow along with the 'Differential Abundance Analysis with gneiss" tutorial and adapting for interpretation of my own data set. I am able to execute the steps to calculate balances, but when I reach the regression portion I input the following commands:

qiime gneiss ols-regression \
    --p-formula "Fraction+Growth_Phase+Reactor+Growth_Phase_Verb" \
    --i-table balances.qza \
    --i-tree hierarchy.qza \
    --m-metadata-file Algae_metadata_mapping_file_09_15_17_SS.txt \
    --o-visualization regression_summary.qzv

Each time I’ve run this step I receive the following error:

Plugin error from gneiss:

  cannot convert float NaN to integer

Debug info has been saved to /tmp/qiime2-q2cli-err-mmy0s8cz.log.

Has anyone else encountered this issue? I am not sure where the NaN float values are being generated, so any advise on where to look would be greatly appreciated!

Seth

1 Like

Hi @Seth - we need a few more pieces of information in order to help you!

  1. What version of QIIME 2 are you using?
  2. Can you please provide the complete error log? Either re-running with --verbose, or attaching the log file saved (e.g. /tmp/qiime2-q2cli-err-mmy0s8cz.log)
  3. If possible, can you please provide your sample metadata file? This could be directly messaged to me, if you didn’t want to share publicly on the forum.

Thanks!

Hi Mathew,

Thanks so much for the amazingly quick reply.

  1. QIIME 2017.7.0
  2. The result of running with the verbose flag is as follows:
/opt/conda/lib/python3.5/site-packages/gneiss-0.4.1-py3.5.egg/gneiss/regression/_ols.py:193: RuntimeWarning: invalid value encountered in true_divide
/opt/conda/lib/python3.5/site-packages/scipy/stats/_distn_infrastructure.py:879: RuntimeWarning: invalid value encountered in greater
  return (self.a < x) & (x < self.b)
/opt/conda/lib/python3.5/site-packages/scipy/stats/_distn_infrastructure.py:879: RuntimeWarning: invalid value encountered in less
  return (self.a < x) & (x < self.b)
/opt/conda/lib/python3.5/site-packages/scipy/stats/_distn_infrastructure.py:1818: RuntimeWarning: invalid value encountered in less_equal
  cond2 = cond0 & (x <= self.a)
/opt/conda/lib/python3.5/site-packages/gneiss-0.4.1-py3.5.egg/gneiss/regression/_ols.py:192: RuntimeWarning: invalid value encountered in sqrt
/opt/conda/lib/python3.5/site-packages/statsmodels/stats/multitest.py:320: RuntimeWarning: invalid value encountered in less_equal
  reject = pvals_sorted <= ecdffactor*alpha
Traceback (most recent call last):
  File "/opt/conda/lib/python3.5/site-packages/q2cli/commands.py", line 222, in __call__
    results = action(**arguments)
  File "<decorator-gen-151>", line 2, in ols_regression
  File "/opt/conda/lib/python3.5/site-packages/qiime2/sdk/action.py", line 201, in callable_wrapper
    output_types, provenance)
  File "/opt/conda/lib/python3.5/site-packages/qiime2/sdk/action.py", line 392, in _callable_executor_
    ret_val = callable(output_dir=temp_dir, **view_args)
  File "/opt/conda/lib/python3.5/site-packages/q2_gneiss/regression/_regression.py", line 27, in ols_regression
    ols_summary(output_dir, res, tree)
  File "/opt/conda/lib/python3.5/site-packages/gneiss-0.4.1-py3.5.egg/gneiss/plot/_regression_plot.py", line 291, in ols_summary
    hm_p = _heatmap_summary(model.pvalues.T, model.coefficients().T)
  File "/opt/conda/lib/python3.5/site-packages/gneiss-0.4.1-py3.5.egg/gneiss/plot/_regression_plot.py", line 186, in _heatmap_summary
    ind = int(np.floor((x - _min) / (_max - _min) * (N - 1)))
ValueError: cannot convert float NaN to integer
  1. I am sending a copy of the metadata file to you directly.

I guess an invalid value is being passed from one function to the next, but not too sure where to track it down.

Thanks!

Hi @Seth!

Thanks for providing your metadata via a DM! I suspect this failure is related to missing values in the following columns:

Fraction, Reactor, and Growth_Phase_Verb (3 of the 4 columns you specified in --p-formula.

One trick for investigating your metadata is to use the metadata tabulate command, then you get a simple visualization of your metadata, which supports basic filtering and sorting (I used this to sort your columns to see if there were any blanks). Check out the Metadata Tutorial when you get a chance, it is a thrilling read! :open_book:

Moving forward, you could either add some kind of BLANK sentinel (whatever value makes sense to you), or maybe it makes sense to remove those samples from your analysis? You will need to investigate to do what is “right” for your study.

@mortonjt, do you have anything to say about this? Thanks!

:t_rex:

Right @thermokarst. We have seen that if you have missing data in your metadata, that can be problematic when running these regression models.

For debugging purposes, can you build a model if you exclude these specific columns?

1 Like

Hi Matthew and Jamie,

It’s nice to hear from you both (particularly after reading many of your posts). I tried again to troubleshoot using your suggestions.

  • I was surprised to see those empty cells when I ran the tabulate command. When I look at the same tab delimited file in excel or notepad++ I see “NA” in these fields. Although I didn’t find it in the documentation, I thought maybe NA is a problem, and sure enough replacing these with ‘Blank’ does fill the cells in the tabulate output.

  • After fixing the metadata file I re-ran the model build step with columns that seem to all be filled in the tabulate visualization. Unfortunately I still get a similar error as shown below.

Code:

qiime gneiss ols-regression \
    --p-formula "Fraction+OD750+Reactor+Growth_Phase_Verb" \
    --i-table balances.qza \
    --i-tree hierarchy.qza \
    --m-metadata-file Algae_metadata_mapping_file_09_28_17_SS.txt \
    --o-visualization regression_summary.qzv \
    --verbose

Error:

/opt/conda/lib/python3.5/site-packages/gneiss-0.4.1-py3.5.egg/gneiss/regression/_ols.py:193: RuntimeWarning: invalid value encountered in true_divide
/opt/conda/lib/python3.5/site-packages/scipy/stats/_distn_infrastructure.py:879: RuntimeWarning: invalid value encountered in greater
  return (self.a < x) & (x < self.b)
/opt/conda/lib/python3.5/site-packages/scipy/stats/_distn_infrastructure.py:879: RuntimeWarning: invalid value encountered in less
  return (self.a < x) & (x < self.b)
/opt/conda/lib/python3.5/site-packages/scipy/stats/_distn_infrastructure.py:1818: RuntimeWarning: invalid value encountered in less_equal
  cond2 = cond0 & (x <= self.a)
/opt/conda/lib/python3.5/site-packages/gneiss-0.4.1-py3.5.egg/gneiss/regression/_ols.py:192: RuntimeWarning: invalid value encountered in sqrt
/opt/conda/lib/python3.5/site-packages/statsmodels/stats/multitest.py:320: RuntimeWarning: invalid value encountered in less_equal
  reject = pvals_sorted <= ecdffactor*alpha
Traceback (most recent call last):
  File "/opt/conda/lib/python3.5/site-packages/q2cli/commands.py", line 222, in __call__
    results = action(**arguments)
  File "<decorator-gen-151>", line 2, in ols_regression
  File "/opt/conda/lib/python3.5/site-packages/qiime2/sdk/action.py", line 201, in callable_wrapper
    output_types, provenance)
  File "/opt/conda/lib/python3.5/site-packages/qiime2/sdk/action.py", line 392, in _callable_executor_
    ret_val = callable(output_dir=temp_dir, **view_args)
  File "/opt/conda/lib/python3.5/site-packages/q2_gneiss/regression/_regression.py", line 27, in ols_regression
    ols_summary(output_dir, res, tree)
  File "/opt/conda/lib/python3.5/site-packages/gneiss-0.4.1-py3.5.egg/gneiss/plot/_regression_plot.py", line 291, in ols_summary
    hm_p = _heatmap_summary(model.pvalues.T, model.coefficients().T)
  File "/opt/conda/lib/python3.5/site-packages/gneiss-0.4.1-py3.5.egg/gneiss/plot/_regression_plot.py", line 186, in _heatmap_summary
    ind = int(np.floor((x - _min) / (_max - _min) * (N - 1)))
ValueError: cannot convert float NaN to integer

Plugin error from gneiss:

  cannot convert float NaN to integer

See above for debug info.

One related question. The above code is using the full data set, but I had been trying to do this on a filtered subset of samples. I started by filtering my feature table, and thought this might be part of the problem when using an un-filtered metadata file. Should this approach work (assuming I can resolve the current issue)?

Thanks,
Seth

What happens if you just run formula=Fraction ?

Also, what does the balances.qza table look like? Are there a whole bunch of infs? It may be possible that the add-pseudocount command wasn’t run before hand, and a bunch of log(0) values were generated.

You can peak at the balances.qza using
qiime tools export balances.qza --output-dir balances_dir

I just tried using only Fraction, and I see a very similar error response.

/opt/conda/lib/python3.5/site-packages/gneiss-0.4.1-py3.5.egg/gneiss/regression/_ols.py:193: RuntimeWarning: invalid value encountered in true_divide
/opt/conda/lib/python3.5/site-packages/scipy/stats/_distn_infrastructure.py:879: RuntimeWarning: invalid value encountered in greater
  return (self.a < x) & (x < self.b)
/opt/conda/lib/python3.5/site-packages/scipy/stats/_distn_infrastructure.py:879: RuntimeWarning: invalid value encountered in less
  return (self.a < x) & (x < self.b)
/opt/conda/lib/python3.5/site-packages/scipy/stats/_distn_infrastructure.py:1818: RuntimeWarning: invalid value encountered in less_equal
  cond2 = cond0 & (x <= self.a)
/opt/conda/lib/python3.5/site-packages/statsmodels/stats/multitest.py:320: RuntimeWarning: invalid value encountered in less_equal
  reject = pvals_sorted <= ecdffactor*alpha
Traceback (most recent call last):
  File "/opt/conda/lib/python3.5/site-packages/q2cli/commands.py", line 222, in __call__
    results = action(**arguments)
  File "<decorator-gen-151>", line 2, in ols_regression
  File "/opt/conda/lib/python3.5/site-packages/qiime2/sdk/action.py", line 201, in callable_wrapper
    output_types, provenance)
  File "/opt/conda/lib/python3.5/site-packages/qiime2/sdk/action.py", line 392, in _callable_executor_
    ret_val = callable(output_dir=temp_dir, **view_args)
  File "/opt/conda/lib/python3.5/site-packages/q2_gneiss/regression/_regression.py", line 27, in ols_regression
    ols_summary(output_dir, res, tree)
  File "/opt/conda/lib/python3.5/site-packages/gneiss-0.4.1-py3.5.egg/gneiss/plot/_regression_plot.py", line 291, in ols_summary
    hm_p = _heatmap_summary(model.pvalues.T, model.coefficients().T)
  File "/opt/conda/lib/python3.5/site-packages/gneiss-0.4.1-py3.5.egg/gneiss/plot/_regression_plot.py", line 186, in _heatmap_summary
    ind = int(np.floor((x - _min) / (_max - _min) * (N - 1)))
ValueError: cannot convert float NaN to integer

Plugin error from gneiss:

  cannot convert float NaN to integer

See above for debug info.

When I output my balances.qza into a biom file I do see many 0.0 values, but searching reveals no infs. I did run add-pseudocount on my table at the very beginning, as show in your tutorial write up. Just to be sure, I exported my composition.qza table and found a great deal of 1.0 counts, so it seems that the step must have worked.

Ok - I’m not sure about what your table looks like, so I am not completely sure what is going on here. Is it possible to rename the columns / rows and upload that so we can investigate further?

A couple of more questions

  1. Are there columns (i.e. balances) in the table that sum to zero?
  2. What does the distribution of Fraction look like?

But we have seen some edge cases where excessive number of low abundance species have caused issues with regression before. I haven’t personally seen it with OLS before, but it could also be worthwhile to perform filtering to remove some of the low abundance organisms.

2 Likes

Hi Jamie,

The sample names are coded, so I would be fairly comfortable sharing the tables. The balances.qza file is too large to upload in the comment. Is that unusual? I’ve gone ahead and shared on dropbox (https://www.dropbox.com/s/p6f0ynsqumsp1n5/balances.qza?dl=0).

  1. I summed all of the columns from my balances table, and did not find any zeros there either.
  2. The fraction parameter is a binary, and the samples are distributed nearly evenly between the two categories.

I can look into filtering the table next!

Thanks again.

1 Like

@Seth. Awesome – I think I have the problem figured out.

Turns out that you have a ton of balances with zero variance.

>>> import qiime2
>>> art = qiime2.Artifact.load('balances.qza')
>>> import pandas as pd
>>> balances = art.view(pd.DataFrame)
>>> np.sum(balances.var(axis=0)==0)
191

Zero variance balances will break any statistical tests you run. To demonstrate, I picked 'y3416', and generated some random covariate x

>>> from statsmodels.regression.linear_model import OLS
>>> y = balances['y3416']
>>> x = np.arange(len(y))
>>> model = OLS(y, x)
>>> model.fit()
>>> r = model.fit()
>>> r.summary()
                            OLS Regression Results
==============================================================================
Dep. Variable:                  y3416   R-squared:                         nan
Model:                            OLS   Adj. R-squared:                    nan
Method:                 Least Squares   F-statistic:                       nan
Date:                Tue, 03 Oct 2017   Prob (F-statistic):                nan
Time:                        20:29:44   Log-Likelihood:                    inf
No. Observations:                 410   AIC:                              -inf
Df Residuals:                     409   BIC:                              -inf
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1                  0          0        nan        nan           0           0
==============================================================================
Omnibus:                        1.013   Durbin-Watson:                     nan
Prob(Omnibus):                  0.602   Jarque-Bera (JB):              153.750
Skew:                           0.000   Prob(JB):                     4.11e-34
Kurtosis:                       0.000   Cond. No.                         1.00
==============================================================================

What is probably happening is that you are including a ton of features that are just not observed at all, but are still included in the table. We can run the pipeline and compute balances, but the regression results will be nonsensible - since we can’t actually compute statistics on organisms that are never present.

Filtering out all of these garbage features should be the solution to this problem – features should be seen at least once in your study. Be sure to checkout out the tutorial on how to do this!

6 Likes

Hooray, you’re a hero Jamie! This was the issue, and now it seems quite clear. I went ahead a ran the data through the remainder of your tutorial steps after filtering the low frequency features out of the input table. The results are quite interesting, and I’m excited to use this pipeline to test some of the questions we are working on here.

This all occurred because I was working from a table of samples filtered from the full data set, leaving behind features only found in some of the unused samples. I am trying to decide if working on the balances of a sub-set of samples is valid, or if I should handle these samples separately all the way from the start of the analysis.

Anyway, thanks again for all your help. I’m sure I will be back once I’ve broken your algorithms again!

Seth

4 Likes

Awesome! For future reference, I raised an issue here so this will be more intuitive to spot in the next release.

2 Likes

Just to add, the generated feature-table.biom can be observed using:
biom summarize-table -i balances_dir/feature-table.biom -o balances_dir/feature-table_summary.txt
:smiley:

1 Like

QIIME 2 2017.10 was released and has the improvements made to q2-gneiss to help with these errors!

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.