Gneiss OLS NaN Error

Hello,

I am running gneiss on a small data set and I am running into similar issues as Seth (Gneiss Plugin Error), but my problem is different enough that I thought it warranted a separate post.

When I create balances from the data sent by the sequencing core (with singletons and chimeras removed), I get around 11,000 balances, and I'm able to run ols-regression with no problem. Many of the balances that were significant in my results seemed to be driven by one or two samples, so I thought I would further filter my feature table to features detected in three or more samples (rather than two or more). I am able to create balances (~7,000), but when I attempt to run ols-regression, I get an error that reads "cannot convert float NaN to integer".
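For reference, the presence-based filtering described above (keeping only features detected in at least three samples) can be done directly on a feature table in pandas. This is a minimal sketch with a toy table; the column and sample names are made up, not from the actual data set.

```python
import pandas as pd

# Toy feature table (samples x features); names are hypothetical.
table = pd.DataFrame(
    {'F1': [5, 0, 2, 1],
     'F2': [3, 0, 0, 0],
     'F3': [1, 2, 0, 4]},
    index=['S1', 'S2', 'S3', 'S4'])

# Keep features detected (nonzero) in >= 3 samples.
keep = (table > 0).sum(axis=0) >= 3
filtered = table.loc[:, keep]
print(list(filtered.columns))  # ['F1', 'F3']
```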

I’ve attached the error file
qiime2-q2cli-err-p464se4_.txt (2.2 KB)

You can find the balance file that seems to be causing problems here

Thanks for your help!

Hi @HLaue,

It's possible that you are actually experiencing the same problem, namely that the variances within the groups are zero. An easy way to test this is to isolate the problematic balances by running OLS / ANOVA on each balance and verifying that one of the groups does have zero variance. A workaround for this would be to filter out low-abundance OTUs (i.e. anything with <10 reads total).
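As a quick way to do that check, you can compute the within-group variance of every balance with pandas and flag any balance that has zero variance in some group. A minimal sketch with toy data; the balance names, sample IDs, and grouping column here are hypothetical:

```python
import pandas as pd

# Toy balances table (samples x balances) and a grouping variable;
# all names here are made up for illustration.
balances = pd.DataFrame(
    {'y0': [1.2, 1.2, 0.5, 0.7],
     'y1': [0.3, 0.9, 0.4, 0.8]},
    index=['S1', 'S2', 'S3', 'S4'])
groups = pd.Series(['A', 'A', 'B', 'B'], index=balances.index, name='Group')

# Within-group variance for each balance; a zero anywhere flags a
# balance that can break the OLS / ANOVA fit.
within_var = balances.groupby(groups).var()
zero_var_balances = within_var.columns[(within_var == 0).any()]
print(list(zero_var_balances))  # ['y0'] -- zero variance within group A
```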

Could you also link the mapping file to confirm this?

Thanks for getting back to me so quickly!

I've attached the mapping file for your reference. I also tried filtering out features with <10 reads (and <100, just to follow the line of thought) and get the same error.

Thanks for your help!
Hannah
Meta3.tsv (1.7 KB)

Looks like there are over twice as many samples in the balances as in the metadata, so those need to be matched up first.

>>> import pandas as pd
>>> import qiime2
>>> metadata = pd.read_table('Meta3.tsv', dtype=object)
>>> balances = qiime2.Artifact.load('FiltBalances.qza').view(pd.DataFrame)
>>> metadata.shape  # 19 samples
(19, 8)
>>> balances.shape  # 42 samples
(42, 6981)

# Set the sample IDs as the index so the two tables can be matched
>>> metadata = metadata.set_index('#SampleID')
>>> from gneiss.util import match
>>> balances, metadata = match(balances, metadata)

When we look at the balances in the reduced sample set, there are actually quite a few with zero variance.

>>> balances.var(axis=0).sort_values().head()
y6638    0.0
y1390    0.0
y6387    0.0
y6388    0.0
y5088    0.0

So it does look like this is identical to the issue in the previous post, once the sample IDs between the metadata and balances are matched up. I recommend first looking at the index-based filtering tutorial before doing abundance-based filtering.
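The ID matching that `gneiss.util.match` performs can also be sketched in plain pandas, which is handy if you want to inspect which samples are dropped. A toy example with made-up sample IDs, mirroring the 42-vs-19 mismatch above:

```python
import pandas as pd

# Toy stand-ins: 'balances' contains samples absent from the metadata.
balances = pd.DataFrame({'y0': range(4)}, index=['S1', 'S2', 'S3', 'S4'])
metadata = pd.DataFrame({'Group': ['A', 'B']},
                        index=pd.Index(['S1', 'S3'], name='#SampleID'))

# Keep only the shared sample IDs, in the same order in both tables
# (the core of what gneiss.util.match does).
shared = balances.index.intersection(metadata.index)
balances, metadata = balances.loc[shared], metadata.loc[shared]
print(balances.shape, metadata.shape)  # (2, 1) (2, 1)
```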

Thanks! Filtering by ID and features worked!

