Resolving gneiss 'float NaN to integer' issue with mapping files

Hello,

I am using QIIME 2 (2018.4) to analyze some human fecal samples. I want to associate taxa with host measurements using gneiss. However, I keep getting an error when I run the following command:

    qiime gneiss ols-regression \
      --p-formula "TNFA1+hba1c+pyy_auc+Description+Race+Gender+Individual+Time+AgeAtEnrollment+L_M_ratio" \
      --i-table ./balances.qza \
      --i-tree ./hierarchy.qza \
      --m-metadata-file "./LSU mapfile Z and E batches combined July 2019 samples removed AB CD longitudinal.txt" \
      --o-visualization ./regression_summary.qzv

This gives the error "cannot convert float NaN to integer".

However, when I leave the "L_M_ratio" column out of my formula, it runs just fine:

    qiime gneiss ols-regression \
      --p-formula "TNFA1+hba1c+pyy_auc+Description+Race+Gender+Individual+Time+AgeAtEnrollment" \
      --i-table ./balances.qza \
      --i-tree ./hierarchy.qza \
      --m-metadata-file "./LSU mapfile Z and E batches combined July 2019 samples removed AB CD longitudinal.txt" \
      --o-visualization ./regression_summary.qzv

When I run metadata tabulate on my mapping file and examine the L_M_ratio column, QIIME 2 correctly identifies the values as numeric. I saw other topics about the float NaN issue, which claimed that empty metadata cells cause it. However, I have empty cells in other metadata columns (including ones in the regression formula), and these have not caused any problems. I would rather not drop the L_M_ratio column from my gneiss regression model because it is a measure of gut permeability.

I have attached my mapping file, balances, and hierarchy files for reference. I want to know what is causing this issue with the L_M_ratio column specifically, and what I can do to address it. Thanks for any advice you can offer!

balances.qza (4.3 MB) hierarchy.qza (230.8 KB) LSU mapfile Z and E batches combined July 2019 samples removed AB CD longitudinal.txt (16.4 KB)

Hello Zachary,

Welcome back to the forums.

I’m glad you got this formula working!

That’s correct. Or rather, it has to do with converting the NA values to other data types.
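For reference, the error text is exactly what Python raises when a NaN float is cast to an integer. Here is a minimal sketch of that behavior. Where gneiss performs such a conversion internally is not stated in this thread, and `to_int_or_none` is a hypothetical helper used only for illustration.

```python
import math


def to_int_or_none(value):
    """Convert a float to int, returning None for NaN instead of raising."""
    if math.isnan(value):
        return None
    return int(value)


# Casting a missing value (NaN) straight to int fails with the same
# message reported in the original post.
try:
    int(float("nan"))
except ValueError as err:
    message = str(err)
    print(message)

# Guarding against NaN first avoids the exception.
print(to_int_or_none(3.7))           # truncates toward zero
print(to_int_or_none(float("nan")))  # missing value stays missing
```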

That makes sense… but how do you plan to fit a regression to samples that don’t have a variable?

One option is to drop samples that are missing this key value, then fit a regression to the remaining samples. Do you think that’s a good fit for your data?
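Dropping those samples from the mapping file could look roughly like the sketch below. It assumes a plain tab-separated file with a single header row and treats only empty cells as missing. `drop_missing` and the three-sample example table are made up for illustration and are not part of QIIME 2 or gneiss.

```python
import csv
import io


def drop_missing(tsv_text, column):
    """Return TSV text keeping only rows that have a value in `column`."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=reader.fieldnames,
        delimiter="\t",
        lineterminator="\n",
        extrasaction="ignore",
    )
    writer.writeheader()
    for row in reader:
        # An empty or whitespace-only cell counts as a missing value.
        if (row.get(column) or "").strip():
            writer.writerow(row)
    return out.getvalue()


# Hypothetical toy mapping file: S3 has no L_M_ratio measurement.
example = (
    "#SampleID\tTNFA1\tL_M_ratio\n"
    "S1\t1.2\t0.03\n"
    "S2\t\t0.05\n"
    "S3\t2.4\t\n"
)
filtered = drop_missing(example, "L_M_ratio")
print(filtered)
```

For a real file, the text would be read from disk and the result written to a new mapping file, which is then passed to `--m-metadata-file`. This sketch only touches the metadata, not the balances table.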

Colin

Hi Colin,

Thanks for the reply!

> That’s correct. Or rather, it has to do with converting the NA values to other data types.

This is what I'm confused about. In the gneiss formula that runs without any errors, I include the TNFA1, hba1c, and pyy_auc columns from my mapping file. All three of these columns have empty cells (missing values) in my mapping file. Why is the L_M_ratio column the only one that's causing this error?
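One way to compare the columns is to count the empty cells in each formula column. The sketch below assumes a plain tab-separated mapping file in which missing values are empty cells. `count_missing` and the toy table are hypothetical and only illustrate the check.

```python
import csv
import io


def count_missing(tsv_text, columns):
    """Return (number of samples, {column: number of empty cells})."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    counts = {name: 0 for name in columns}
    total = 0
    for row in reader:
        total += 1
        for name in columns:
            # Empty or whitespace-only cells are treated as missing.
            if not (row.get(name) or "").strip():
                counts[name] += 1
    return total, counts


# Hypothetical toy mapping file, not the real data from this thread.
example = (
    "#SampleID\tTNFA1\tL_M_ratio\n"
    "S1\t1.2\t0.03\n"
    "S2\t\t0.05\n"
    "S3\t2.4\t\n"
)
total, counts = count_missing(example, ["TNFA1", "L_M_ratio"])
print(total, counts)
```

Running this on the real mapping file with every column named in `--p-formula` would show how many samples each column is missing, which may help narrow down what is different about L_M_ratio.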

I also want to point out that I generated the hierarchy file by running gradient-clustering on the TNFA1 column. I had to filter out samples 3477AB, 3477CD, and 3405AB first because I do not have TNFA1 measurements for them.

    qiime gneiss gradient-clustering \
      --i-table ./merged_table_pre_post_pooled_metadata_only_pseudocount.qza \
      --m-gradient-file ./LSU_mapfile_Z_and_E_batches_combined_July_2019_samples_removed.txt \
      --m-gradient-column TNFA1 \
      --p-weighted \
      --o-clustering ./hierarchy.qza

> One option is to drop samples that are missing this key value, then fit a regression to the remaining samples. Do you think that’s a good fit for your data?

I have 120 samples, and 20 are missing values in the L_M_ratio column. While removing them isn't ideal, it won't reduce the dataset too much. I'll give it a shot!

merged_table_pre_post_pooled_metadata_only_pseudocount.qza (472.7 KB)