Support for scientific notation in filtering

Hello,

Not sure if this is technically a bug or missing feature, but scientific notation is not recognized when filtering from metadata. I've attached a simple metadata file for which I would expect the following two commands to give the same results

qiime feature-table filter-features \
  --i-table table.qza \
  --m-metadata-file filter.tsv \
  --p-where "Pval < .05 AND MeanDIff > 0" \
  --o-filtered-table filtered-table.qza

qiime feature-table filter-features \
  --i-table table.qza \
  --m-metadata-file filter.tsv \
  --p-where "NullHyp = 'Keep'" \
  --o-filtered-table filtered-table.qza

However, the very small p-value in scientific notation does not evaluate to less than .05 in the query. I was able to work around it by adding the (poorly named) NullHyp column, but was still bummed that I couldn't use the compound query.

Thanks!
Chris

filter.tsv (263 Bytes)

1 Like

@Chris_Hemmerich β€” you will need to cast your numeric data to numeric types in order to use those SQLite comparison operators. Check out this post for an example. Without a CAST, those values are actually being compared as strings β€” this means your greater than and less than operators are probably using the string length, instead of the numeric value. Give that a shot and let us know how it goes! :t_rex:

PS - you might also need to experiment with quoting your column names in the query β€” check out the sqlite.org docs for more info. Also, I didn’t see any examples of scientific notation in either of your examples, just ordinary real numbers in decimal representation.

Thanks, I will give this a shot and report back. The third data row is the scientific notation, R went a little nuts with significant digits. In my full data, the p-values are a mixture of decimal and scientific notation and the query works fine for the values in decimal notation, but I’ll get in the habit of using cast since this isn’t expected to work.

Thanks again,
Chris

Thanks for the clarification! SQLite supports scientific notation β€” my earlier post was just to make you aware of the fact that all the data is loaded initially as strings β€” right now it is up to you to tell SQLite what β€œtype” a particular column is. You should be able to leave the data formatted as-is and just update the queries to CAST those columns as numeric datatypes.


This is definitely something we need to document, and hopefully we will have better (more transparent) support for metadata typing in the future!

Worked like a charm with the CASTs, Thanks!

qiime feature-table filter-features \
  --i-table table.qza \
  --m-metadata-file filter.tsv \
  --p-where "CAST(Pval AS REAL) < .05 AND CAST(MeanDIff AS REAL) > 0" \
  --o-filtered-table filtered-table.qza
1 Like

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.