Are NON-sample-id column names case-insensitive in metadata files?

Hello, Folks,
I am working with some metadata provided by an external source (i.e., please don't blame me for its funkiness :slight_smile: ) and I ran into a little difficulty using it in a Metadata object. The metadata file has multiple data columns that differ only by case, and although I don't get any errors/warnings when loading the file into a Metadata object, I hit a duplicate column name error when I later try to select ids based on one of those columns. I've recreated a minimal example with a foo column and a FOO column (see below).

Based on this, I am concluding that while some sample id column names can be case-sensitive in a Metadata file (e.g., #SampleID, sample_name), the data column names are handled as case-insensitive. (I looked at the "Metadata in QIIME 2" page of the QIIME 2 2023.9.2 documentation but didn't see info about this specifically.) Could you let me know if I'm correct about this?

Thank you!

Example:
minimal_metadata.tsv:

sample-id	foo	FOO
a	b	c
d	e	f

Attempted code (in conda qiime2-dev environment):

import qiime2
md = qiime2.Metadata.load("minimal_metadata.tsv")
md.get_ids("foo='b'")

Traceback (most recent call last):
  File "/Applications/miniconda3/envs/qiime2-dev/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3508, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-6-d295b00d9a66>", line 1, in <module>
    md.get_ids("foo='b'")
  File "/Users/abirmingham/Work/Repositories/fork_qiime2/qiime2/metadata/metadata.py", line 683, in get_ids
    self._dataframe.to_sql('metadata', conn, index=True,
  File "/Applications/miniconda3/envs/qiime2-dev/lib/python3.8/site-packages/pandas/core/generic.py", line 2987, in to_sql
    return sql.to_sql(
  File "/Applications/miniconda3/envs/qiime2-dev/lib/python3.8/site-packages/pandas/io/sql.py", line 695, in to_sql
    return pandas_sql.to_sql(
  File "/Applications/miniconda3/envs/qiime2-dev/lib/python3.8/site-packages/pandas/io/sql.py", line 2187, in to_sql
    table.create()
  File "/Applications/miniconda3/envs/qiime2-dev/lib/python3.8/site-packages/pandas/io/sql.py", line 838, in create
    self._execute_create()
  File "/Applications/miniconda3/envs/qiime2-dev/lib/python3.8/site-packages/pandas/io/sql.py", line 1871, in _execute_create
    conn.execute(stmt)
sqlite3.OperationalError: duplicate column name: FOO

import sys
pyver = sys.version_info
print('Python version: %d.%d.%d' % (pyver.major, pyver.minor, pyver.micro))
Python version: 3.8.18

print('QIIME 2 release: %s' % qiime2.__release__)
QIIME 2 release: 2023.11

print('QIIME 2 version: %s' % qiime2.__version__)
QIIME 2 version: 2023.11.0.dev0+12.gc4ec793.dirty

Hello @Amanda_Birmingham,

It looks like this is happening because of SQLite, which treats column names as case-insensitive. You don't get the error on load because the metadata isn't written to a SQLite database until get_ids is called. Perhaps this is something that we should check for during instantiation. At any rate, it seems pretty silly to have column names that differ only in their case, which is probably why we don't have an explicit check for this.
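For anyone who wants to see the SQLite behavior in isolation, here is a minimal sketch using only Python's built-in sqlite3 module. The helper name try_create_table is made up for illustration and is not part of QIIME 2 or pandas; it simply tries to create an in-memory table with the given column names, which is roughly what happens further down the stack when get_ids hands the dataframe to to_sql.

```python
import sqlite3


def try_create_table(columns):
    """Try to create an in-memory SQLite table with the given column names.

    Hypothetical helper for illustration only (not a QIIME 2 function).
    Returns None on success, or the SQLite error message on failure.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # Quote each name so that characters such as '-' are accepted.
        # Quoting does not make SQLite identifiers case-sensitive.
        column_defs = ", ".join('"{}" TEXT'.format(c) for c in columns)
        conn.execute("CREATE TABLE metadata ({})".format(column_defs))
        return None
    except sqlite3.OperationalError as err:
        return str(err)
    finally:
        conn.close()


# Column names that are distinct regardless of case: table is created.
print(try_create_table(["sample-id", "foo", "bar"]))

# Column names that differ only by case: SQLite rejects the table.
print(try_create_table(["sample-id", "foo", "FOO"]))
```

The second call should report the same kind of duplicate column name error that appears at the bottom of the traceback above, even though no Metadata object is involved.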

Thank you :slight_smile:

it seems pretty silly to have column names that differ only in their case

No argument here! And yet, still not the silliest thing I've seen in real metadata :smiley:

Perhaps this is something that we should check for during instantiation

That would be very useful for this situation (although I understand probably a pretty low priority). Maybe a small mention about this on the metadata documentation page, near the info about the case requirements for the sample id column, would be an easy addition.

Thanks again for confirming my understanding on this!


The documentation suggestion is a great idea (especially because it's so low effort)! Would you like to open a GitHub issue? Otherwise I can, and I will link to this forum post to give you credit.

@colinvwood, sure! I've opened a GitHub issue on the qiime2/docs repo with a suggestion for what might be added (see "Describe column name case insensitivity in metadata tutorial", Issue #577 in qiime2/docs on GitHub). Hopefully this is more or less what you had in mind :slight_smile:

Interestingly, some digging shows that there used to be a case-insensitive check on column name uniqueness in the Metadata object, but it was removed at user request in a long-ago PR ("MAINT: remove case-insensitive duplicates check for IDs and column names" by jairideout, Pull Request #364 in qiime2/qiime2 on GitHub) somewhat before the get_ids() method was added. (I guess there's no pleasing everyone :wink: ). Still, it might be something to consider reinstating at some point.
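In the meantime, a check like this can be run on the column names before calling get_ids(). This is only a rough sketch under my own assumptions: the function name find_case_insensitive_duplicates is hypothetical, it is not the check that was removed in that PR, and it folds case with str.lower(), which may not match SQLite's comparison exactly for non-ASCII names.

```python
from collections import defaultdict


def find_case_insensitive_duplicates(names):
    """Group column names that collide when case is ignored.

    Hypothetical helper for illustration only (not part of QIIME 2).
    Returns a list of groups, each holding two or more names that
    differ only by case, in their original order.
    """
    groups = defaultdict(list)
    for name in names:
        # Assumption: str.lower() approximates SQLite's case folding.
        groups[name.lower()].append(name)
    return [group for group in groups.values() if len(group) > 1]


# Column names from the minimal_metadata.tsv example earlier in the thread.
print(find_case_insensitive_duplicates(["sample-id", "foo", "FOO"]))
```

An empty list means no columns collide; anything else lists the names that would need renaming before filtering with a SQLite where clause.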
