In [1]: import qiime2
In [2]: qiime2.__version__
Out[2]: '2022.8.3'
and I'm having trouble understanding how the default_missing_scheme and column_missing_schemes parameters of the qiime2.Metadata.load method work. I'm using https://dev.qiime2.org/latest/api-reference/metadata/ as a reference, and the file I'm testing this with is example.txt.
As expected, when using the default, everything is categorical:
In [3]: qiime2.Metadata.load('example.txt')
Out[3]:
Metadata
--------
6 IDs x 4 columns
categorical: ColumnProperties(type='categorical', missing_scheme='blank')
numeric: ColumnProperties(type='categorical', missing_scheme='blank')
mixed: ColumnProperties(type='categorical', missing_scheme='blank')
other: ColumnProperties(type='categorical', missing_scheme='blank')
Call to_dataframe() for a tabular representation.
but I would have expected that using default_missing_scheme='INSDC:missing' would treat cells containing not provided as missing values, so that columns holding only numbers and those values would be autodetected as numeric. That doesn't happen:
In [4]: qiime2.Metadata.load('example.txt', default_missing_scheme='INSDC:missing')
Out[4]:
Metadata
--------
6 IDs x 4 columns
categorical: ColumnProperties(type='categorical', missing_scheme='INSDC:missing')
numeric: ColumnProperties(type='categorical', missing_scheme='INSDC:missing')
mixed: ColumnProperties(type='categorical', missing_scheme='INSDC:missing')
other: ColumnProperties(type='categorical', missing_scheme='INSDC:missing')
Call to_dataframe() for a tabular representation.
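To make the expected behaviour concrete, here is a minimal pure-Python sketch of missing-aware type inference: values that a missing scheme recognises are set aside before deciding whether a column is numeric. This is not QIIME 2's implementation. infer_column_type is a hypothetical name, the column values are made up (not the contents of example.txt), and the term list is my assumption of the INSDC:missing vocabulary, so check the QIIME 2 metadata documentation for the authoritative set.

```python
# Hypothetical sketch, not QIIME 2's actual code: infer a column type after
# discarding values that a missing scheme marks as missing.

# Assumed INSDC:missing vocabulary (verify against the QIIME 2 docs).
INSDC_MISSING_TERMS = frozenset({
    "not applicable",
    "missing",
    "not collected",
    "not provided",
    "restricted access",
})


def _is_number(value: str) -> bool:
    """Return True if the string parses as a float."""
    try:
        float(value)
    except ValueError:
        return False
    return True


def infer_column_type(values, missing_terms=frozenset()):
    """Return 'numeric' if at least one value is present and every present
    value is a number, else 'categorical'.

    Empty strings are always treated as missing (the 'blank' behaviour);
    any value in missing_terms is treated as missing too.
    """
    present = [v for v in values if v != "" and v not in missing_terms]
    if present and all(_is_number(v) for v in present):
        return "numeric"
    return "categorical"


# Made-up values for a column like 'numeric' in the transcript.
numeric_col = ["1.5", "not provided", "3", "", "4.2", "not provided"]

# Only blanks count as missing, so 'not provided' blocks numeric inference.
print(infer_column_type(numeric_col))                       # categorical
# INSDC:missing terms count as missing, so the column becomes numeric.
print(infer_column_type(numeric_col, INSDC_MISSING_TERMS))  # numeric
```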
I tried using column_missing_schemes, but I'm not sure whether it should be combined with default_missing_scheme or used on its own.
I think your expectation is correct; those columns should have become numeric automatically. I'm going to double-check things on our end (I was pretty sure we tested exactly this), but in the meantime could you send a representative file with this situation? Sorry, I'll check your example; thanks for providing it!
Re: combining column_missing_schemes with default_missing_scheme:
Yep! Those are designed to be used together: default_missing_scheme is the fallback for any column not explicitly listed in column_missing_schemes.
The precedence order (from highest to lowest) should be:
column_missing_schemes > in-file q2:missing annotation > default_missing_scheme
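The precedence order can be expressed as a small lookup. This is a hypothetical sketch, not QIIME 2's code: resolve_missing_scheme and its arguments are made up, and I'm assuming 'blank' is the final fallback, as the first transcript (no arguments, every column 'blank') suggests. The column names come from the transcript; the in-file annotations are invented for illustration.

```python
# Hypothetical sketch of the intended precedence (not QIIME 2's code):
# column_missing_schemes > in-file q2:missing > default_missing_scheme.

from typing import Mapping, Optional


def resolve_missing_scheme(
    column: str,
    column_missing_schemes: Optional[Mapping[str, str]] = None,
    file_missing_schemes: Optional[Mapping[str, str]] = None,
    default_missing_scheme: Optional[str] = None,
) -> str:
    """Pick the missing scheme for one column, highest precedence first."""
    if column_missing_schemes and column in column_missing_schemes:
        return column_missing_schemes[column]
    if file_missing_schemes and column in file_missing_schemes:
        return file_missing_schemes[column]
    if default_missing_scheme is not None:
        return default_missing_scheme
    return "blank"  # assumed fallback when nothing else is specified


per_column = {"numeric": "INSDC:missing"}
in_file = {"numeric": "blank", "other": "blank"}  # invented q2:missing values

print(resolve_missing_scheme("numeric", per_column, in_file, "INSDC:missing"))
# INSDC:missing  (column_missing_schemes beats the file)
print(resolve_missing_scheme("other", per_column, in_file, "INSDC:missing"))
# blank  (the file beats the default)
print(resolve_missing_scheme("categorical", per_column, in_file, "INSDC:missing"))
# INSDC:missing  (only the default applies)
print(resolve_missing_scheme("categorical"))
# blank  (nothing specified)
```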
Yeah, this appears to be a bug in the precedence order.
Setting q2:missing in the file works as expected, but you should also have been able to override the default ('blank') interpretation with default_missing_scheme. I'm going to look into the logic a bit more to see whether this is quick to fix.
There is logic that casts each column (e.g. to numeric) with knowledge of its missing scheme; it just turns out that default_missing_scheme was left out of the party.
That means that, for now, the following hack will work (unfortunately it involves a double load):
In [1]: from qiime2 import Metadata
In [2]: cols = Metadata.load('Downloads/example.txt').columns
In [3]: Metadata.load('Downloads/example.txt',
   ...:               column_missing_schemes={
   ...:                   c: 'INSDC:missing' for c in cols})
Out[3]:
Metadata
--------
6 IDs x 4 columns
categorical: ColumnProperties(type='categorical', missing_scheme='INSDC:missing')
numeric: ColumnProperties(type='numeric', missing_scheme='INSDC:missing')
mixed: ColumnProperties(type='categorical', missing_scheme='INSDC:missing')
other: ColumnProperties(type='categorical', missing_scheme='INSDC:missing')
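While this bug stands, the workaround can be wrapped in a small helper. The sketch below is pure Python and only builds the dictionary; expand_missing_schemes and its overrides parameter are hypothetical, not part of the qiime2 API. Explicit per-column entries win over the default, mimicking the intended fallback behaviour.

```python
# Hypothetical helper, not part of QIIME 2: expand a single default missing
# scheme into the explicit per-column mapping that column_missing_schemes takes.

from typing import Iterable, Mapping, Optional


def expand_missing_schemes(
    columns: Iterable[str],
    default: str,
    overrides: Optional[Mapping[str, str]] = None,
) -> dict:
    """Map every column name to `default`, unless it appears in `overrides`.

    Override entries for names not in `columns` are ignored.
    """
    overrides = overrides or {}
    return {name: overrides.get(name, default) for name in columns}


# Column names taken from the transcript; the override is made up.
schemes = expand_missing_schemes(
    ["categorical", "numeric", "mixed", "other"],
    "INSDC:missing",
    overrides={"other": "blank"},
)
print(schemes)
# {'categorical': 'INSDC:missing', 'numeric': 'INSDC:missing',
#  'mixed': 'INSDC:missing', 'other': 'blank'}
```

With QIIME 2 installed, the result would be passed as column_missing_schemes=expand_missing_schemes(Metadata.load('example.txt').columns, 'INSDC:missing'), which is still a double load. One caveat follows from the precedence order: because column_missing_schemes outranks an in-file q2:missing annotation, listing every column this way also overrides any such annotation, which a working default_missing_scheme would not do.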