Support other metadata merging strategies

Currently, metadata files are combined with an inner join, which means that any records not present in all metadata files are dropped. It would be useful if the merge strategy were a configurable parameter.
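
To make the problem concrete, here is a minimal Python illustration using made-up feature IDs (hypothetical, not from any real dataset). An inner join keeps only the intersection of the ID sets, so every ID outside that intersection is lost; an outer join keeps the union.

```python
# Hypothetical feature IDs from two metadata files (illustrative only).
taxonomy_ids = {"f1", "f2", "f3", "f4"}
importance_ids = {"f2", "f3", "f5"}

# An inner join keeps only IDs present in every file...
inner = taxonomy_ids & importance_ids
# ...while an outer join keeps IDs present in any file.
outer = taxonomy_ids | importance_ids

# These are the records an inner join silently drops.
dropped = outer - inner
print(sorted(inner))
print(sorted(dropped))
```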

The use case is visualizing feature importances in Empress together with taxonomy data and other feature-level annotations. When features have been filtered, the inner join shrinks the dataset down to only the features shared by every file.

cc @thermokarst @andrewsanchez @ebolyen @fedarko

Hi everyone, I just hit this issue myself. Three years after it was posted, has anyone looked at it or come up with a workaround?

I mean, we could tell users to export their various forms of metadata, load them into a spreadsheet, sort them by ID, and then clumsily cut and paste them together. Or, if they're programmers, they could load them in dataframes and merge them properly. Neither solution is good.

I'm not aware of any "easy" workarounds to this problem. In the context of Empress, we ended up adding a short section to our README that explains the situation and links to a small Python script that can be used to merge taxonomy and feature importance metadata files with an outer (?) join; this script is really just following the "load them in dataframes and merge them properly" idea you suggested.
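
For anyone who wants the "merge them properly" route without extra dependencies, here is a minimal standard-library sketch of an outer join over metadata TSVs. It is not the Empress script; the function names `read_metadata` and `outer_join_tsv` are hypothetical. It assumes the first column holds the IDs, that rows starting with `#` (such as `#q2:types`) can be skipped, and that column names do not overlap between files.

```python
import csv
import io


def read_metadata(text):
    """Parse a metadata TSV whose first column holds the IDs."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    rows = {}
    for record in reader:
        # Skip blank lines and directive/comment rows such as "#q2:types".
        if not record or record[0].startswith("#"):
            continue
        rows[record[0]] = dict(zip(header[1:], record[1:]))
    return header, rows


def outer_join_tsv(*texts):
    """Outer-join metadata TSVs on their ID column; missing cells become ''.

    Assumes column names do not overlap between files.
    """
    parsed = [read_metadata(t) for t in texts]
    id_header = parsed[0][0][0]  # reuse the first file's ID column name
    # Preserve first-seen order for columns and IDs.
    columns = list(dict.fromkeys(c for header, _ in parsed for c in header[1:]))
    ids = list(dict.fromkeys(i for _, rows in parsed for i in rows))

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow([id_header] + columns)
    for i in ids:
        merged = {}
        for _, rows in parsed:
            merged.update(rows.get(i, {}))
        writer.writerow([i] + [merged.get(c, "") for c in columns])
    return out.getvalue()


# Hypothetical example inputs.
taxonomy = "feature-id\tTaxon\nf1\tk__Bacteria\nf2\tk__Archaea\n"
importance = "feature-id\timportance\nf2\t0.8\nf3\t0.1\n"
print(outer_join_tsv(taxonomy, importance))
```

Features missing from one file simply get an empty cell for that file's columns instead of being dropped.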

I can imagine two types of options for more elegantly handling this situation (but there are probably other solutions I'm missing :slight_smile: ):

  1. Automatically adding extra parameters to every action / visualizer command within QIIME 2 that accepts Metadata; these parameters would allow users to specify how they want to merge metadata files of a given "type" (e.g. --m-sample-metadata) if multiple are provided for a single type (e.g. --p-sample-metadata-merge-strategy).

  2. Providing a command within QIIME 2 (e.g. in qiime metadata) that merges metadata files, with the parameter above as an option. (This might get tricky, because IIRC QIIME 2 commands currently can't output metadata files.)
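
Either option boils down to a merge routine that takes a strategy argument. The sketch below is hypothetical (plain Python dictionaries, not QIIME 2's `Metadata` API) and borrows the inner/outer/left vocabulary from relational joins; `"inner"` mirrors the current behavior described above.

```python
from typing import Dict, List

Table = Dict[str, Dict[str, str]]  # ID -> {column: value}


def merge_metadata(tables: List[Table], strategy: str = "inner") -> Table:
    """Merge ID-indexed metadata tables with a configurable join strategy.

    strategy:
      "inner" - keep IDs present in every table (the current behavior)
      "outer" - keep IDs present in any table
      "left"  - keep IDs present in the first table
    """
    if not tables:
        return {}
    id_sets = [set(t) for t in tables]
    if strategy == "inner":
        keep = set.intersection(*id_sets)
    elif strategy == "outer":
        keep = set.union(*id_sets)
    elif strategy == "left":
        keep = id_sets[0]
    else:
        raise ValueError(f"unknown merge strategy: {strategy!r}")

    # Preserve first-seen ID order across all tables.
    ordered = [i for i in dict.fromkeys(i for t in tables for i in t) if i in keep]
    merged: Table = {}
    for i in ordered:
        row: Dict[str, str] = {}
        for t in tables:
            # Columns from tables lacking this ID are simply absent.
            row.update(t.get(i, {}))
        merged[i] = row
    return merged
```

Keeping the strategy as a single string argument means new strategies can be added later without changing the function's signature.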

Hey all,
We do now have a way to output metadata in an artifact with the semantic type ImmutableMetadata (see the 2023.5 release notes), and this can pave the way to supporting other metadata merging strategies through specific commands, as @fedarko suggests.

ImmutableMetadata is viewable as qiime2.Metadata, so it can be used anywhere metadata can be used in QIIME 2. It can also be exported, which produces a plain old (mutable) metadata .tsv file. The ImmutableMetadata artifact has provenance associated with it and the exported .tsv does not, so working downstream with the artifact is preferred where possible to maintain the provenance chain.

We also have a long-standing PR that addresses another issue: merging metadata with overlapping column names, which isn't possible right now.
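
One possible way to handle overlapping column names is to detect them and prefix each clashing column with a per-table label. The sketch below is only an illustration of that idea and is not necessarily what the PR does; the functions and the `label:column` naming scheme are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Table = Dict[str, Dict[str, str]]  # ID -> {column: value}


def find_overlapping_columns(tables: List[Table]) -> Set[str]:
    """Return column names that appear in more than one table."""
    seen: Set[str] = set()
    overlap: Set[str] = set()
    for t in tables:
        cols = {c for row in t.values() for c in row}
        overlap |= seen & cols
        seen |= cols
    return overlap


def disambiguate(named_tables: List[Tuple[str, Table]]) -> List[Table]:
    """Prefix overlapping column names with each table's label."""
    overlap = find_overlapping_columns([t for _, t in named_tables])
    renamed = []
    for label, t in named_tables:
        renamed.append({
            i: {(f"{label}:{c}" if c in overlap else c): v for c, v in row.items()}
            for i, row in t.items()
        })
    return renamed
```

Non-clashing columns keep their original names, so only the ambiguous ones change.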

Let me check in with the folks on my team and I'll follow up here with some notes on our plan for moving this forward.

We had some discussion internally, and I updated this issue on the topic. The idea here is that we'll create a simple merge action in q2-metadata, and we can expand on that over time (or add new special-case actions as needed).

Brilliant! Thanks @gregcaporaso. I look forward to trying it out.