Us too! That is why QIIME 2 has a Semantic Type system (while extending that idea into things like Formats and transformers)!
I suspect I haven't done a good job of explaining my proposal for identifier-based filtering, or how it would work under the hood, because I think we are actually on the same page when it comes to some of the mechanics.
Right now, qiime feature-table filter-features has an optional parameter for metadata-based filtering:
--m-metadata-file MULTIPLE PATH
Metadata file or artifact viewable as
metadata. This option may be supplied
multiple times to merge metadata. Feature
metadata used with `where` parameter when
selecting features to retain, or with
`exclude_ids` when selecting features to
discard. [optional]
So, you can use a traditional Metadata TSV file here, or you can provide an "artifact viewable as metadata". How the first option (TSV-style) works is pretty clear, I think, but the second is a little more interesting to me. Artifacts viewable as metadata retain their semantic type but, through the transformation system, are viewed by the filter-features method as Metadata! Nothing in the user's original data has been converted or modified.
Here is what that looks like right now:
Filtering with a traditional TSV metadata file:
qiime feature-table filter-features \
--i-table table.qza \
--m-metadata-file feature-metadata.tsv \
--o-filtered-table filtered-table.qza
Filtering with a FeatureData[Taxonomy] artifact (this is currently supported, because the format that represents this type is viewable as metadata):
qiime feature-table filter-features \
--i-table table.qza \
--m-metadata-file taxonomy.qza \
--o-filtered-table filtered-table.qza
This is still only a one-step command for the user. There is no need for them to "convert" their taxonomy data beforehand: the type system, transformer system, and formats all know how to work together with the view API to make this happen! In the plugin, the registered method's signature asks for qiime2.Metadata, so it receives a consistent object every time.
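To make the "view" idea concrete, here is a toy model of a transformer registry in plain Python. This is only a sketch of the concept, not QIIME 2's actual implementation: `register_transformer`, `view`, `ToyTaxonomy`, and `ToyMetadata` are all hypothetical names invented for this illustration.

```python
# Toy model of a transformer registry; NOT QIIME 2's real implementation.
from typing import Any, Callable, Dict, Tuple

_TRANSFORMERS: Dict[Tuple[type, type], Callable[[Any], Any]] = {}


def register_transformer(func):
    """Register func under the (input type, output type) in its annotations."""
    hints = dict(func.__annotations__)
    to_type = hints.pop("return")
    (from_type,) = hints.values()  # exactly one input parameter is expected
    _TRANSFORMERS[(from_type, to_type)] = func
    return func


def view(data, to_type):
    """Return data viewed as to_type, leaving the original object untouched."""
    if isinstance(data, to_type):
        return data
    return _TRANSFORMERS[(type(data), to_type)](data)


class ToyTaxonomy:  # hypothetical stand-in for a taxonomy format
    def __init__(self, rows):
        self.rows = rows  # {feature_id: taxon}


class ToyMetadata:  # hypothetical stand-in for qiime2.Metadata
    def __init__(self, ids):
        self.ids = list(ids)


@register_transformer
def _taxonomy_to_metadata(data: ToyTaxonomy) -> ToyMetadata:
    return ToyMetadata(data.rows.keys())


taxonomy = ToyTaxonomy({"f1": "k__Bacteria", "f2": "k__Archaea"})
metadata = view(taxonomy, ToyMetadata)  # the method only ever sees ToyMetadata
print(metadata.ids)
```

The point of the sketch is that the consuming method asks for one type and never needs to know which format the user actually supplied.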
So, if we defined a transformer for converting a phylogeny format to Metadata:
import pandas as pd
import qiime2
from q2_types.tree import NewickFormat

@plugin.register_transformer
def _1(data: NewickFormat) -> qiime2.Metadata:
    # Hypothetical helper: loads the tree and builds a table with one row per tip
    data = _util_to_load_and_convert_tree_to_table(data)
    df = pd.DataFrame(data)
    # The df index would be the tip IDs
    return qiime2.Metadata(df)
The transformer above would basically do what you proposed above:
plus, whatever else might make sense generally.
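For a rough idea of what the ID-extraction part of a helper like `_util_to_load_and_convert_tree_to_table` might involve, here is a standard-library sketch that pulls tip labels out of a Newick string. It assumes simple, unquoted labels with no embedded comments, and `tip_ids` is a hypothetical name; a real transformer would more likely lean on an existing tree parser.

```python
import re

# Matches a label that directly follows "(" or ",", i.e. a leaf position.
# Labels that follow ")" belong to internal nodes and are skipped.
_TIP_LABEL = re.compile(r"(?<=[(,])\s*([^(),:;\s]+)")


def tip_ids(newick: str) -> list:
    """Return tip labels in order of appearance (simple, unquoted Newick only)."""
    return _TIP_LABEL.findall(newick)


tree = "((f1:0.1,f2:0.2)inner:0.3,f3:0.4)root;"
print(tip_ids(tree))
```

Internal node names such as `inner` and `root` are deliberately left out, since only the tips correspond to feature IDs in the table.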
Then, any user interested in filtering their feature table based on the IDs present in a phylogenetic tree (Phylogeny[Rooted | Unrooted]) could run the following:
qiime feature-table filter-features \
--i-table table.qza \
--m-metadata-file tree.qza \
--o-filtered-table filtered-table.qza
So that would be a one-stop shop for them: they would get to keep the tree untouched, while the filter-features method would be able to grab the IDs out of the tree in a consistent manner. Plus, Phylogeny[Rooted | Unrooted] artifacts would now generally be viewable as metadata, which means that other methods that consume metadata for their work (often using IDs for coordination) could take advantage of this too! It also means that only one place in the code is responsible for creating a dataframe of tip IDs, rather than each method implementing it separately. Transformers are global in the QIIME 2 ecosystem.
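Once the IDs are available, the filtering step itself is just set membership. Here is a minimal sketch with a feature table modelled as a plain dict; this is not how filter-features is implemented, and `filter_features` and the sample data are made up for illustration.

```python
def filter_features(table: dict, ids_to_keep) -> dict:
    """Keep only the features whose IDs appear in ids_to_keep."""
    keep = set(ids_to_keep)
    return {fid: counts for fid, counts in table.items() if fid in keep}


# Hypothetical data: feature ID -> per-sample counts
table = {"f1": [3, 0], "f2": [1, 5], "f4": [7, 2]}
tree_tip_ids = ["f1", "f2", "f3"]  # e.g. IDs obtained by viewing a tree as metadata

filtered = filter_features(table, tree_tip_ids)
print(filtered)
```

Features missing from the tree (`f4`) are dropped, and tree tips missing from the table (`f3`) are simply ignored; the tree itself is never modified.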
I hope I have made my proposal a bit more clear, but if not, @ebolyen can probably help answer any more questions or concerns! Thanks for entertaining this discussion!