Executable pipeline from artifact/visualization provenance

Hunter_Cameron · June 26, 2019, 8:17pm

I was wondering whether there would be interest in being able to extract the QIIME 2 commands needed to rerun a workflow from an artifact's provenance. Does this already exist?

The envisioned use case is reproducing workflows, either to apply them to new datasets or to run similar analyses with modified parameters. For instance, if you supplied a PCoAResults artifact, it would output a pipeline that could be used to generate that PCoA (possibly with some of the parameters changed along the way).

This implies the existence of a Pipeline semantic type, plus some new actions: one for dumping a Pipeline from an artifact and one for running a pipeline loaded from the new Pipeline artifact. It would be nice to have the same functionality for visualizations as well.

It might take some thought to work out how to specify inputs for the pipeline (since the pipeline itself would be loaded as an input to the run command), but it is something I wouldn't mind working on if the feature would be useful enough to the community.

ebolyen · July 1, 2019, 4:22pm

Hi @Hunter_Cameron!

It just so happens that @thermokarst and I are working on a fundamental part of this right now!

We are calling the feature the "Usage API". It will be a way to abstractly describe QIIME 2 actions, which will allow us to support inline usage examples, integration testing, multiple interface documentation, and provenance execution/templating!

Right now we're still ironing out the gritty details, but we should have something merged into the master branch soon. Once that is in, we would love some help with provenance replay/parsing. The idea is that some part of QIIME 2 would be responsible for reading provenance (straightforward, but technically not attempted outside of q2view). It could then dispatch Usage API calls to implementations, which would allow us to do anything from executing directly to writing out equivalent CLI commands.

I think you've brought up some good points with respect to the need to describe alternative inputs, and that's an area I haven't personally thought too much about yet.

One idea would be to have some kind of "parsed pipeline" object and then you "subtract out" inputs (not sure how to reference a specific input, but maybe UUID?).
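
As a rough illustration of "subtracting out" an input by UUID, here is a sketch in plain Python. Everything in it is hypothetical (`Step`, `ParsedPipeline`, `subtract`, `free_inputs`, and the made-up UUID strings); it only shows one way the bookkeeping could work, assuming each step is identified by the UUID of the result it produced.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class Step:
    """Hypothetical record of one action, keyed by the UUID of its result."""
    output: str                   # UUID of the result this step produced
    action: str                   # e.g. "plugin.action"
    inputs: Tuple[str, ...] = ()  # UUIDs of the results it consumed


@dataclass
class ParsedPipeline:
    """Hypothetical parsed provenance: the steps needed to build `target`."""
    target: str
    steps: Dict[str, Step]

    def free_inputs(self) -> Set[str]:
        # UUIDs that some step consumes but no remaining step produces.
        needed = {uuid for step in self.steps.values() for uuid in step.inputs}
        return needed - set(self.steps)

    def subtract(self, uuid: str) -> "ParsedPipeline":
        # Drop the step that produced `uuid`, so the caller must supply it.
        if uuid not in self.steps:
            raise KeyError(f"no step produces {uuid!r}")
        remaining = {k: v for k, v in self.steps.items() if k != uuid}
        # Keep only steps still reachable from the target; anything that
        # existed solely to build the subtracted result is pruned.
        kept: Dict[str, Step] = {}
        stack = [self.target]
        while stack:
            current = stack.pop()
            if current in remaining and current not in kept:
                kept[current] = remaining[current]
                stack.extend(remaining[current].inputs)
        return ParsedPipeline(self.target, kept)


# Made-up UUIDs and action names, for illustration only.
full = ParsedPipeline(
    target="uuid-pcoa",
    steps={
        "uuid-table": Step("uuid-table", "tools.import"),
        "uuid-dm": Step("uuid-dm", "diversity.beta", ("uuid-table",)),
        "uuid-pcoa": Step("uuid-pcoa", "diversity.pcoa", ("uuid-dm",)),
    },
)

new_table = full.subtract("uuid-table")  # supply a different feature table
new_dm = full.subtract("uuid-dm")        # supply a different distance matrix
```

With this shape, `free_inputs()` tells an interface exactly which UUIDs the user has to provide replacements for before the remaining steps can run.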

I think this feature would be huge, and we would of course love the help. Let me know if you'd like to schedule a call (maybe in August? We're a little scattered from travel this month).

In the meantime, this thread could be a great place to discuss potential APIs for the end-user.