How to get original data out of artifact? (i.e. what is the reverse of qiime2.Artifact.import_data()?)

Amanda_Birmingham · June 7, 2018, 7:54pm

My apologies if this question has already been addressed somewhere; I couldn't find an answer to it in the developer docs or a forum search.

I am working on adding some new functionality to a plugin; this new functionality calls an existing method in the plugin and then works with the results. The existing method generates a biom.Table object and uses qiime2.Artifact.import_data() to turn it into a FeatureTable[Frequency] artifact, which it then returns. However, for my added functionality, I need to get access to the underlying biom.Table object again.

Is there a straightforward way for me to get that back out of the FeatureTable[Frequency] artifact ... essentially, a reverse of qiime2.Artifact.import_data()? Or would I be better off doing something like re-architecting the existing method into two--an internal one that returns the biom.Table and an external one that turns that into an artifact--and having my new functionality call only the internal one?

ebolyen · June 8, 2018, 4:44pm

Hi @Amanda_Birmingham,

Awesome! Are you using Pipeline for this?

A method should be able to just return biom.Table which will become an artifact on the way out of the method.

You are looking for artifact.view(). In this example you would say:

biom_table = table_artifact.view(biom.Table)

You could also view it as other things like:

pandas_table = table_artifact.view(pd.DataFrame)

or even format objects:

biom_filepath = str(table_artifact.view(BIOMV210Format))  # imported from q2_types.feature_table

I would make sure your existing method just returns a biom.Table. Then you can use a pipeline to call that method and view the table however you need. This way the provenance/citations will be tracked the entire time.

Amanda_Birmingham · June 12, 2018, 6:29pm

Wow, thank you for the helpful reply! I hope you'll indulge me in a couple of follow-up questions.

Are you using Pipeline for this?

Yes, although based on some of your statements, I'm starting to wonder if we're using it optimally ... Specifically, I'm confused about your point that:

A method should be able to just return biom.Table which will become an artifact on the way out of the method.

How does that "become an artifact on the way out of the method" work? We thought we had to explicitly convert the biom table to a FeatureTable[Frequency] before returning it from our pipeline method, as shown below:

Our existing pipeline function (simplified):

def fetch_amplicon(ctx, study_id):
    # various processing here
    mybiomtable = some_internal_method()
    mytree = some_other_internal_method()
    q_table = qiime2.Artifact.import_data('FeatureTable[Frequency]', mybiomtable)
    q_tree = qiime2.Artifact.import_data('Phylogeny[Rooted]', mytree)
    return q_table, q_tree

... and then in plugin_setup.py we define a Pipeline based on that function:

plugin.pipelines.register_function(
    function=ourplugin.fetch_amplicon,
    name='Fetch amplicon data',
    inputs={},
    parameters={
        'study_id': Str,
    },
    outputs=[
        ('feature_table', FeatureTable[Frequency]),
        ('phylogeny', Phylogeny[Rooted])
    ],
    input_descriptions={},
    parameter_descriptions={
        'study_id': 'The study to obtain',
    },
    output_descriptions={
        'feature_table': "A feature table of the sample data",
        'phylogeny': "A phylogeny relating the features"
    }
)

Is there a better way to be doing this?

Also, in this case, as you can see, we create two different kinds of data objects that go into the eventual output artifact. You mentioned that, if necessary, I can get the original biom table back out with

table_artifact.view(biom.Table)

Can you help me understand how view knows which of the multiple data objects in the artifact to grab and convert back to a biom.Table?

Thank you again for your time and help!

ebolyen · June 13, 2018, 11:42pm

Hi @Amanda_Birmingham!

Ah, yes, that only applies to methods. Pipelines are expected to receive and return artifacts (so that provenance can be accurately recorded when you compose methods together inside the pipeline). So I would say your understanding is correct; I just misunderstood.

A better way to do this would be:

def fetch_amplicon(ctx, study_id):
    some_method = ctx.get_action('your_plugin', 'some_method')
    results = some_method(...)   # provenance is tracked now
    my_biom_table = results.whatever_that_output_is_named  # this is an artifact

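    # or, if you just need the "import_data" functionality, use ctx.make_artifact;
    # it handles cleanup on failure but is otherwise the same as `Artifact.import_data`
    # (`biomtable` here stands for a biom.Table you built yourself)
    feature_table = ctx.make_artifact('FeatureTable[Frequency]', biomtable)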
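
To make the method/pipeline distinction above more concrete, here is a toy model in plain Python. It is only a sketch of the idea, not QIIME 2 code: ToyArtifact, as_method, build_table and toy_pipeline are all hypothetical names invented for this illustration. The "method" returns a plain object and a wrapper boxes it on the way out, while the "pipeline" only ever handles the boxed result.

```python
# Toy model only: none of these names exist in QIIME 2.
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ToyArtifact:
    """Stand-in for an artifact: a semantic type label plus the stored data."""
    semantic_type: str
    data: Any


def as_method(output_type: str) -> Callable:
    """Wrap a plain function so its return value is boxed on the way out."""
    def decorator(func):
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)            # plain object from the method body
            return ToyArtifact(output_type, result)   # boxed by the toy "framework"
        return wrapper
    return decorator


@as_method("FeatureTable[Frequency]")
def build_table(study_id: str) -> dict:
    # The method body never mentions ToyArtifact; it returns plain data.
    return {"study": study_id, "counts": [1, 2, 3]}


def toy_pipeline(study_id: str) -> ToyArtifact:
    # The pipeline calls the wrapped method and passes the artifact along.
    table_artifact = build_table(study_id)
    return table_artifact


result = toy_pipeline("study-1")
print(result.semantic_type)
```

In the real framework the boxing step is driven by the output types declared at registration time, which is why a registered method can return a biom.Table directly while a pipeline function returns artifacts.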

There aren't actually any objects inside the artifact at all; instead, there's just a file (or directory). QIIME 2 then uses the class (biom.Table) that you pass to .view to invoke a transformer from the file/directory to an instance of that class. The q2-types plugin is mostly responsible for writing these transformers, but any plugin can do it as well.

Think of .view as a fancy factory where you just pass in the type you want and it constructs an instance of that type (assuming a plugin taught it how).
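
The "fancy factory" idea can be sketched with a small registry in plain Python. This is a toy model under loud assumptions, not QIIME 2's actual implementation: ToyArtifact, register_transformer and the JSON file layout are all hypothetical. The point it illustrates is that the artifact holds only a file, and the type passed to view selects which registered reader turns that file into an object.

```python
# Toy model only: not QIIME 2's real transformer machinery.
import json
import tempfile
from pathlib import Path

# Maps a requested Python type to a function that reads the file into that type.
_TRANSFORMERS = {}


def register_transformer(target_type):
    """Register a file -> target_type reader (what a plugin would supply)."""
    def decorator(func):
        _TRANSFORMERS[target_type] = func
        return func
    return decorator


class ToyArtifact:
    """Holds only a path to a file on disk, never an in-memory object."""

    def __init__(self, path):
        self.path = Path(path)

    def view(self, target_type):
        # Look up the reader by the requested type and build a fresh instance.
        try:
            transformer = _TRANSFORMERS[target_type]
        except KeyError:
            raise TypeError(
                f"no transformer registered for {target_type.__name__}"
            ) from None
        return transformer(self.path)


@register_transformer(dict)
def _file_to_dict(path):
    return json.loads(path.read_text())


@register_transformer(list)
def _file_to_list(path):
    return sorted(json.loads(path.read_text()).items())


with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "table.json"
    data_file.write_text(json.dumps({"featureA": 3, "featureB": 5}))
    artifact = ToyArtifact(data_file)
    as_dict = artifact.view(dict)   # same file, viewed as a dict
    as_list = artifact.view(list)   # same file, viewed as a sorted list of pairs
```

Asking for a type with no registered reader (for example artifact.view(set) here) fails with a TypeError, which mirrors the "assuming a plugin taught it how" caveat.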