Importing SampleData of Non-Specified Type

Hello, all!

I am developing a plugin that, among other things, should be able to take two or more SampleData Artifacts and merge them together (basically a SQL join). However, I am unsure how to specify in the plugin setup that any Artifact of base type SampleData should be a valid input (or whether this is even supported by the underlying Q2 plugin architecture).

For example, SampleData[AlphaDiversity] & SampleData[LogRatios] should be a valid pair of inputs as well as SampleData[Sequences] & SampleData[SongbirdStats].

My current function looks like this:

plugin.methods.register_function(
    function=aggregate_sample_data,
    inputs={"foo": List[SampleData]},
    input_descriptions={"foo": "a"},
    outputs=[("aggregated_sample_data", SampleData[AggregatedSampleData])],
    output_descriptions={"aggregated_sample_data": "b"},
    description="c",
    name="Aggregate Sample Data",
)

This results in the following error:

Traceback (most recent call last):
  File "qiimecraft/plugin_setup.py", line 71, in <module>
    inputs={"sample_data": List[SampleData]},
  File "/miniconda3/envs/qiimecraft/lib/python3.6/site-packages/qiime2/core/type/grammar.py", line 171, in __getitem__
    % (field,))
TypeError: Field SampleData[{type}] is not complete type expression.

Is what I’m trying to do possible in the QIIME 2 architecture? I know that qiime metadata tabulate can more or less accomplish this, but there are a couple of issues with that approach:

  1. My understanding is that this will filter the constituent inputs to their inner join - I’d like essentially an outer join where every sample is represented even if it is absent in other inputs.
  2. I would like the output of this function to be of type SampleData rather than a visualization.

Thanks!

Hi there @gibsramen!

Sweet! :icecream:

Cool, this makes sense - thanks for summarizing.

Metadata merging is a little bit different than a List input type, so let's ignore that case for now.

We don't support this, by design, because we think it is safer for plugin developers to specifically opt in to the variants they wish to work with. For example, how do you intend to join SampleData[Sequences] & SampleData[SongbirdStats]? For SampleData[Sequences], a record is identified by the combination of Sample ID and Sequence ID, so you would need to perform some kind of reduction to make that work.
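As one hypothetical illustration of such a reduction, the sketch below collapses a table keyed by (sample ID, sequence ID) into one row per sample by counting sequences and averaging their length. The column names and the choice of summary statistics are assumptions made for illustration only; this is not how QIIME 2 or any plugin represents SampleData[Sequences].

```python
import pandas as pd

# Hypothetical per-sequence records: one row per (sample ID, sequence ID).
records = pd.DataFrame(
    {
        "sample_id": ["s1", "s1", "s2"],
        "sequence_id": ["seq1", "seq2", "seq1"],
        "sequence": ["ACGT", "ACGTAC", "AC"],
    }
)

# Reduce to one row per sample so the result can be joined on sample ID alone.
per_sample = (
    records.assign(length=records["sequence"].str.len())
    .groupby("sample_id")
    .agg(n_sequences=("sequence_id", "count"), mean_length=("length", "mean"))
)
```

Which reduction is appropriate depends entirely on the data, which is part of why each variant has to be opted into explicitly.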

So, how do you specify supporting many input variants? Like so:

    inputs={"foo": List[SampleData[AlphaDiversity |
                                   LogRatios |
                                   SongbirdStats]]},

The approach above uses the union operator within the parent type specification, so a single list may mix the listed variants. If you had an action that required a homogeneous list of SampleData artifacts, where the list as a whole could be of any one variant, you could write it this way:

    inputs={"foo": List[SampleData[AlphaDiversity]] |
                   List[SampleData[LogRatios]] |
                   List[SampleData[SongbirdStats]]},

Playing with this a bit using primitives, rather than semantic types, might help clarify the difference:

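from qiime2.plugin import List, Int, Str

# A list of a single primitive type.
assert [1] in List[Int]
assert ['foo'] not in List[Int]

# List[Int | Str]: each element may be an Int or a Str, so mixing is allowed.
assert [1, 2] in List[Int | Str]
assert [1, 'foo'] in List[Int | Str]
assert ['foo', 'bar'] in List[Int | Str]

# List[Int] | List[Str]: the whole list must be all Int or all Str.
# (`|` binds tighter than `in`, so no parentheses are needed.)
assert ['foo', 'bar'] in List[Int] | List[Str]
assert [1, 2] in List[Int] | List[Str]
assert [1, 'bar'] not in List[Int] | List[Str]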
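For intuition outside of QIIME 2, the same distinction can be mimicked with plain isinstance checks. This is only a toy model with hypothetical helper names; it does not use or reproduce the actual QIIME 2 type machinery.

```python
def in_list_of_union(values, types):
    """Toy model of List[A | B]: each element may match any of the types."""
    return all(isinstance(v, types) for v in values)


def in_union_of_lists(values, types):
    """Toy model of List[A] | List[B]: all elements must share one type."""
    return any(all(isinstance(v, t) for v in values) for t in types)


mixed = [1, "foo"]
print(in_list_of_union(mixed, (int, str)))   # True: mixing is allowed
print(in_union_of_lists(mixed, (int, str)))  # False: list is not homogeneous
```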

The other thing to think about is that you need an appropriate transformer defined from each of the "artifact formats" used to represent these types to the "view" type you request in your method signature. I suggest requesting a pd.DataFrame as the view type in your function; this should provide the most flexibility.
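Assuming each input does arrive as a pd.DataFrame indexed by sample ID, the join itself could look roughly like the sketch below. It is standalone pandas and is not wired into register_function; the parameter name `tables`, the example columns, and the assumption that column names never collide across inputs are all illustrative.

```python
from functools import reduce
from typing import List

import pandas as pd


def aggregate_sample_data(tables: List[pd.DataFrame]) -> pd.DataFrame:
    """Outer-join sample-indexed tables so that every sample is retained."""
    if not tables:
        raise ValueError("at least one table is required")
    # how="outer" keeps samples absent from some inputs and fills them with NaN.
    # Assumes column names are unique across tables; overlapping names would
    # make DataFrame.join raise unless suffixes are supplied.
    return reduce(lambda left, right: left.join(right, how="outer"), tables)


# Hypothetical inputs: s1 is missing from `ratios`, s3 is missing from `alpha`.
alpha = pd.DataFrame(
    {"shannon": [1.5, 2.0]}, index=pd.Index(["s1", "s2"], name="id")
)
ratios = pd.DataFrame(
    {"log_ratio": [0.3, -1.2]}, index=pd.Index(["s2", "s3"], name="id")
)

merged = aggregate_sample_data([alpha, ratios])
```

Here `merged` has one row for each of s1, s2, and s3, with NaN wherever a sample was absent from an input, which is the outer-join behaviour asked for in the original question.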

If you need clarification or help please don't hesitate to reach out - thanks!

:qiime2:

Thanks so much for the detailed reply!

This makes sense - I was thinking of things in terms of OOP where all SampleData artifacts could be encompassed. My thought was that anything that had an index of Sample IDs would be valid input (and all other columns would be aggregated) but I completely understand how the design of Q2 discourages/doesn't allow this.

I had a couple of specific use-cases in mind for the plugin so I will specify those in the union rather than try to incorporate everything :smile:

Thanks! Definitely something to keep in mind and using the pandas DataFrame makes a lot of sense.

I especially appreciate the inclusion of the primitives examples - they make things a lot easier to understand! :grin:
