Guidance on typing for functions underlying plugin actions?

Amanda_Birmingham · February 23, 2024, 6:01pm

Hi, All,
I am working on a plugin, and I am struggling a little with the definition of the function underlying one of my plugin actions (note: NOT the action itself--I feel like I have a pretty clear idea of how to define the types of the action inputs and outputs using SemanticTypes). I've described details of my efforts at the bottom, but I think my basic question is: when I am defining a function underlying an action, what kinds of "types" are legal to specify for the input and output parameters?

I failed when specifying typing-library-based types (like Dict[str, float]) for outputs, and when specifying QIIME built-in types (like qiime2.plugin.List[qiime2.plugin.Str]). However, I succeeded using Python native types like dict or other packages' types like pandas.DataFrame, as well as any object type I defined myself (e.g., class LinearRegressionsYamlFormat(model.TextFileFormat):). I think my mental model of what can be used here is lacking. Can anyone point me to a resource to set me straight, or educate me here?

Many thanks,
Amanda

Gory Details:
In my case, I have all the plugin chrome set up and the action defined:

plugin.methods.register_function(
    function=q2_pysyndna.fit,
<etc>

... and I am working on the definition of the fit function. I originally had it defined like this:

from typing import Dict, List, Union

def fit(syndna_concs: pandas.DataFrame,
        syndna_counts: biom.Table,
        metadata: Metadata,
        min_sample_count: int = 1) -> \
        (Dict[str, Union[Dict[str, float], None]], List[str]):
    
    # for testing only
    linregs_dict = {"example_test": {
        "slope": 1.24675913604407,
        "intercept": -7.155318973708384,
        "rvalue": 0.9863241797356326,
        "pvalue": 1.505381146809759e-07,
        "stderr": 0.07365795255302438,
        "intercept_stderr": 0.2563956755844754}}
    log_msgs_list = ["test log message 1"]

    return linregs_dict, log_msgs_list

However, when I tried to run the action, this failed way down in qiime2.core.type.signature in the coerce_given_outputs function, where it complained that the type of the output_view (dict) is not the same as the spec.view_type (Dict[str, Union[Dict[str, float], None]]) (see qiime2/core/type/signature.py at commit d56401b26a2230a9acbba2ec3a8b398e52e934b5 in the qiime2/qiime2 repository on GitHub). Testing showed that replacing the typing-library-based hint in the fit definition above with (dict, list) worked. (I was a little surprised, as I was able to successfully use type hints from the typing library in other parts of the plugin--for example, I was able to define a transformer that worked on a manually-created dict like the one above using the definition def _2(data: Dict[str, Union[Dict[str, float], None]]) -> LinearRegressionYamlFormat: <etc>.)

After this, I thought that maybe I should be using QIIME types as outputs, so (for testing purposes) I ditched the dict part of the output and just tried defining a return value for a list of strings as qiime2.plugin.List[qiime2.plugin.Str], but that produced the same type(output_view) != spec.view_type error.
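
For readers who want to see why an exact-type comparison rejects a typing-library hint, here is a minimal sketch in plain Python. It does not call QIIME 2; view_type_matches is a hypothetical stand-in for the kind of type(output_view) != spec.view_type check described in the error above, not the framework's actual code.

```python
from typing import Dict, List, Union

# The annotation from the original fit() signature, and a value like the one returned.
declared_view_type = Dict[str, Union[Dict[str, float], None]]
output_view = {"example_test": {"slope": 1.24675913604407}}


def view_type_matches(view, view_type):
    """Hypothetical exact-type check: compares the runtime class to the annotation."""
    return type(view) == view_type


# The runtime class of the value is plain dict, which is not equal to the
# typing alias, so the exact comparison fails even though the value "fits".
typing_hint_matches = view_type_matches(output_view, declared_view_type)

# Annotating with the concrete class makes the same comparison succeed.
plain_dict_matches = view_type_matches(output_view, dict)

print(typing_hint_matches, plain_dict_matches)
```

The same reasoning applies to the log-message output: a list of strings has the runtime class list, so it matches list but not List[str].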

gregcaporaso · February 27, 2024, 3:59pm

Hi @Amanda_Birmingham,
When defining functions that are registered to actions (fit, in your example), the types that you associate with the inputs and outputs are Python data types, and more specifically Python data types that are associated with the semantic type of that input or output by transformers.

Quick explanation of the motivation: This enables "viewing" artifacts as different data types internally - i.e., what the data represents (its semantic type) is decoupled from how we interact with it (its data type). For example, if you have a FeatureTable[Frequency] as input to an action, that can be viewed as a biom.Table object or a pandas.DataFrame object, and the developer may choose between the two based on available functionality, efficiency for an operation they plan to carry out, their familiarity with the object, or whatever else. This also leaves the door open for semantic types to be viewed as new data types (e.g., by adding a transformer to a new, futuristic superfast.Table object), while not requiring all actions that take a FeatureTable[Frequency] as input to update their function to use the superfast.Table rather than the pandas.DataFrame or biom.Table API they were previously using.
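
As a rough illustration of the idea that a function's annotations select which view of the same stored data it receives, here is a toy sketch in plain Python. It does not use QIIME 2's API: ToyTableFormat, register_transformer, view, and call_with_views are hypothetical names invented for this example, and the real framework's lookup is more involved.

```python
# Toy sketch only: none of these names come from QIIME 2.


class ToyTableFormat:
    """Hypothetical stand-in for data stored in some on-disk format."""

    def __init__(self, rows):
        self.rows = rows  # list of (feature_id, count) pairs


_TRANSFORMERS = {}


def register_transformer(func):
    """Index a transformer by its (from_type, to_type) annotations."""
    hints = dict(func.__annotations__)
    to_type = hints.pop("return")
    (from_type,) = hints.values()
    _TRANSFORMERS[(from_type, to_type)] = func
    return func


@register_transformer
def _format_to_dict(fmt: ToyTableFormat) -> dict:
    return dict(fmt.rows)


@register_transformer
def _format_to_list(fmt: ToyTableFormat) -> list:
    return [feature_id for feature_id, _ in fmt.rows]


def view(fmt, view_type):
    """Look up a transformer by exact types and apply it."""
    return _TRANSFORMERS[(type(fmt), view_type)](fmt)


def call_with_views(func, fmt):
    """Give func the view of fmt that its single parameter annotation asks for."""
    hints = {k: v for k, v in func.__annotations__.items() if k != "return"}
    [(param_name, view_type)] = hints.items()
    return func(**{param_name: view(fmt, view_type)})


# Two functions that want different views of the same stored data.
def total_count(table: dict) -> int:
    return sum(table.values())


def feature_ids(table: list) -> list:
    return sorted(table)


stored = ToyTableFormat([("f2", 5), ("f1", 3)])
total = call_with_views(total_count, stored)
ids = call_with_views(feature_ids, stored)
print(total, ids)
```

Because the lookup key in this sketch is the concrete class named in the annotation, adding a new view only requires registering one more transformer; functions written against the existing views keep working unchanged.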

I used your question as an opportunity to port some related content that I had written over to Developing with QIIME 2 (i.e., the new developer documentation), so you can now find a little more discussion about this here.

Amanda_Birmingham · March 1, 2024, 10:26pm

Thank you, @gregcaporaso, the article is helpful--as you note, the word 'type' is quite overloaded! The word 'format' is also doing a lot of work, which I think adds to my confusion.

When defining functions that are registered to actions ..., the types that you associate with the inputs and outputs are ... Python data types that are associated with the semantic type of that input or output by transformers.

This is very useful guidance. So, my current understanding is that when I am defining the inputs and outputs for a function that will be registered as an action:

I can use data types (e.g., in-memory representations of data) like pandas.DataFrame or biom.Table (assuming the linked input in the action is of a SemanticType that can be transformed into those).
I can also use data types that are "Formats" in the QIIME sense (not in the file sense). For example, I see the function underlying the q2-demux.subsample_single action takes in a SingleLanePerSampleSingleEndFastqDirFmt.
I can also use QIIME "primitive types"? (e.g. I have seen examples of functions underlying actions where these functions take inputs of the data type qiime2.Metadata, although maybe this is relevant only to function inputs that come in via action 'parameters' and not those that come in via action 'inputs'?)
I CANNOT use SemanticTypes. As I understand it, SemanticTypes define kinds of data, and are instantiated in memory (e.g., UchimeStats = SemanticType('UchimeStats') ), but don't actually hold any data themselves ... which I guess would be a good reason I can't use them to pass data into a function
I CANNOT use (data?) types from the Python typing library, such as Dict[str, float]. Maybe this is because I cannot actually instantiate an object that is of the type Dict[str, float]? (Like, my_dict = {"blue": 3.5} qualifies as a Dict[str, float] but that is not its actual data type, which is dict.)
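
The last point can be checked with the standard library alone. This is a small sketch using only typing.get_origin (available in Python 3.8 and later); the variable names are made up for illustration.

```python
from typing import Dict, get_origin

my_dict = {"blue": 3.5}

# The object's actual runtime class is the built-in dict.
runtime_type = type(my_dict)

# Dict[str, float] is a typing alias that describes dict contents for type
# checkers; the concrete class it refers to is recoverable via get_origin().
alias = Dict[str, float]
alias_origin = get_origin(alias)

print(runtime_type, alias, alias_origin)
```

So the alias and the runtime class are different objects, and only the alias's origin (dict) is something a value's type() can ever equal.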

Any further corrections to my understanding would be very much welcome and appreciated. I am grateful to the QIIME team for all the support you folks provide!

gregcaporaso · March 6, 2024, 4:25pm

The word 'format' is also doing a lot of work

Good point - I'll keep that in mind as I'm writing Developing with QIIME 2.

All of the points you mention are correct. I'll provide a little bit of clarification on some (if I don't mention one it's b/c I have nothing to add to your description).

I can also use data types that are "Formats" in the QIIME sense

Yes, exactly - these are useful when you want to process a file within your action (e.g., iterate over the lines in a fastq file, as in your example).

I can also use QIIME "primitive types"... maybe this is relevant only to function inputs that come in via action 'parameters' and not those that come in via action 'inputs'

Yes, exactly. That's a distinction between 'parameters' and 'inputs': parameters use primitive types while inputs are always artifacts and therefore have semantic types associated with them.

I CANNOT use (data?) types from the python typing library

Correct. I chatted with @ebolyen about this, and his perspective is that typing was changing a lot when we were working on this, so we didn't prioritize it then, and it hasn't resurfaced as a priority now that it's stabilizing.

I am grateful to the QIIME team for all the support you folks provide!

Thank you!