Juggle between semantic types and view types?

Dear all,

I am currently trying to develop my first QIIME2 plugin, and I admit I am having a hard time juggling data types. I’d appreciate any guidance from someone more experienced with the API.

My aim is to automatically perform a series of analytical steps, the first one being to extract OTU reference sequences for a specific taxon (I used Escherichia as an example in the following code). For this, I’d like to use the filter_seqs function from qiime2.plugins.taxa.methods.

In plugin_setup.py, I am registering my method the following way:

plugin.methods.register_function(
	function=extract_escherichia,
	inputs={
		'rep_seqs': FeatureData[Sequence],
		'taxonomy': FeatureData[Taxonomy]
	},
	outputs=[('escherichia_seqs', FeatureData[Sequence])],
	...
	...
)

Looking through the forum, I realized that I need to use view types in my function, which I therefore defined as follows:

def extract_escherichia(rep_seqs: pd.Series, taxonomy: qiime2.Metadata) -> (pd.Series):

This is where I am stuck: I cannot work out how to call filter_seqs from my method. It seems that filter_seqs requires semantic types as inputs, while I have Series and Metadata objects. As a result, the following call in my extract_escherichia method does not work:

from qiime2.plugins.taxa.methods import filter_seqs
def extract_escherichia(rep_seqs: pd.Series, taxonomy: qiime2.Metadata) -> (pd.Series):
    escherichia_seqs=filter_seqs(sequences=rep_seqs, taxonomy=taxonomy, include='Escherichia')
    return escherichia_seqs

Should I convert my objects back to Artifacts to use filter_seqs (and if so, how), or is there an easier way that I have missed? I’m sure it is just a matter of understanding the API better; I looked through the developer documentation, the code of existing plugins, and the forum, but so far could not figure it out.

Thank you for your help!

Hi @lea_si, this is such a great question, thanks!

Wow, from my perspective it looks like you actually have it all figured out. I think with a little bit of nudging in a new direction you'll have it all squared away in no time.

The call to filter_seqs inside your method is where things are breaking down. In QIIME 2, if you want to use another Action inside of your Action, then you need a Pipeline.

Importing a "vanilla" Python function (that represents a QIIME 2 Action) has two disadvantages compared with Pipelines:

  1. It is up to you to negotiate the appropriate view type transformation (as you have seen first-hand!). That is super annoying, and is one of the QIIME 2 Framework's main jobs, so might as well let QIIME 2 handle it...
  2. You lose all decentralized provenance of the imported function, since QIIME 2 doesn't "know" that you're using filter_seqs.
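
To make the first point a bit more concrete, here is a toy model of what "negotiating the view type" means. It is only a sketch and not QIIME 2's actual implementation: `ToyArtifact`, `call_with_views`, and `count_unique` are hypothetical names invented for this illustration. The idea is that the framework reads the annotations on your function, asks each input for a view of that type, and wraps the return value back up.

```python
import inspect


class ToyArtifact:
    """Hypothetical stand-in for an Artifact: wraps data and can produce views."""

    def __init__(self, data):
        self._data = data

    def view(self, view_type):
        # Convert the stored data to the requested view type.
        return view_type(self._data)


def call_with_views(function, **artifacts):
    """Toy 'framework': read the annotations and hand the function views."""
    signature = inspect.signature(function)
    views = {
        name: artifact.view(signature.parameters[name].annotation)
        for name, artifact in artifacts.items()
    }
    result = function(**views)
    # Wrap the returned view back up, as a framework would do with outputs.
    return ToyArtifact(result)


def count_unique(ids: set) -> int:
    # The function body only ever sees the view type it asked for (a set).
    return len(ids)


wrapped = call_with_views(count_unique, ids=ToyArtifact(['a', 'b', 'a']))
print(wrapped.view(int))
```

In this toy, handing `call_with_views` a plain list instead of a `ToyArtifact` fails because a list has no `view` method, which is loosely analogous to passing a pd.Series to an action that expects an Artifact.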

Okay, so, converting your Method into a Pipeline is pretty straightforward:

First, register extract_escherichia as a pipeline, rather than a method.

plugin.pipelines.register_function(
	...
)

Second, update your function's signature. Pipelines have a mandatory first argument called ctx - this is the QIIME 2 context, and gives you access to things like other QIIME 2 actions, etc. Note that pipelines don't use the view annotation syntax in the function signature.

import pandas as pd
import qiime2
# note I am no longer importing `filter_seqs` using a Python import statement


def extract_escherichia(ctx, rep_seqs, taxonomy):
    # first, let's get a reference to filter_seqs:
    filter_seqs = ctx.get_action('taxa', 'filter_seqs')

    # since the rep_seqs and taxonomy are already Artifacts, we can just pass them right along:
    e_seqs_results = filter_seqs(sequences=rep_seqs, taxonomy=taxonomy, include='Escherichia')
    # the return value is a QIIME 2 Results object:
    # https://dev.qiime2.org/latest/api-reference/sdk/#qiime2.sdk.Results
    # it's just a namedtuple with your outputs in it
    # you can look up the name of the action's output directly:
    filtered_e_seqs = e_seqs_results.filtered_sequences

    # In your original function signature, you were requesting views of pd.Series and qiime2.Metadata,
    # so I'll assume you need those in that format for later code. Here is how you can do that:
    rep_seqs_series = rep_seqs.view(pd.Series)
    taxonomy_md = taxonomy.view(qiime2.Metadata)
    # now you can do things with the series and metadata in subsequent lines of code!
    
    # returning the filtered seqs, just to match what you had in your original example function
    return filtered_e_seqs
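
If the "namedtuple with your outputs in it" part is unfamiliar, here is a small standalone sketch of the access patterns, assuming the Results object behaves like a regular Python namedtuple as described in the comments above. `ToyResults` and the placeholder string are hypothetical stand-ins; in real code the object comes back from the action call.

```python
from collections import namedtuple

# Hypothetical stand-in for the object returned by calling an action.
# The field name mirrors the output name used in the example above.
ToyResults = namedtuple('ToyResults', ['filtered_sequences'])

results = ToyResults(filtered_sequences='<artifact placeholder>')

# Look the output up by name...
by_name = results.filtered_sequences
# ...or by position...
by_position = results[0]
# ...or unpack it (note the trailing comma for a single output).
(unpacked,) = results

print(by_name == by_position == unpacked)
```

Looking outputs up by name tends to be the most readable option once an action has more than one output.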

Hope that helps - keep us posted, really excited to see more of this plugin in the future!

Dear @thermokarst,
Thank you so much for your detailed answer. I had totally overlooked the concept of pipelines, which makes my life way easier! Your code example was perfect to follow, and things work like a charm now (at least the first step of the plugin).
I won’t hesitate to reach out if I run into more issues as I go along, and in any case I will share the plugin when it is ready!