Help with making a plugin for core microbiome

(Richard Rodrigues) #1

Hi

I am trying to make a QIIME 2 plugin that implements the core microbiome method from my paper last year (https://peerj.com/articles/4395/). Basically, it uses presence/absence data in the interest group and the out-group and provides the statistical significance of an OTU being part of the core microbiome.
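The presence/absence step described above can be sketched in plain Python. This is a minimal illustration, not the paper's method: it does not reproduce the significance test. The dict-of-dicts table layout (`{otu_id: {sample_id: count}}`) and the names `group_prevalence` and `interest_vs_out` are hypothetical. The sketch only computes, per OTU, the fraction of samples in each group in which the OTU is present (count > 0).

```python
from typing import Dict, Iterable, Tuple

# Hypothetical table layout: {otu_id: {sample_id: count}}
Table = Dict[str, Dict[str, float]]


def group_prevalence(table: Table, samples: Iterable[str]) -> Dict[str, float]:
    """Fraction of the given samples in which each OTU is present (count > 0)."""
    samples = list(samples)
    return {
        otu: sum(1 for s in samples if counts.get(s, 0) > 0) / len(samples)
        for otu, counts in table.items()
    }


def interest_vs_out(table: Table, interest: Iterable[str],
                    out: Iterable[str]) -> Dict[str, Tuple[float, float]]:
    """Pair each OTU's prevalence in the interest group with the out-group."""
    p_in = group_prevalence(table, interest)
    p_out = group_prevalence(table, out)
    return {otu: (p_in[otu], p_out[otu]) for otu in table}


table = {
    "otu1": {"s1": 5, "s2": 3, "s3": 0, "s4": 0},
    "otu2": {"s1": 0, "s2": 2, "s3": 4, "s4": 1},
}
print(interest_vs_out(table, interest=["s1", "s2"], out=["s3", "s4"]))
```

Here `otu1` is present in every interest-group sample and absent from the out-group, which is the kind of pattern a core-microbiome test would then assess statistically.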

I am following the tutorials on how to make a QIIME 2 plugin. Can we provide lists as inputs to and outputs from methods inside a plugin? See the example below:


from typing import Iterable, List

def benjamini_hotchberg_correct(pvalues: List[float]) -> List[float]:
    n = len(pvalues)
    ...
    return new_pvalues

In plugin_setup.py:

plugin.methods.register_function(
    function=q2_coremicrobiome.benjamini_hotchberg_correct,
    inputs={},
    parameters={'pvalues': List},
    outputs=[('bhpvals', List)],
    input_descriptions={},
    parameter_descriptions={
        'pvalues': ('p-values on which to apply the '
                    'Benjamini-Hochberg correction.')
    },
    output_descriptions={
        'bhpvals': 'The p-values after Benjamini-Hochberg correction.'
    },
    name='Benjamini-Hochberg',
    description=("Computes a Benjamini-Hochberg correction"
                 " for p-values."),
    citations=[citations['Benjamini1995']])
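The body of `benjamini_hotchberg_correct` is elided in the post. For reference, a standard Benjamini-Hochberg step-up adjustment on a plain Python list looks roughly like the sketch below. It is a generic textbook version under the hypothetical name `bh_adjust`, not the author's implementation: each p-value is scaled by n/rank, and a running minimum is taken from the largest p-value downward so adjusted values stay monotone and never exceed 1.

```python
from typing import List


def bh_adjust(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    if n == 0:
        return []
    # Visit indices from the largest p-value to the smallest.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0  # caps adjusted values at 1 and enforces monotonicity
    for rank_from_top, i in enumerate(order):
        rank = n - rank_from_top  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


print(bh_adjust([0.01, 0.04, 0.03, 0.20]))
```

A helper like this takes and returns ordinary lists, so it can be called from inside a registered action without being registered itself.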

Thanks.

-Rich

(Nicholas Bokulich) #2

Hi @Richard_Rodrigues1,

Thank you for putting together a plugin!

The short answer is yes, there are a few ways to do this, but I have a few thoughts and questions.

  1. Most importantly, you cannot output a list of files whose length varies. Right now, the number of output files generated by QIIME 2 plugins must be fixed, though we are working on that (e.g., to allow optional output files).

  2. But you can have a variable number of input files if those files are a metadata type or can be transformed to metadata. QIIME 2 will automatically merge all metadata files/columns into a single dataframe, and you would then write your function to operate across all columns. Assuming that ‘pvalues’ is a vector of p-values (or potentially a list of vectors of p-values), you could just do something like

from qiime2.plugin import Metadata
parameters={'pvalues': Metadata}
  3. A p-value correction function seems more like a function that you would use internally (i.e., to automatically correct the p-values calculated by an action), rather than an action that you would expose independently in QIIME 2. Otherwise, you will need to define a new semantic type for a vector of p-values (maybe SampleData[pvalue] or FeatureData[pvalue]?). Is this a function that you see people using independently of your plugin, or is this something that should just always be run with other actions in your plugin?

Does that all make sense? Let me know what you think.

(Richard Rodrigues) #3

Hi Nicholas

Yes, that is helpful. However, I thought “every” method in the plugin had to be registered. I see just one main method that a user needs to call; the rest is done by internal functions (e.g., p-value correction and many other similar functions).

Would it be possible to have a 15-30 minute online meeting? I could show you the code I already have (https://github.com/richrr/coremicro) and the basic QIIME 2 files I made (e.g., setup.py, plugin_setup.py), and you could suggest the best way to convert my code into a QIIME 2 plugin.

Appreciate your help!

-Rich

(Nicholas Bokulich) #6

👍
In general, you will expose functions that you want to be accessible to users, and then you can stitch together whatever functions you want under the hood. In your case, I expect you would want to expose a single visualizer.

Let me correct/clarify some earlier statements. You can actually use the List expression to collect many artifacts of any semantic type. See this example:

Also, the automatic merging of Metadata is a q2cli-specific feature, not a framework feature. If you are working with the Artifact API, you will need to merge manually, as exemplified here (this is what q2cli is actually doing under the hood to automatically merge metadata):
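The linked example is not reproduced here. With the Artifact API, `qiime2.Metadata` objects provide a `merge` method for this. As a rough, hypothetical pure-Python analogy of what "merging" means, the sketch below keeps only sample IDs present in both inputs and combines their columns (an inner join on IDs). Rejecting duplicate column names is a design choice of this sketch, not a claim about how QIIME 2 behaves; `inner_join_by_id` is an illustrative name.

```python
from typing import Dict

Record = Dict[str, str]  # hypothetical: {column_name: value} for one sample


def inner_join_by_id(left: Dict[str, Record],
                     right: Dict[str, Record]) -> Dict[str, Record]:
    """Keep sample IDs present in both mappings and combine their columns."""
    left_cols = {c for r in left.values() for c in r}
    right_cols = {c for r in right.values() for c in r}
    overlap = left_cols & right_cols
    if overlap:
        raise ValueError(f"overlapping column names: {sorted(overlap)}")
    return {sid: {**left[sid], **right[sid]} for sid in left if sid in right}


groups = {"s1": {"group": "interest"}, "s2": {"group": "out"},
          "s3": {"group": "out"}}
pvals = {"s1": {"pval": "0.01"}, "s2": {"pval": "0.2"}}
print(inner_join_by_id(groups, pvals))
```

Sample `s3` is dropped because it has no row in the second input.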

The next week is tight for me… but I could discuss the following week.

Now that you have setup.py and plugin_setup.py, the next things to think about are:

  1. what functions do you want to expose as actions?
  2. what are the necessary input types? I assume you would take a feature table and maybe also metadata. That’s good, since it means you will not need to define new types (that can be a bit of a hassle). But what about the output? I assume you probably want a visualizer, which will contain plots and/or a table of results.
(Richard Rodrigues) #8

The next week is tight for me… but I could discuss the following week.

Sure, we can schedule something for that week. You can email me your availability at [email protected] and we can find a mutually agreeable time.

what are the necessary input types? I assume you would take a feature table and maybe also metadata. That’s good, since it means you will not need to define new types (that can be a bit of a hassle).

Correct: feature table, metadata, and a couple of (string, int, flag) arguments.

But what about the output? I assume you probably want a visualizer, which will contain plots and/or a table or results.

Yes, for now it is easiest to output a table of core microbes for the interest group. Later on, I can expand it to output several tables and/or plots. One step at a time.
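In QIIME 2, a visualizer's function receives an `output_dir` string and writes its content there, conventionally including an `index.html`. A minimal sketch for the "table of core microbes" output might look like the following, assuming the core OTUs have already been computed as a hypothetical `{otu_id: p_value}` dict. `write_core_table` is an illustrative name, and real plugins often render HTML through templates (e.g., the q2templates package) rather than building strings by hand.

```python
import html
import os
from typing import Dict


def write_core_table(output_dir: str, core: Dict[str, float]) -> None:
    """Write a minimal index.html listing core OTUs, smallest p-value first."""
    rows = "\n".join(
        f"<tr><td>{html.escape(otu)}</td><td>{p:.3g}</td></tr>"
        for otu, p in sorted(core.items(), key=lambda kv: kv[1])
    )
    page = (
        "<html><body><table>\n"
        "<tr><th>OTU</th><th>p-value</th></tr>\n"
        f"{rows}\n"
        "</table></body></html>\n"
    )
    with open(os.path.join(output_dir, "index.html"), "w") as fh:
        fh.write(page)
```

Starting with a single HTML table keeps the first version simple; extra tables or plots can later be written as additional files in the same `output_dir`.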

you will expose functions that you want to be accessible to users, and then you can stitch together whatever functions you want under the hood.

How do I do this? The majority of my functions don’t need to be exposed to the user.

Thanks.

-Rich

(Nicholas Bokulich) #9

Definitely a good plan to start simple. Note that outputting both tables (e.g., as artifacts) and plots (e.g., as visualizations) implies that you will need to make this a QIIME 2 pipeline. But don’t worry about what that means at the moment…

Just check out any of the QIIME 2 plugins to see what I mean. Any function that is registered in plugin_setup.py will be exposed as an action. Those functions will most likely call other functions that are not registered, and hence are not exposed to users but just run under the hood.

(Richard Rodrigues) #11

OK, I think I have what I need for now. Just to confirm: the under-the-hood functions don’t need to be registered in plugin_setup.py, but the main input and output methods will have to be. Also, do all internal and registered functions need to use mypy-style type annotations, or just the registered ones?

Thanks.

-Rich

(Nicholas Bokulich) #12

Yes.

No, just registered functions.

(Richard Rodrigues) #13

Hi Nicholas

Sorry for the delayed reply. I think I have arranged all the code appropriately. I have a couple of questions related to the best way to provide specific arguments and the overall code structure. I was wondering if we could set up a 15-minute online meeting (anytime works for me). It would be quicker and more easily resolved with a short discussion than with a chain of messages. Of course, once we hash it out, I can post a summary of our Q&A here so other new developers can refer to it if needed.

Thanks.

-Rich
