Help with making a plugin for core microbiome

Hi

I am trying to make a QIIME 2 plugin that implements the core microbiome method from my paper last year (https://peerj.com/articles/4395/). Basically, it uses presence/absence data in the interest group and the out-group and reports the statistical significance of an OTU being part of the core microbiome.

I am following the tutorials on how to make a QIIME 2 plugin. Can we provide lists as inputs and outputs to methods inside plugins? See below for an example:

```python
from typing import Iterable, List

def benjamini_hotchberg_correct(pvalues: List[float]) -> List[float]:
    n = len(pvalues)
    ...
    return new_pvalues
```

In plugin_setup.py:

```python
plugin.methods.register_function(
    function=q2_coremicrobiome.benjamini_hotchberg_correct,
    inputs={},
    parameters={'pvalues': List},
    outputs=[('bhpvals', List)],
    input_descriptions={},
    parameter_descriptions={
        'pvalues': ('p-values on which to apply the '
                    'Benjamini-Hochberg correction.')
    },
    output_descriptions={
        'bhpvals': 'The p-values after Benjamini-Hochberg correction.'
    },
    name='Benjamini-Hochberg',
    description=("Computes a Benjamini-Hochberg correction "
                 "for p-values."),
    citations=[citations['Benjamini1995']]
)
```

Thanks.

-Rich


Hi @Richard_Rodrigues1,

Thank you for putting together a plugin!

The short answer is yes, there are a few ways to do this, but I have a few thoughts and questions.

  1. Most importantly, you cannot output a variable-length list of files. Right now, the number of output files generated by QIIME 2 plugins must be fixed, though we are working on that (e.g., to allow optional output files).

  2. But you can have a variable number of input files if those files are a metadata type or transformable to metadata. QIIME 2 will automatically merge all metadata files/columns into a single dataframe, and you would then write your function to operate across all columns. Assuming that 'pvalues' is a vector of p-values (or potentially a list of vectors of p-values), you could do something like:

     ```python
     parameters={'pvalues': Metadata}
     ```

  3. A p-value correction function seems more like something you would use internally (i.e., to automatically correct the p-values calculated by an action), rather than an action that you would expose independently in QIIME 2. Otherwise, you will need to define a new semantic type for a vector of p-values (maybe SampleData[pvalue] or FeatureData[pvalue]?). Is this a function that you see people using independently of your plugin, or is this something that should always be run with other actions in your plugin?

Does that all make sense? Let me know what you think.


Hi Nicholas

Yes, it is helpful. However, I thought "every" method in the plugin had to be registered. I only see one main method that a user needs to call; the rest is done by internal methods (e.g., p-value correction and many other similar functions).

Would it be possible to have a 15-30 minute online meeting? I could show you the code I already have (https://github.com/richrr/coremicro) and the basic QIIME 2 files I made (e.g., setup.py, plugin_setup.py), and you could suggest the best way to convert my code into a QIIME 2 plugin.

Appreciate your help!

-Rich


:+1:
In general, you will expose functions that you want to be accessible to users, and then you can stitch together whatever functions you want under the hood. In your case, I expect you would want to expose a single visualizer.

Let me correct/clarify some earlier statements. You can actually use the List expression to collect many artifacts of any semantic type. See this example:

Also, the automatic merging of Metadata is a q2cli-specific feature, not a framework feature. If you are working with the Artifact API, you will need to merge manually, as exemplified here (this is what q2cli is actually doing under the hood to automatically merge metadata):
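With the Artifact API, that merge is done explicitly with qiime2.Metadata.merge (for example, md1.merge(md2)), which the QIIME 2 API documentation describes as an inner join on IDs that raises an error when column names overlap. The stdlib-only sketch below mimics those semantics on plain dicts to show what the merged result looks like; merge_metadata is a hypothetical name, and this is not QIIME 2's implementation.

```python
def merge_metadata(*tables):
    """Inner-join metadata tables on their IDs.

    Each table is a dict mapping an ID to a {column: value} dict, a stand-in
    for qiime2.Metadata. Only IDs present in every table are kept, and a
    column name appearing in more than one table is an error.
    """
    if not tables:
        raise ValueError("at least one table is required")
    seen_columns = set()
    for table in tables:
        columns = {col for row in table.values() for col in row}
        overlap = seen_columns & columns
        if overlap:
            raise ValueError(f"overlapping column names: {sorted(overlap)}")
        seen_columns |= columns
    shared_ids = set(tables[0]).intersection(*tables[1:])
    merged = {}
    # Keep the shared IDs in the order they appear in the first table.
    for sample_id in tables[0]:
        if sample_id in shared_ids:
            row = {}
            for table in tables:
                row.update(table[sample_id])
            merged[sample_id] = row
    return merged
```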

The next week is tight for me... but I could discuss the following week.

Now that you have setup.py and plugin_setup.py, the next things to think about are:

  1. What functions do you want to expose as actions?
  2. What are the necessary input types? I assume you would take a feature table and maybe also metadata. That's good, since it means you will not need to define new types (that can be a bit of a hassle). But what about the output? I assume you probably want a visualizer, which will contain plots and/or a table of results.

> The next week is tight for me… but I could discuss the following week.

Sure, we can schedule something for that week. You can email me your availability at [email protected] and we can find a mutually agreeable time.

> what are the necessary input types? I assume you would take a feature table and maybe also metadata. That's good, since it means you will not need to define new types (that can be a bit of a hassle).

Correct: feature table, metadata, and a couple of (string, int, flag) arguments.

> But what about the output? I assume you probably want a visualizer, which will contain plots and/or a table or results.

Yes, for now it is easiest to output a table of core microbes for the interest group. Later on, I can expand it to output several tables and/or plots. One step at a time.

> you will expose functions that you want to be accessible to users, and then you can stitch together whatever functions you want under the hood.

How do I do this? The majority of my functions don't need to be exposed to the user.

Thanks.

-Rich

Definitely a good plan to start simple. Note that outputting both tables (e.g., as artifacts) and plots (e.g., as visualizations) implies that you will need to make this a QIIME 2 pipeline. But don't worry about what that means at the moment...

Just check out any of the QIIME 2 plugins to see what I mean. Any function that is registered in plugin_setup.py will be exposed as an action. Those functions will most likely call other functions that are not registered, and hence not exposed to users, but just run under the hood.

OK, I think I have what I need for now. Just to confirm: the under-the-hood functions don't need to be registered in plugin_setup.py, but the main input and output methods will have to be? Also, do all internal and registered functions need mypy-style type annotations, or just the registered ones?

Thanks.

-Rich

Yes.

No, just registered functions.


Hi Nicholas

Sorry for the delayed reply. I think I have arranged all the code appropriately. I have a couple of questions related to the best way to provide specific arguments and to the overall code structure. I was wondering if we could set up a 15-minute online meeting (anytime works for me). It would be quicker to resolve these in a short discussion than through a chain of messages. Of course, once we hash it out, I can post a summary of our Q&A here so other new developers can refer to it if needed.

Thanks.

-Rich


Dear @Nicholas_Bokulich

Thanks for the suggestions and apologies for the delay in getting my code working!

I seem to have most of the code working as a q2 plugin. I am wondering about the following:

  • Is it possible to output files (text and figures) to a specific folder (via a user-provided argument)? How would this argument differ from the default (miscellaneous) "--output-dir" argument of visualizers?

  • How do I make a QZV file for these output files?

  • QIIME's --output-dir argument doesn't seem to use the user-provided value; instead, a temporary folder with a random name is created. Does this only happen in development mode?

  • Once I feel the plugin is ready, how do I submit it for review/approval so it can be listed on QIIME's plugin page?

Thanks.

-Rich

Hi again @Richard_Rodrigues,

:tada:

That's exciting news!

It is better to output as QZA/QZV so that provenance is stored in these files.

You should check out the developer documentation; a great tutorial that describes this has been added here:
https://dev.qiime2.org/latest/tutorials/

In a nutshell, you would output an HTML file (or another file type) containing the plots/data you want to display, and register your action as a visualizer:
https://dev.qiime2.org/latest/actions/visualizers/
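To make the visualizer route concrete, here is a minimal, hypothetical visualizer written with only the standard library. It assumes the usual QIIME 2 convention that the function receives an output_dir path from the framework as its first argument and returns None, with an index.html at the top of that directory serving as the page the viewer opens. The name core_summary and its core_features parameter are made up for illustration; real plugins typically render a Jinja2 template with q2templates.render instead of building HTML strings by hand.

```python
import html
import os


def core_summary(output_dir: str, core_features: list) -> None:
    """Hypothetical visualizer: write index.html into the framework's output_dir."""
    # Escape feature IDs so unusual characters cannot break the HTML.
    rows = "\n".join(
        f"<tr><td>{html.escape(str(f))}</td></tr>" for f in core_features
    )
    page = (
        "<html><body><h1>Core features</h1>"
        f"<table><tr><th>Feature ID</th></tr>{rows}</table>"
        "</body></html>"
    )
    with open(os.path.join(output_dir, "index.html"), "w") as fh:
        fh.write(page)
```

You can try it outside QIIME 2 by passing any temporary directory (e.g., one from tempfile.TemporaryDirectory()) and opening the resulting index.html in a browser.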

The best way to see how it's done is to look at how other visualizers are registered in existing plugins. Take a look at the adonis action in q2-diversity; it is a relatively simple example to use as a template:

That's right: this output_dir does not get exposed in any of the user interfaces. You need to register one or more outputs as separate arguments that do get exposed to the user (see the examples and documentation above for more details).

Release your plugin in its own github repository (or wherever you like), and document installation and usage instructions on the "library" to share it with the world: https://library.qiime2.org/

Good luck!

So just to make sure I understand correctly:
Anything written to the output_dir folder becomes part of the QZV file?
To make the QZV file, do I need to make an index.html or index.tsv file and write it to output_dir using something like the following?

```python
q2templates.render(index, output_dir, context=context)
```

or

```python
pairwise_path = os.path.join(output_dir, '%s-pairwise.csv' % method)
pairwise_results.to_csv(pairwise_path)
```

Thanks

Hi @Richard_Rodrigues,
First, I just want to clarify something I overlooked in my previous answer:

I assumed you were talking about the output_dir parameter in the visualizer's signature, but now realize you may have been talking about q2cli's --output-dir. These two are not related: the command-line argument --output-dir exists only in q2cli, where it lets the user save all output artifacts/visualizations to one directory instead of specifying individual file paths. My answer above still applies; just note that it was about the output_dir parameter in the visualizer's signature, not q2cli's --output-dir.

Yes. Files saved there will be stored in the QZV, but that does not mean they will necessarily be exposed in the visualization. So in your example, q2templates.render(index, output_dir, context=context) would render the contents of context into an HTML page.

But this will not:

```python
pairwise_path = os.path.join(output_dir, '%s-pairwise.csv' % method)
pairwise_results.to_csv(pairwise_path)
```

It will save this file inside the QZV, but you need to do more (e.g., create a link to download that CSV) to actually expose these data in your visualization. Note how on these lines we not only save that CSV but also render its contents as HTML:

Later in that function, we pass those results (and others) as "context" to the index.html template:

And in that index.html template, we have both an insertion point for the rendered table (so that the table is displayed in the visualization) and a link to download the raw CSV:
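Here is a stdlib-only sketch of the pattern described above: save the raw results as a CSV inside output_dir, then both render them as an HTML table and link to the CSV from index.html. The function write_results, the file name core-features.csv, and the (feature ID, p-value) row layout are hypothetical. Because index.html and the CSV sit in the same directory inside the QZV, a relative link is enough.

```python
import csv
import html
import os


def write_results(output_dir: str, results: list) -> None:
    """Hypothetical helper: store raw results as CSV and expose them in index.html."""
    csv_name = "core-features.csv"
    # 1. Save the raw data; on its own this file is stored but not shown.
    with open(os.path.join(output_dir, csv_name), "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["feature_id", "pvalue"])
        writer.writerows(results)
    # 2. Render the same data as an HTML table for display.
    rows = "".join(
        f"<tr><td>{html.escape(str(fid))}</td><td>{p:.3g}</td></tr>"
        for fid, p in results
    )
    # 3. Link to the CSV so users can download the raw file.
    page = (
        "<html><body>"
        f"<table><tr><th>Feature ID</th><th>p-value</th></tr>{rows}</table>"
        f'<p><a href="{csv_name}">Download as CSV</a></p>'
        "</body></html>"
    )
    with open(os.path.join(output_dir, "index.html"), "w") as fh:
        fh.write(page)
```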

Let me know if that makes sense!


Yes, this is super helpful.

However, I am missing something obvious about q2cli's --output-dir and the visualizer's output_dir. If I access "output_dir" inside the visualizer function, I get the QIIME-generated path that is used to make the QZV.

But what method/function do I call to access q2cli's --output-dir argument inside the code?

Thanks.

None. Just pretend that --output-dir does not exist. The resulting CLI will automatically allow users to save your visualizer's output as a QZV to a file path (--o-visualization) or to an --output-dir. Those arguments are created by q2cli and are not specified or accessed from within your code.

Finally, I was able to make a QIIME 2 plugin for it: https://library.qiime2.org/plugins/q2-coremicrobiome/29/

Special thanks to @Nicholas_Bokulich for the help in making the plugin and @cduvallet for her tutorial to convert the plugin to a conda package!


Great news @Richard_Rodrigues! I am looking forward to giving it a spin :smile:

Nice work @Richard_Rodrigues!