Help with making a plugin for core microbiome

Richard_Rodrigues1 · February 21, 2019, 5:35am

Hi

I am trying to make a qiime2 plugin that implements the core microbiome from my last year's paper (COREMIC: a web-tool to search for a niche associated CORE MICrobiome [PeerJ]). Basically, it uses presence/absence data in the interest group and the out-group and reports the statistical significance of an OTU being part of the core microbiome.

I am using the tutorials about how to make a qiime2 plugin. Can we provide lists as inputs and outputs to methods inside the plugins? See the example below:

code

from typing import Iterable, List

def benjamini_hotchberg_correct(pvalues: List[float])-> List[float]:
    n = len(pvalues)
    ...
    return new_pvalues

in plugin_setup.py

plugin.methods.register_function(
    function=q2_coremicrobiome.benjamini_hotchberg_correct,
    inputs={},
    parameters={'pvalues': List},
    outputs=[('bhpvals', List)],
    input_descriptions={},
    parameter_descriptions={
        'pvalues': ('pvalues on which to apply '
                    'Benjamini-Hochberg correction.')
    },
    output_descriptions={
        'bhpvals': 'The pvalues after Benjamini-Hochberg correction.'
    },
    name='Benjamini-Hochberg',
    description=("Computes a Benjamini-Hochberg correction"
                 " for pvalues."),
    citations=[citations['Benjamini1995']])

Thanks.

-Rich

Nicholas_Bokulich · February 21, 2019, 1:58pm

Hi @Richard_Rodrigues1,

Thank you for putting together a plugin!

The short answer is yes, there are a few ways to do this, but I have a few thoughts and questions.

Most importantly, you cannot output a list of files, where the list is of variable length. Right now, the number of output files generated by QIIME 2 plugins must be fixed, though we are working on that (e.g., to allow optional output files).
But you can have a variable number of input files if those files are a metadata type or transformable to metadata. QIIME 2 will automatically merge all metadata/columns into a single dataframe, and then you would write your function to operate across all columns. Assuming that 'pvalues' is a vector of p-values (or potentially a list of vectors of p-values), you could just do something like

inputs={'pvalues': Metadata}

A p-value correction function seems more like a function that you would use internally (i.e., to automatically correct the p-values calculated by an action), rather than an action that you would expose independently in QIIME 2. Otherwise, you will need to define a new semantic type for a vector of pvalues (maybe SampleData[pvalue] or FeatureData[pvalue]?). Is this a function that you see people using independently of your plugin, or is this something that should just always be run with other actions in your plugin?
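
As a minimal sketch of what such an internal helper might look like: the function below is plain Python written for illustration, not taken from the COREMIC code, and its name is hypothetical. It applies the standard Benjamini-Hochberg step-up adjustment and returns the adjusted p-values in the original order. Because it would only be called from a registered action, it would not itself be registered in plugin_setup.py.

```python
from typing import List


def benjamini_hochberg_correct(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(pvalues)
    if n == 0:
        return []
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, so each adjusted value is capped
    # by the adjusted values of all larger p-values (keeps them monotone).
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted
```

In practice an established implementation (for example in statsmodels) may be preferable to a hand-written one.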

Does that all make sense? Let me know what you think.

Richard_Rodrigues1 · February 21, 2019, 6:21pm

Hi Nicholas

Yes, it is helpful. However, I thought "every" method in the plugin had to be registered. I just see one main method that a user needs to call and the rest is supposed to be done by the internal methods (e.g., p-values correction and many other similar functions).

Would it be possible to have a 15-30 minute online meeting? I could show you what code I already have (GitHub - richrr/coremicro: Core microbiome) and the basic qiime2 files I made (e.g. setup.py, plugin_setup.py), and you could suggest the best way to convert my code into a qiime2 plugin.

Appreciate your help!

-Rich

Nicholas_Bokulich · February 21, 2019, 7:56pm

In general, you will expose functions that you want to be accessible to users, and then you can stitch together whatever functions you want under the hood. In your case, I expect you would want to expose a single visualizer.

Let me correct/clarify some earlier statements. You can actually use the List expression to collect many artifacts of any semantic type. See this example:

https://github.com/qiime2/q2-feature-table/blob/dev/q2_feature_table/plugin_setup.py#L176

                  'axis': 'Along which axis to group. Each ID in the given axis must '
                          'exist in `metadata`.'
              },
              output_descriptions={
                  'grouped_table': 'A table that has been grouped along the given '
                                   '`axis`. IDs on that axis are replaced by values in '
                                   'the `metadata` column.'
              },
              name="Group samples or features by a metadata column",
              description="Group samples or features in a feature table using metadata "
                          "to define the mapping of IDs to a group.",
              examples={'group_samples': ex.feature_table_group_samples}
          )
          
          # maps input types to relevant overlap methods and output types
          i_table, p_overlap_method, o_table = TypeMap({
              (FeatureTable[Frequency],
               Str % Choices(sorted(q2_feature_table.overlap_methods() - {'union'}))):
              FeatureTable[Frequency],
              (FeatureTable[RelativeFrequency],
               Str % Choices(sorted(q2_feature_table.overlap_methods()

Also, the automatic merging of Metadata is a q2cli-specific feature, not a framework feature. If you are working with the Artifact API you will need to merge manually as exemplified here (this is what q2cli is actually doing under the hood to automatically merge metadata):

https://github.com/qiime2/q2cli/blob/1abf35a0353355fa92c7e5c7277d69861b999b8a/q2cli/handlers.py#L492

                              metadata.append(artifact.view(qiime2.Metadata))
                          except Exception as e:
                              header = ("There was an issue with viewing the artifact "
                                        "%s as QIIME 2 Metadata:" % path)
                              tb = 'stderr' if verbose else None
                              q2cli.util.exit_with_error(e, header=header,
                                                         traceback=tb)
                  if len(metadata) == 1:
                      return metadata[0]
                  else:
                      return metadata[0].merge(*metadata[1:])
          
          
          class MetadataColumnHandler(Handler):
              def __init__(self, name, repr, column_types, default=NoDefault,
                           description=None):
                  if default is not NoDefault and default is not None:
                      raise TypeError(
                          "The only supported default value for MetadataColumn "
                          "subclasses is `None`. Found this default value: %r"
                          % (default,))

The next week is tight for me... but I could discuss the following week.

Now that you have setup.py, plugin_setup.py, the next thing to think about is:

What functions do you want to expose as actions?
What are the necessary input types? I assume you would take a feature table and maybe also metadata. That's good, since it means you will not need to define new types (that can be a bit of a hassle). But what about the output? I assume you probably want a visualizer, which will contain plots and/or a table of results.

Richard_Rodrigues1 · February 22, 2019, 3:34am

The next week is tight for me… but I could discuss the following week.

Sure, we can schedule something for that week. You can email me your availability at dr.richrodrigues@gmail.com and we can find a mutually agreeable time.

what are the necessary input types? I assume you would take a feature table and maybe also metadata. That’s good, since it means you will not need to define new types (that can be a bit of a hassle).

Correct: a feature table, metadata, and a couple of (string, int, flag) arguments.

But what about the output? I assume you probably want a visualizer, which will contain plots and/or a table of results.

Yes, for now it is easiest to output a table of core microbes for the interest group. Later on, I can expand to output several tables and/or plots. One step at a time.

you will expose functions that you want to be accessible to users, and then you can stitch together whatever functions you want under the hood.

How do I do this? The majority of my functions don't need to be exposed to the user.

Thanks.

-Rich

Nicholas_Bokulich · February 22, 2019, 1:37pm

Definitely a good plan to start simple. Note that outputting both tables (e.g., as artifacts) and plots (e.g., as visualizations) implies that you will need to make this a QIIME 2 pipeline. But don't worry about what that means at the moment...

Just check out any of the QIIME 2 plugins to see what I mean. Any function that is registered in plugin_setup.py will be exposed as an action. Those functions will most likely call other functions that are not registered and hence not exposed to users, but just run under the hood.
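
A minimal sketch of that layout, in plain Python so that it runs without QIIME 2 installed: only `core_features` would be registered in plugin_setup.py, and only it carries type annotations, while the underscore-prefixed helper stays internal. All names here, and the "present in at least a given fraction of samples" rule, are hypothetical placeholders rather than the COREMIC method. In a real plugin the annotations on the registered function would be view types such as `biom.Table` or `pd.DataFrame` rather than plain dicts.

```python
from typing import Dict, List


def _prevalence(counts, feature):
    # Internal helper: fraction of samples in which `feature` is present.
    # It is never registered, so it needs no type annotations.
    present = sum(1 for sample in counts.values()
                  if sample.get(feature, 0) > 0)
    return present / len(counts)


def core_features(counts: Dict[str, Dict[str, int]],
                  min_prevalence: float = 0.9) -> List[str]:
    # Public entry point: the only function that would be registered.
    # `counts` maps sample ID -> {feature ID -> count}.
    features = sorted({f for sample in counts.values() for f in sample})
    return [f for f in features
            if _prevalence(counts, f) >= min_prevalence]
```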

Richard_Rodrigues1 · February 22, 2019, 3:11pm

Ok I think I have what I need for now. Just to confirm, the under the hood functions don't need to be registered in the plugin_setup.py, but the main input and output methods will have to be. Regardless, do all internal and registered functions need to use mypy syntax or just the registered ones?

Thanks.

-Rich

Nicholas_Bokulich · February 22, 2019, 3:41pm

Yes.

No, just registered functions.

Richard_Rodrigues1 · April 11, 2019, 11:37pm

Hi Nicholas

Sorry for the delayed reply. I think I have arranged all the code appropriately. I have a couple of questions/doubts related to the best way to provide specific arguments and overall code structure. I was wondering if we can set up a 15-minute online meeting (anytime works for me). It would be quicker and easier to resolve these via a short discussion rather than a chain of messages. Of course, once we hash it out, I can post a summary of our Q&A here so other new developers can refer to it if needed.

Thanks.

-Rich

Richard_Rodrigues · May 19, 2020, 8:46pm

Dear @Nicholas_Bokulich

Thanks for the suggestions and apologies for the delay in getting my code working!

I seem to have most of the code working as a q2 plugin. I am wondering about the following:

Is it possible to output (text and figures) files to a specific folder (via a user provided argument)? How would this argument be different than the default (miscellaneous) "--output-dir" argument of visualizers?
How do I make a qzv file for these output files?
Qiime's --output-dir arg doesn't seem to accept the user provided argument and creates a temporary folder using some random string. Does this only happen in development mode?
Once I feel the plugin is ready, how do I submit it for review/approval to be provided on qiime's plugin page?

Thanks.

-Rich

Nicholas_Bokulich · May 19, 2020, 10:17pm

Hi again @Richard_Rodrigues,

That's exciting news!

It is better to output as QZA/QZV so that provenance is stored in these files.

You should check out the developer documentation, there is a great tutorial that's been added here that describes this:
https://dev.qiime2.org/latest/tutorials/

In a nutshell, you would output an html or other file type containing the plots/data you want to display and register your action as a visualizer:
https://dev.qiime2.org/latest/actions/visualizers/

The best way to see how it's done is to look at how other visualizers are registered in existing plugins. Take a look at the adonis action in q2-diversity, it is a relatively simple example to use as a template:

https://github.com/qiime2/q2-diversity/blob/30c4149400fa44abb409c3eb5e96a45daa3c36f5/q2_diversity/_beta/_visualizer.py#L345

              context = {
                  'table': table_html,
                  'sample_size': sample_size,
                  'mismatched_ids': mismatched_ids
              }
              index = os.path.join(
                  TEMPLATES, 'mantel_assets', 'index.html')
              q2templates.render(index, output_dir, context=context)
          
          
          def adonis(output_dir: str,
                     distance_matrix: skbio.DistanceMatrix,
                     metadata: qiime2.Metadata,
                     formula: str,
                     permutations: int = 999,
                     n_jobs: int = 1) -> None:
              # Validate sample metadata is superset et cetera
              metadata_ids = set(metadata.ids)
              dm_ids = distance_matrix.ids
              _validate_metadata_is_superset(metadata_ids, set(dm_ids))
              # filter ids. ids must be in same order as dm

That's right: this output_dir does not get exposed in any of the user interfaces. You need to register one or more outputs as separate arguments that do get exposed to the user (see the examples and documentation above for more details).

Release your plugin in its own github repository (or wherever you like), and document installation and usage instructions on the "library" to share it with the world: https://library.qiime2.org/

Good luck!

Richard_Rodrigues · May 20, 2020, 2:35pm

So just to make sure I understand correctly:
Anything written to the output_dir folder becomes part of the qzv file?
To make the qzv file, I need to make an index.html or index.tsv file and write to output_dir by using something like below?
q2templates.render(index, output_dir, context=context)
or
pairwise_path = os.path.join(output_dir, '%s-pairwise.csv' % method)
pairwise_results.to_csv(pairwise_path)

Thanks

Nicholas_Bokulich · May 20, 2020, 5:40pm

Hi @Richard_Rodrigues,
First, I just want to clarify something I overlooked in my previous answer:

I assumed you were talking about the visualizer’s signature output_dir , but now realize you may have been talking about q2cli’s --output-dir. These two are not related — the command-line argument --output-dir is only exposed in q2cli, it allows the user to save all output artifacts/visualizations to that directory path instead of specifying individual filepaths, and this is only a feature of the CLI. My answer above still applies, but just to clarify that information was about the visualizer’s signature output_dir not q2cli...

Yes. Files saved there will be stored in the QZV, but this does not mean that they will necessarily be exposed in the visualization. So in your example, this would render the contents of context to HTML:

q2templates.render(index, output_dir, context=context)

But this will not:

pairwise_path = os.path.join(output_dir, '%s-pairwise.csv' % method)
pairwise_results.to_csv(pairwise_path)

It will save this file inside the QZV, but you need to do more (e.g., create a link to download that CSV) to actually expose these data in your visualization. Note how on these lines we not only save that CSV but also render its contents as HTML:

https://github.com/qiime2/q2-diversity/blob/30c4149400fa44abb409c3eb5e96a45daa3c36f5/q2_diversity/_beta/_visualizer.py#L228-L234

              pairwise_path = os.path.join(
                  output_dir, '%s-pairwise.csv' % method)
              pairwise_results.to_csv(pairwise_path)
          
              pairwise_results_html = q2templates.df_to_html(pairwise_results)
          else:
              pairwise_results_html = None

Later in that function we write those results (and others) as "context" into the template index.html:

https://github.com/qiime2/q2-diversity/blob/30c4149400fa44abb409c3eb5e96a45daa3c36f5/q2_diversity/_beta/_visualizer.py#L255-L263

          q2templates.render(index, output_dir, context={
              'initial_dm_length': initial_dm_length,
              'filtered_dm_length': filtered_dm_length,
              'method': method,
              'group_rows': group_rows,
              'bootstrap_group_col_size': int(12 / row_count),
              'result': result_html,
              'pairwise_results': pairwise_results_html
          })

and in that index.html template we have both an insertion point for the rendered table (so that the table is displayed in the visualization), as well as a link to download the raw CSV:

https://github.com/qiime2/q2-diversity/blob/30c4149400fa44abb409c3eb5e96a45daa3c36f5/q2_diversity/_beta/beta_group_significance_assets/index.html#L42-L50

          {% if pairwise_results %}
            <div class="row">
              <div class="col-lg-12">
                <h2>Pairwise {{ method }} results</h2>
                <a href="{{ method }}-pairwise.csv">Download CSV</a>
                {{ pairwise_results }}
              </div>
            </div>
          {% endif %}
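
To make the pattern concrete, here is a rough standard-library-only sketch of the same three steps (save the raw CSV into output_dir, render the data as HTML, link to the CSV from index.html). It deliberately swaps q2templates for plain string formatting so that it runs without QIIME 2 installed. The function name, file name, and column names are hypothetical and are not taken from q2-diversity or from the COREMIC code.

```python
import csv
import html
import os
from typing import Dict


def write_visualization(output_dir: str, results: Dict[str, float]) -> None:
    # 1. Save the raw data inside output_dir so it is stored in the archive.
    csv_name = 'core-features.csv'  # hypothetical file name
    with open(os.path.join(output_dir, csv_name), 'w', newline='') as fh:
        writer = csv.writer(fh)
        writer.writerow(['feature', 'p_value'])
        for feature, pvalue in results.items():
            writer.writerow([feature, pvalue])

    # 2. Render the same data as table rows, escaping the feature IDs.
    rows = ''.join(
        '<tr><td>%s</td><td>%s</td></tr>' % (html.escape(feature), pvalue)
        for feature, pvalue in results.items())

    # 3. Write index.html with the table and a relative link to the CSV,
    #    so the file saved in step 1 is reachable from the page.
    page = ('<html><body>'
            '<a href="%s">Download CSV</a>'
            '<table><tr><th>feature</th><th>p_value</th></tr>%s</table>'
            '</body></html>' % (csv_name, rows))
    with open(os.path.join(output_dir, 'index.html'), 'w') as fh:
        fh.write(page)
```

In an actual visualizer, steps 2 and 3 would more likely go through `q2templates.df_to_html` and `q2templates.render` with a template, as in the q2-diversity excerpts above.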

Let me know if that makes sense!

Richard_Rodrigues · May 20, 2020, 8:54pm

Yes, this is super helpful.

However, I am missing something obvious about the q2cli --output-dir and visualizers output_dir. If I access "output_dir" inside the visualizer function, I get the qiime generated string which is used to make the qzv.

But what method/function do I call to access the q2cli's argument --output-dir inside the code?

Thanks.

Nicholas_Bokulich · May 21, 2020, 2:05pm

None. Just pretend that --output-dir does not exist. The resulting CLI will automatically allow you to save your visualizer's output as a QZV to a filepath (--o-visualization), or to an --output-dir. Those arguments are created by q2cli and are not specified or accessed from within your code.

Richard_Rodrigues · October 2, 2020, 12:02am

Finally, I was able to make a qiime 2 plugin for it. https://library.qiime2.org/plugins/q2-coremicrobiome/29/

Special thanks to @Nicholas_Bokulich for the help in making the plugin and @cduvallet for her tutorial to convert the plugin to a conda package!

Nicholas_Bokulich · October 2, 2020, 5:40am

Great news @Richard_Rodrigues! I am looking forward to giving it a spin

SoilRotifer · October 2, 2020, 1:58pm

Nice work @Richard_Rodrigues!