Make conda package for MetaPhlAn2 plugin

plugin-development
install

(Francesco Asnicar) #1

Hi all,

Thanks to your support in this other discussion (Callable does not have parameter(s)), I was able to finalize the plugin for MetaPhlAn2. I tested it successfully, and you can find the Bitbucket repository here: https://bitbucket.org/biobakery/metaphlan2-install

After cloning the repo, you can install MetaPhlAn2 with:
python setup.py install
and then
qiime dev refresh-cache
to make QIIME2 aware of the new plugin. After this, you should find the metaphlan2 plugin with two functions: profile-single-fastq and profile-paired-fastq.
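
For anyone following along, one way to double-check that the install registered the plugin is to list the entry points in the qiime2.plugins group (the same group used in the entry_points setting discussed later in this thread). This is only a sketch: the helper name list_registered_plugins is made up for illustration, and it relies on importlib.metadata, which needs Python 3.8 or newer, so it would not run as-is in the Python 3.5 environment shown in this thread.

```python
from importlib.metadata import entry_points


def list_registered_plugins(group="qiime2.plugins"):
    """Return the names of all entry points registered under `group`."""
    found = entry_points()
    if hasattr(found, "select"):
        # Python 3.10+: the result can be filtered with select().
        selected = found.select(group=group)
    else:
        # Python 3.8 and 3.9: the result is a plain dict keyed by group.
        selected = found.get(group, [])
    return sorted(ep.name for ep in selected)


if __name__ == "__main__":
    names = list_registered_plugins()
    print(names if names else "no plugins registered in this environment")
```

If metaphlan2 is missing from the list, the package metadata was not installed into the active environment, and qiime dev refresh-cache will not find the plugin either.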

If you have problems running it, you may need to grant executable permissions to the metaphlan2.py script and add it to the PATH environment variable. Something like this (with the correct paths) should do the job:
export PATH=/.../path/to/.../metaphlan2-install/build/lib/metaphlan2:/.../path/to/.../metaphlan2-install/build/lib/metaphlan2/utils:$PATH
chmod +x /.../path/to/.../metaphlan2-install/build/lib/metaphlan2/metaphlan2.py

The last missing thing now (I think) is how to officially register this new plugin within the QIIME2 system. I know I should build a conda package, but if anyone can help/guide me to finalize this, I would really appreciate it.

Many thanks,
Francesco


(Evan Bolyen) #3

Hey @fasnicar!

This is awesome news! Congratulations on getting your plugin working!

I’d be more than happy to help you with this part! I believe the path we are looking to take is to turn your plugin into a Community Plugin, and then we’ll proceed from there, bringing it into the default installation, hopefully sometime in early 2018.

Looking at your conda recipe and setup.py file I think there’s still a bit of work to be done to make it play nicely with conda. But I think you definitely have the right idea in general!

Something to note: conda expects to be “in charge” of all dependencies and their versions, which lets it determine if a given set of packages has a dependency resolution order. This helps us ensure all of the plugins can inter-operate with their respective libraries.

It looks like you ran into the install_requires problem here. We tend to solve this in core by actually dropping our dependencies from setup.py and instead listing them in the recipe. Since you have a repo dedicated to basically building a conda-recipe for MetaPhlAn2, this shouldn’t be too much of an issue. But other projects may prefer to keep their dependencies in setup.py, which is also ok. This recipe is a pretty good example; you can ignore the top section, which is just some templating we use as part of our automated build system.

Your build script section in the recipe might also be simplified if you leveraged scripts= in setup.py (here’s an example). This should also set the right permissions so that you can execute things.

Additionally, since it looks like you have a git submodule in the repo, you could probably reference that instead of downloading the source again.

I think sorting out the dependencies and making sure all of the right files are baked into the .tar.gz is the next step. What kinds of dependencies do you guys have for MetaPhlAn2? We have a particular channel order I can describe, but I suspect most things can be found in the bioconda channel.

Let me know if that’s enough to get started, or if you need more help!


(Francesco Asnicar) #5

Hi @ebolyen,

Many thanks for your suggestions. I’m not sure I’ve been able to satisfy all of them, though!

It took me a bit, but I now have the conda package of MetaPhlAn2.
You can find it here: https://anaconda.org/fasnicar
And, the commands I used in QIIME2 VM for installing it are:

$ conda install bowtie2 -c bioconda
$ conda install -c fasnicar metaphlan2

I’m not sure why I have to manually install bowtie2, as I thought that the MetaPhlAn2 package would have this information stored within it, but apparently not, or maybe there is another way to specify this while building the package.

I tried to install the MetaPhlAn2 package on my QIIME2 virtual machine but without success. I’m seeing this error:

(qiime2-2017.9) [email protected]:~$ qiime dev refresh-cache 
QIIME is caching your current deployment for improved performance. This may take a few moments and should only happen once per deployment.
Traceback (most recent call last):
  File "/home/qiime2/miniconda/envs/qiime2-2017.9/bin/qiime", line 6, in <module>
    sys.exit(q2cli.__main__.qiime())
  File "/home/qiime2/miniconda/envs/qiime2-2017.9/lib/python3.5/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/qiime2/miniconda/envs/qiime2-2017.9/lib/python3.5/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/qiime2/miniconda/envs/qiime2-2017.9/lib/python3.5/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/qiime2/miniconda/envs/qiime2-2017.9/lib/python3.5/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/qiime2/miniconda/envs/qiime2-2017.9/lib/python3.5/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/qiime2/miniconda/envs/qiime2-2017.9/lib/python3.5/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/home/qiime2/miniconda/envs/qiime2-2017.9/lib/python3.5/site-packages/q2cli/dev.py", line 27, in refresh_cache
    import q2cli.cache
  File "/home/qiime2/miniconda/envs/qiime2-2017.9/lib/python3.5/site-packages/q2cli/cache.py", line 302, in <module>
    CACHE = DeploymentCache()
  File "/home/qiime2/miniconda/envs/qiime2-2017.9/lib/python3.5/site-packages/q2cli/cache.py", line 61, in __init__
    self._state = self._get_cached_state(refresh=refresh)
  File "/home/qiime2/miniconda/envs/qiime2-2017.9/lib/python3.5/site-packages/q2cli/cache.py", line 107, in _get_cached_state
    self._cache_current_state(current_requirements)
  File "/home/qiime2/miniconda/envs/qiime2-2017.9/lib/python3.5/site-packages/q2cli/cache.py", line 200, in _cache_current_state
    state = self._get_current_state()
  File "/home/qiime2/miniconda/envs/qiime2-2017.9/lib/python3.5/site-packages/q2cli/cache.py", line 238, in _get_current_state
    plugin_manager = qiime2.sdk.PluginManager()
  File "/home/qiime2/miniconda/envs/qiime2-2017.9/lib/python3.5/site-packages/qiime2/sdk/plugin_manager.py", line 44, in __new__
    self._init()
  File "/home/qiime2/miniconda/envs/qiime2-2017.9/lib/python3.5/site-packages/qiime2/sdk/plugin_manager.py", line 58, in _init
    plugin = entry_point.load()
  File "/home/qiime2/miniconda/envs/qiime2-2017.9/lib/python3.5/site-packages/pkg_resources/__init__.py", line 2316, in load
    return self.resolve()
  File "/home/qiime2/miniconda/envs/qiime2-2017.9/lib/python3.5/site-packages/pkg_resources/__init__.py", line 2322, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
ImportError: No module named 'metaphlan2.plugin_setup'; 'metaphlan2' is not a package

This is strange, as with the code in the metaphlan2-install repository (https://bitbucket.org/biobakery/metaphlan2-install) I’m able to register the plugin. Does anyone have an idea about what I’m doing wrong with the conda package?

Many thanks for all your help,
Francesco


(Evan Bolyen) #7

Hi @fasnicar!

Your channel doesn’t have bowtie2 in it (and neither does defaults), but if you run this:

conda install -c fasnicar -c bioconda metaphlan2

It works like you expect.

Opening an ipython session, it looks like the metaphlan2.py script is overriding the package that you have installed:

In [1]: import metaphlan2

In [2]: metaphlan2
Out[2]: <module 'metaphlan2' from '/home/evan/.conda/envs/q2-test/bin/metaphlan2.py'>

Python’s package resolution rules are pretty obscure and always surprise me, but this is probably why it works from source, but not once installed. Source installations use a different kind of file-name/structure for package discovery via site-packages (and also wouldn’t have installed the script into your environment yet either). Why it even has /bin/ in sys.path for Python package lookup is absolutely beyond me.

We want to see something like this:

<module 'metaphlan2' from '/home/evan/.conda/envs/q2-test/lib/python3.5/site-packages/metaphlan2/__init__.py'>

I think the script is registered here. Your options are to change the script name (probably not a great idea since that is your API), or to change the package name so that they don’t overlap.
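
For anyone who wants to reproduce the shadowing outside of a conda environment, here is a small self-contained sketch. It builds a throwaway directory layout (a bin/ folder holding metaphlan2.py and a site-packages/ folder holding a metaphlan2 package) and tries the import with each folder first on sys.path. The file contents and the helper names try_import and run_demo are stand-ins invented for the demo, not the real MetaPhlAn2 files.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def _forget(name):
    """Drop a module and its submodules from the import cache."""
    for key in [k for k in sys.modules if k == name or k.startswith(name + ".")]:
        del sys.modules[key]


def try_import(search_dirs, target="metaphlan2.plugin_setup"):
    """Import `target` with `search_dirs` placed first on sys.path."""
    saved = list(sys.path)
    _forget("metaphlan2")
    sys.path[:0] = [str(d) for d in search_dirs]
    importlib.invalidate_caches()
    try:
        module = importlib.import_module(target)
        return "ok: " + module.plugin
    except ImportError as exc:
        return "error: " + str(exc)
    finally:
        # Restore the interpreter state so the demo has no side effects.
        sys.path[:] = saved
        _forget("metaphlan2")


def run_demo():
    """Return (result with bin/ first, result with site-packages/ first)."""
    with tempfile.TemporaryDirectory() as tmp:
        bin_dir = Path(tmp, "bin")
        pkg_dir = Path(tmp, "site-packages", "metaphlan2")
        bin_dir.mkdir()
        pkg_dir.mkdir(parents=True)
        # Stand-in for the script that gets installed into bin/.
        (bin_dir / "metaphlan2.py").write_text("# stand-in script\n")
        # Stand-in for the installed package with its plugin module.
        (pkg_dir / "__init__.py").write_text("")
        (pkg_dir / "plugin_setup.py").write_text("plugin = 'stand-in plugin'\n")
        site_dir = pkg_dir.parent
        shadowed = try_import([bin_dir, site_dir])
        resolved = try_import([site_dir, bin_dir])
    return shadowed, resolved


if __name__ == "__main__":
    for line in run_demo():
        print(line)
```

With bin/ first, the single-file module wins the lookup and the import of metaphlan2.plugin_setup fails because a plain module has no submodules. With site-packages/ first, the package is found and the plugin object loads.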

Hope that helps!


(Francesco Asnicar) #9

Many thanks @ebolyen for the very useful tips.

I tried building several different conda packages for MetaPhlAn2, but I’m still experiencing the same error when trying to add it into QIIME2.

You can see the two packages (metaphlan2 and q2-metaphlan2) I built here: https://anaconda.org/fasnicar/dashboard

I would really like to have the conda package working in QIIME2, but right now I don’t know what else I could try to fix the above error. If anyone from the QIIME2 developers is willing to look into this issue, here is the repository that I’m using for building the conda package: https://bitbucket.org/biobakery/metaphlan2-install and the command that I’m issuing for the actual build is: conda build conda-recipe --python 3.5 -c bioconda

Any help will be appreciated.
Many thanks again for the support,
Francesco


(Evan Bolyen) #11

Hey @fasnicar,

I think I might have a bit of time to look at your recipes later this week. I’ll let you know what I find out.


(Evan Bolyen) #12

Hey @fasnicar,

I’ve got bad and worse news:

The bad news:

There doesn’t seem to be an easy way out of the sys.path problem. So you’ll probably need to change the name of your package. This isn’t so hard to do with setup.py:

# add this param to setup():
package_dir={'metaphlan2_pkg': 'metaphlan2'}

and update the entry point to look like:

entry_points={"qiime2.plugins": ["metaphlan2=metaphlan2_pkg.plugin_setup:plugin"]},
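
A rename like this is easy to get half-right, so here is a rough sketch of a consistency check between the package name and the entry point. It only inspects a dictionary of keyword arguments and never calls setup(). The packages list is an assumption on my part (it is not in the snippet above), and the helper names parse_entry_point and check_entry_points are made up for illustration.

```python
def parse_entry_point(spec):
    """Split 'name=package.module:attribute' into its three parts."""
    name, _, target = spec.partition("=")
    module, _, attribute = target.partition(":")
    return name.strip(), module.strip(), attribute.strip()


# Keyword arguments as they might be passed to setup() after the rename.
# 'packages' is assumed here; the other two come from the snippets above.
SETUP_KWARGS = {
    "packages": ["metaphlan2_pkg"],
    "package_dir": {"metaphlan2_pkg": "metaphlan2"},
    "entry_points": {
        "qiime2.plugins": ["metaphlan2=metaphlan2_pkg.plugin_setup:plugin"],
    },
}


def check_entry_points(kwargs):
    """Return entry points whose top-level package is not being installed."""
    problems = []
    for group, specs in kwargs["entry_points"].items():
        for spec in specs:
            _, module, _ = parse_entry_point(spec)
            if module.split(".")[0] not in kwargs["packages"]:
                problems.append((group, spec))
    return problems


if __name__ == "__main__":
    print(parse_entry_point(SETUP_KWARGS["entry_points"]["qiime2.plugins"][0]))
    print(check_entry_points(SETUP_KWARGS) or "entry points match the packages")
```

The plugin name on the left of the = sign can stay metaphlan2; only the module path on the right has to follow the renamed package.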

The worse news:

This is probably going to ruin almost everything in some way, so it might take a lot of trial and error (or be unfeasible entirely).

Is it possible to run the metaphlan2 plugin without any of the scripts? If those could be excluded then we won’t have this issue.

The worst news:

It turns out you don’t see this sys.path behavior when you are running python; you only notice it if you are in the middle of an entry-point invocation (such as when we load plugins) or if you are running ipython for some reason. I think this behavior is there to tie together some of the magic behind console_scripts entry-points, allowing the “shim-script” that gets registered to /bin/ to see what it’s supposed to actually call, but I’m not certain. This is a “simple” reference for how sys.path is populated.

This leads me to an idea, but I have no clue if it will work.
What if instead of registering metaphlan2.py and company using the scripts=[...] argument, you set a console_scripts entry-point?

It looks like you use a nice __main__ guard in metaphlan2.py, so you should be able to say something like:

"console_scripts":[
    "metaphlan2.py=metaphlan2.metaphlan2:metaphlan2",
    "strainphlan.py=metaphlan2.strainphlan:strainphlan"
]

(These would be added to the same entry_points dictionary that qiime2.plugins is in.)

Like I said, I’m not sure if that will fix it, but it might give us a chance to avoid the name-shadowing that is happening between metaphlan2/__init__.py and bin/metaphlan2.py.
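
One detail worth spelling out: a console_scripts entry of the form name=module:function needs an importable function to call, so a __main__ guard on its own is not enough. The usual pattern is to wrap the script body in a function and have the guard call it too. The sketch below is a hypothetical layout, not the actual metaphlan2.py; the function name main and the single --nproc option are placeholders for illustration.

```python
import argparse
import sys


def main(argv=None):
    """Entry function that a console_scripts shim could import and call."""
    parser = argparse.ArgumentParser(prog="metaphlan2.py")
    parser.add_argument("--nproc", type=int, default=1,
                        help="number of CPUs to use (placeholder option)")
    args = parser.parse_args(argv)
    # The real profiling work would start here.
    print("would run with", args.nproc, "process(es)")
    return 0


if __name__ == "__main__":
    # Direct execution keeps working alongside the entry point.
    status = main()
    if status:
        sys.exit(status)
```

With a layout like this, the entry would point at the wrapper, for example "metaphlan2.py=metaphlan2.metaphlan2:main", assuming the function really is called main in that module.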

Sorry I don’t have anything better to report.


(Francesco Asnicar) #14

Hi @ebolyen,

Many thanks for taking the time to look at the MetaPhlAn2 package, I really appreciate it.
Unfortunately, I haven’t had the time yet to try your suggestion, but I should have some time soon to work on this again, and let’s hope I’ll also have good news! :slight_smile:

Many thanks again,
Francesco


(Nicholas Bokulich) #15

@fasnicar I followed the installation instructions above and gave this a spin.

As @ebolyen noted above, this is a naming issue:

Since you said that you tried @ebolyen’s recommendations above and reached a dead end, it seems that the easiest alternative is to just rename the package (e.g., to “q2-metaphlan2”) and package directory (e.g., “q2_metaphlan2”). Since this is the plugin package, I assume it does not matter what you call it.

By renaming those, I was able to get this up and running:

$ qiime q2-metaphlan2 profile-paired-fastq --help
Usage: qiime q2-metaphlan2 profile-paired-fastq [OPTIONS]

  MetaPhlAn is a computational tool for profiling the composition of
  microbial communities (Bacteria, Archaea, Eukaryotes, and Viruses) from
  metagenomic shotgun sequencing data with species level resolution

Options:
  --i-raw-data ARTIFACT PATH SampleData[PairedEndSequencesWithQuality]
                                  metagenomic shotgun sequencing data
                                  [required]
  --p-nproc INTEGER               The number of CPUs to use for parallelizing
                                  the mapping, default 1 (no parallelization)
                                  [default: 1]
  --o-biom-table ARTIFACT PATH FeatureTable[Frequency]
                                  TAB-separated text file containing relative
                                  abundances of the species found in the input
                                  [required if not passing --output-dir]
  --output-dir DIRECTORY          Output unspecified results to a directory
  --cmd-config PATH               Use config file for command options
  --verbose                       Display verbose output to stdout and/or
                                  stderr during execution of this action.
                                  [default: False]
  --quiet                         Silence output if execution is successful
                                  (silence is golden).  [default: False]
  --citations                     Show citations and exit.
  --help                          Show this message and exit.

I do not see any unit tests, though (and I don’t have any test data for manual testing) so can’t test in other ways.

Would that solution work for you?


(Francesco Asnicar) #16

Many thanks @Nicholas_Bokulich, yes, that solution works perfectly for us!

To test the package you can get a (down-)sampled fastq file from the HUMAnN2 tutorial: demo.fastq

I don’t have access right now to the Qiime2 VM where I have the exact commands I used, but you should be able to make an artifact with:

$ echo -e "sample-id,absolute-filepath,direction\ndemo.fastq,$PWD/demo.fastq,forward" > demo.manifest
$ qiime tools import --type 'SampleData[SequencesWithQuality]' --input-path demo.manifest --output-path demo.qza --source-format SingleEndFastqManifestPhred33

And then you can run MetaPhlAn2 with the following command:

$ qiime q2-metaphlan2 profile-single-fastq --i-raw-data demo.qza --o-biom-table demo_profile.biom --p-nproc 4 --verbose

Many thanks,
Francesco


(Evan Bolyen) #17

This is super exciting!!


(Francesco Asnicar) #18

Hi @Nicholas_Bokulich and @ebolyen, (I think) I have good news!

So, I updated the qiime2 recipe for MetaPhlAn2 in the metaphlan2-install repository.

With the recipe in the qiime2-recipe folder, I was able to package the q2-metaphlan2 (uploaded for the moment on my Anaconda.org profile: https://anaconda.org/fasnicar/).

To make it clearer, here are the steps I followed:

  • Cloning the repository and building the package
hg clone https://bitbucket.org/biobakery/metaphlan2-install
cd metaphlan2-install
conda build qiime2-recipe/ --python 3 -c bioconda
anaconda upload /path/.../to/.../q2-metaphlan2.tar.bz2
conda install q2-metaphlan2 -c fasnicar -c bioconda
qiime dev refresh-cache
  • Getting a sample .fastq file and testing MetaPhlAn2
wget https://bitbucket.org/biobakery/biobakery/raw/tip/demos/biobakery_demos/data/humann2/input/demo.fastq
echo -e "sample-id,absolute-filepath,direction\ndemo.fastq,$PWD/demo.fastq,forward" > demo.manifest
qiime tools import --type 'SampleData[SequencesWithQuality]' --input-path demo.manifest --output-path demo.qza --input-format SingleEndFastqManifestPhred33
qiime q2-metaphlan2 profile-single-fastq --i-raw-data demo.qza --o-biom-table demo_profile.biom --p-nproc 4 --verbose
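
For anyone scripting the test above, the echo step that writes the manifest can also be done in Python. This is just a sketch that produces the same three-column layout as the echo command; the function name write_single_end_manifest is made up, and by default it uses the file name as the sample ID, as the command above does.

```python
import csv
import tempfile
from pathlib import Path


def write_single_end_manifest(fastq_path, manifest_path, sample_id=None):
    """Write a one-sample, forward-only manifest for `qiime tools import`."""
    fastq = Path(fastq_path).resolve()  # the manifest needs an absolute path
    sample = sample_id or fastq.name
    with open(manifest_path, "w", newline="") as handle:
        writer = csv.writer(handle, lineterminator="\n")
        writer.writerow(["sample-id", "absolute-filepath", "direction"])
        writer.writerow([sample, str(fastq), "forward"])
    return Path(manifest_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        fastq = Path(tmp, "demo.fastq")
        fastq.touch()  # empty placeholder standing in for the real reads
        manifest = write_single_end_manifest(fastq, Path(tmp, "demo.manifest"))
        print(manifest.read_text())
```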

So, I think I can prepare a tutorial for MetaPhlAn2 in the next few days to describe the installation process and an example to test the installation.

Is there something else that I should do with/for the q2-metaphlan2 package to have it available within Qiime2?

Many thanks for all your support in developing this package!


(Nicholas Bokulich) #19

Thanks @fasnicar! This is really exciting. I will keep a look out for the tutorial and give it a spin when you have it posted.

I think having this conda installable is all we need to have it available… @ebolyen anything else?

If I recall correctly, it did not look like there were any unit tests. This of course is not an availability concern, but tests would be useful for making sure everything is working correctly in case anything changes in QIIME 2 or in one of metaphlan’s dependencies…

Thanks!


(Greg Caporaso) #20

I had some discussion offline with @fasnicar about this. I tried to install on macOS, and that doesn’t seem to be working (I think maybe there is no macOS build on the specified channels). Here’s the command I ran and the resulting error message.

$ conda install q2-metaphlan2 -c fasnicar -c bioconda

Solving environment: failed

PackagesNotFoundError: The following packages are not available from current channels:

  - q2-metaphlan2

...  

@fasnicar, I think you had some questions about creating macOS conda packages - do you want to ask those here?


(Francesco Asnicar) #21

Many thanks @gregcaporaso for following up on this!

I actually just found and tried one solution that seemed to work, but I’m not sure if this is the best way to get a MacOS compatible package for MetaPhlAn2.

This is what I tried:

conda convert --platform osx-64 q2-metaphlan2-1.0.4-py35h39e3cac_0.tar.bz2 -o ./

I then installed Qiime2 and the converted q2-metaphlan2 package (uploaded to my anaconda.org profile) on a macOS machine, and I was able to successfully run the example I provided above (which I just edited, because I noticed that qiime tools import changed from --source-format to --input-format).

Is there a better way to obtain a macOS package for q2-metaphlan2, or is this the right way to get it?

Many thanks!


(Evan Bolyen) #22

Assuming there aren’t any compiled dependencies (e.g. it’s pure python), that process should work and is likely much more reliable than it might have been a year or two ago. I’d say use it if it’s working!
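
If you want a quick sanity check on the “pure python” assumption before converting, a rough heuristic is to look inside the built .tar.bz2 for files that look like compiled binaries. The sketch below does that with the standard library only. The suffix list is my own guess at what counts as compiled, the helper names are made up, and build_demo_package exists only to create a tiny fake archive for the demo.

```python
import io
import tarfile
import tempfile
from pathlib import Path

# Heuristic list of suffixes that usually indicate compiled code.
COMPILED_SUFFIXES = (".so", ".dylib", ".pyd", ".dll", ".a", ".o")


def find_compiled_members(package_path):
    """Return archive members that look like compiled, platform-specific files."""
    with tarfile.open(package_path, "r:*") as archive:
        names = [m.name for m in archive.getmembers() if m.isfile()]
    return sorted(
        n for n in names
        if n.lower().endswith(COMPILED_SUFFIXES) or ".so." in n.lower()
    )


def build_demo_package(path, member_names):
    """Create a small fake .tar.bz2 so the check can be tried without conda."""
    with tarfile.open(path, "w:bz2") as archive:
        for name in member_names:
            data = b"placeholder"
            info = tarfile.TarInfo(name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp, "demo-package.tar.bz2")
        build_demo_package(demo, ["info/index.json", "lib/demo_pkg/__init__.py"])
        print(find_compiled_members(demo) or "no compiled files found")
```

An empty result is only a hint that the conversion is safe, not a guarantee, since dependencies pulled in from other channels still need builds for the target platform.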


(Francesco Asnicar) #23

Awesome! :slight_smile:
So, I’ll go on with the Plugin page and hopefully that will be public within this week!

Many thanks everyone for all the help with the development of this plugin!


(Greg Caporaso) #24

@fasnicar, I can confirm that the installation is working for me now on macOS - awesome! When you post the plugin to the QIIME 2 Library, it would be ideal if you could provide a small input file that users could use to test/experiment with. I’d like to try it out, but I don’t have a small demultiplexed metagenomic data set handy.

Thanks again for your work on this - I’m excited to use it, and to have it as part of the QIIME 2 ecosystem!


(Francesco Asnicar) #25

Hi @gregcaporaso, awesome!

I just finished the Plugin page for q2-metaphlan2 with a small example that can run in less than 2 minutes.

If you find any problems or errors, please let me know so that I can fix them.

Many thanks, Francesco