Using the qiime2 API inside of jupyterhub fails when external command line applications are called.
I realize that some of the issue here lies with jupyterhub (the code below works in an ipython terminal); however, it would be nice if the full functionality of qiime2 was available with jupyterhub. I do not understand what is happening behind the scenes well enough to see why qiime2 is unable to find external programs in jupyterhub when it can find them in ipython or on the command line.
import qiime
import skbio
from qiime.plugins import alignment
from q2_types import FeatureData, Sequence, DNAIterator, FeatureTable, Frequency
# build two short DNA sequences and import them as a QIIME 2 artifact
seqs = (skbio.DNA(e) for e in ['ACGATCGAT', 'CAGCTAGCAT'])
dna_iter = DNAIterator(seqs)
a = qiime.Artifact.import_data(FeatureData[Sequence], dna_iter)
# this call shells out to the external mafft executable
aligned_seqs = alignment.methods.mafft(a)
Running external command line application. This may print messages to stdout and/or stderr.
The command being run is below. This command cannot be manually re-run as it will depend on temporary files that no longer exist.
Command: mafft --preservecase /tmp/qiime2-archive-7r8aosse/8468a5c2-85c2-4d23-9ceb-b45076ffca23/data/dna-sequences.fasta
FileNotFoundError: [Errno 2] No such file or directory: 'mafft'
This is not the full traceback, though I can post that if it would be useful. Any help getting this running would be appreciated.
Hi @John_Chase! It looks like your instance of jupyterhub has a different $PATH environment variable set compared to when you launch an ipython session. The $PATH variable is responsible for telling your shell where to search for executables when you run a command. If the $PATH variable is missing the specific path that mafft is installed to, then that would explain why you are seeing that particular error. While you aren’t directly running mafft, QIIME 2 is, on your behalf. Unfortunately I think the solution is pretty dependent on how you are launching jupyterhub, and how it is configured.
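To make the $PATH lookup described above concrete, here is a minimal sketch that can be run in both a jupyterhub notebook and an ipython session. It uses only the Python standard library; the helper names `find_executable` and `describe_path` are made up for this illustration and are not part of QIIME 2.

```python
import os
import shutil


def find_executable(name, path=None):
    """Return the full path to `name`, or None if no PATH entry contains it."""
    # shutil.which walks the PATH entries in order and returns the first
    # match, which is the same kind of lookup a shell performs.
    return shutil.which(name, path=path)


def describe_path(path=None):
    """Split a PATH string into its individual directory entries."""
    if path is None:
        path = os.environ.get("PATH", "")
    return [entry for entry in path.split(os.pathsep) if entry]


if __name__ == "__main__":
    # None here means the current process cannot see mafft on its PATH.
    print("mafft resolves to:", find_executable("mafft"))
    for entry in describe_path():
        print("  ", entry)
```

If the notebook prints None for mafft while ipython prints a full path, the two processes were started with different $PATH values.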
A few (possibly) relevant threads on GH:
Ultimately, it looks like if you launch jupyterhub in a shell that has the appropriate $PATH, jupyterhub should honor those settings. Where it gets a little more hazy is if you have any configuration around the Spawner concept in jupyterhub. At first glance this does not seem to be a problem that can be solved within QIIME 2, but rather is something specific to your site deployment of jupyterhub. It would be helpful to figure this out so that we can document it on https://docs.qiime2.org, though!
If you want to provide a bit more info about how you are starting up jupyterhub, that would be useful.
Also, can you include the output from the following (3) commands:
# from your shell that you launch ipython/jupyterhub from
$ which mafft
# within a jupyter session and also again within an ipython session
$ which mafft
Running which mafft from the environment in which jupyterhub is being served will not return anything, as mafft is not installed there. It is installed in a conda env, using the conda command from the qiime2 docs.
I will investigate the links that you included further.
"At first glance this does not seem to be a problem that can be solved within QIIME 2, but rather is something specific to your site deployment of jupyterhub."
This is unfortunate. My concern comes from the main complaint about qiime1, which was the difficulty of installing and using its dependencies. Conda and the plugin system have made this infinitely better in qiime2, though it is still frustrating to use an API that works in some situations but not others, particularly because it is not clear which parts of qiime will fail. It may be a necessary evil of depending on non-python packages; however, it does seem odd to be using an API that issues command line calls. Perhaps a note in the installation docs that certain functionality will need manually defined paths? If I am the only user who runs into the issue then it is likely not worth changing anything, though.
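As a stopgap for manually defining the path from inside a notebook, something like the sketch below might work. It assumes that QIIME 2 launches mafft as a child process that inherits the notebook's environment, which the thread does not confirm. The helper name `prepend_to_path` is hypothetical, and the directory passed to it is a placeholder for your own conda environment's bin directory.

```python
import os


def prepend_to_path(directory, environ=None):
    """Add `directory` to the front of PATH unless it is already listed."""
    if environ is None:
        environ = os.environ
    entries = [e for e in environ.get("PATH", "").split(os.pathsep) if e]
    if directory not in entries:
        entries.insert(0, directory)
    # Assigning to os.environ also updates the environment that child
    # processes started later from this Python process will inherit.
    environ["PATH"] = os.pathsep.join(entries)
    return environ["PATH"]
```

Calling `prepend_to_path(...)` with the bin directory of the conda env that contains mafft, before calling the alignment method, would be the intended usage. This only affects the current notebook process and has to be repeated in every notebook.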
The way to handle this is to set up a custom kernel for the conda environment. Jupyter allows you to set environment variables via the "env" key of the kernel spec. What you would want to do is just set the PATH to include your conda environment’s bin directory. For example you might prepend something like this directory path: /home/evan/.conda/envs/qiime/bin to what is already in PATH.
Let me know if you need more assistance with this. We have a very similar setup locally for automatically generating kernels from conda environment (with the correct path) for JupyterHub.
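A minimal sketch of generating such a kernel spec is below. It is not the setup described above, just one way it could look: the function names are hypothetical, it assumes ipykernel is installed in the conda environment, and it bakes the PATH into kernel.json as a literal string at generation time.

```python
import json
import os
from pathlib import Path


def build_kernel_spec(env_prefix, display_name, base_path=None):
    """Return a kernel.json-style dict whose PATH starts with the env's bin dir."""
    bin_dir = Path(env_prefix) / "bin"
    if base_path is None:
        base_path = os.environ.get("PATH", "")
    new_path = os.pathsep.join(p for p in [str(bin_dir), base_path] if p)
    return {
        # Launch the kernel with the conda env's own interpreter
        # (assumes ipykernel is installed in that env).
        "argv": [str(bin_dir / "python"), "-m", "ipykernel_launcher",
                 "-f", "{connection_file}"],
        "display_name": display_name,
        "language": "python",
        # The "env" key sets environment variables for the kernel process.
        "env": {"PATH": new_path},
    }


def write_kernel_spec(spec, kernel_dir):
    """Write `spec` as kernel.json inside `kernel_dir` and return the file path."""
    kernel_dir = Path(kernel_dir)
    kernel_dir.mkdir(parents=True, exist_ok=True)
    target = kernel_dir / "kernel.json"
    target.write_text(json.dumps(spec, indent=2))
    return target
```

On Linux, per-user kernel specs normally live under ~/.local/share/jupyter/kernels/, one directory per kernel, so each user can add a kernel like this without touching the environment jupyterhub itself runs in.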
Thanks for following up!
I put together a brief MWE:
$ npm install -g configurable-http-proxy
$ mkdir jupyterhub-demo
$ cd jupyterhub-demo
$ conda create -n jupyterhub-mwe -c qiime2 python=3.5 qiime
$ source activate jupyterhub-mwe
$ conda install matplotlib==1.5.1
$ conda install -c bioconda -c qiime2 -c biocore scikit-bio==0.5.1 mafft q2-alignment
$ pip install jupyterhub
$ jupyterhub --no-ssl
Then launch your browser and hit localhost:8000. I think the default for the config uses your unix account credentials. Once logged in, create a new notebook:
Make sure to select the new conda env we just created.
Once that loads, you should be able to run the example you provided earlier:
Grabbing my $PATH info:
The noteworthy item there is the very first entry, that is the bin dir for the new conda env. Listing the contents at that location:
$ ls /home/matthew/miniconda3/envs/jupyterhub-mwe/bin/m*
Also worth noting is the impact on your $PATH env var when sourcing a conda env vs not:
# no conda env sourced
$ which mafft
mafft not found
# with the conda env that we installed mafft to
$ source activate jupyterhub-mwe
(jupyterhub-mwe) $ which mafft
The aim of this MWE is to demonstrate that QIIME 2 does work right now with jupyterhub, but is subject to the configuration of your environment (in your case, the environment is jupyterhub), so that is where we might need to do some inspection/tweaking. So, if you want to double check and run the commands I asked about earlier (which mafft, and the os.environ stuff in ipython to compare with your jupyterhub results), that would be pretty helpful in diagnosing the situation.
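For the os.environ comparison, a small helper along these lines could show which directories the jupyterhub kernel is missing. The function name is hypothetical; the two arguments are meant to be the values of os.environ["PATH"] copied from the ipython session and from the notebook.

```python
import os


def path_differences(working_path, failing_path):
    """Return PATH entries present in `working_path` but absent from `failing_path`."""
    failing = {e for e in failing_path.split(os.pathsep) if e}
    seen = set()
    missing = []
    for entry in working_path.split(os.pathsep):
        # Keep the original order and report each missing entry once.
        if entry and entry not in failing and entry not in seen:
            seen.add(entry)
            missing.append(entry)
    return missing
```

Any directory it returns is one that ipython can search but the notebook kernel cannot, which is where an executable like mafft may be hiding.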
My issue is that we are not running jupyterhub from inside of a conda environment that can be changed. Each user is in charge of maintaining their own environments, and is not able to make changes to the environment where jupyterhub is running.
@ebolyen do you know what specifically needs to be added to the kernel?
Currently my path is specified as:
Marking @thermokarst’s post as the solution. He gave a working example of basic Jupyterhub configuration with QIIME 2. It looks like the issues @John_Chase is having are related to site-specific Jupyterhub configuration and not an issue with QIIME 2.