Mafft /dev/stderr permission denied under jupyter

jamesabbott · January 31, 2019, 1:18pm

I'm having trouble running the align_to_tree_mafft_fasttree pipeline using the Artifact API in a Jupyter notebook.

Here is a minimal example using the moving pictures example data which demonstrates the problem.

import os
import sys

import qiime2.plugin  # needed so qiime2.plugin.ValidationError resolves below
from qiime2 import Artifact
from qiime2 import Metadata
from qiime2.plugins import dada2
from qiime2.plugins import demux
from qiime2.plugins import feature_table
from qiime2.plugins import phylogeny

os.chdir('/cluster/db/jabbott/qiime2_reanalysis/qiime2-moving-pictures-api')

# Import the raw EMP single-end sequences
try:
    seqs = Artifact.import_data('EMPSingleEndSequences', 'emp-single-end-sequences',
                                view_type='EMPSingleEndDirFmt')
except qiime2.plugin.ValidationError as e:
    print('An error occurred during import: %s' % e)
    sys.exit(1)
except Exception as e:
    print('An unexpected error has occurred: %s' % e)
    sys.exit(1)

# Demultiplex, denoise, then build the tree (the last call is the one that fails)
metadata = Metadata.load('sample-metadata.tsv')
demuxed = demux.methods.emp_single(seqs=seqs,
                                   barcodes=metadata.get_column('BarcodeSequence'))
denoised = dada2.methods.denoise_single(demultiplexed_seqs=demuxed.per_sample_sequences,
                                        trunc_len=110, trim_left=0, n_threads=1)
phyl = phylogeny.pipelines.align_to_tree_mafft_fasttree(
    sequences=denoised.representative_sequences)

When run in a Jupyter notebook, this results in the following being reported:

/homes/jabbott/miniconda3/envs/qiime2-2018.11/bin/mafft: line 911: /dev/stderr: Permission denied
/homes/jabbott/miniconda3/envs/qiime2-2018.11/bin/mafft: line 1949: /dev/stderr: Permission denied

along with a stack trace in the jupyter window which has the underlying cause of:

CalledProcessError: Command '['mafft', '--preservecase', '--inputorder', '--thread', '1', '/tmp/3555549.1.m600.q/qiime2-archive-vmm8ryrx/69b619f8-7d85-4362-8aac-05d717bee055/data/dna-sequences.fasta']' returned non-zero exit status 1

If I export the notebook to a script and run it directly under the same environment, it runs correctly, so I suspect there is something strange going on with how Jupyter is handling stderr.

This is with qiime2-2018.11 under CentOS 6.10, installed via conda. Jobs are run under Univa Grid Engine with jupyter started from a qrsh session, however the jupyter process is running under my uid so this shouldn't (!) affect things....
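One way to test the stderr suspicion from inside the notebook is to look at what file descriptor 2 actually points at and whether /dev/stderr can be re-opened by path, which is what the "line 911: /dev/stderr: Permission denied" message suggests the mafft shell script is attempting. The sketch below assumes Linux (it reads /proc/self/fd); describe_fd and can_reopen_for_write are hypothetical helper names, not part of QIIME 2 or Jupyter.

```python
import os
import stat


def describe_fd(fd):
    """Collect basic facts about what file descriptor `fd` points at."""
    info = {"fd": fd, "process_uid": getattr(os, "getuid", lambda: None)()}
    try:
        st = os.fstat(fd)
        info["owner_uid"] = st.st_uid              # who owns the underlying file
        info["mode"] = stat.filemode(st.st_mode)   # e.g. 'crw--w----' for a tty
    except OSError as e:
        info["fstat_error"] = str(e)
    try:
        # Linux only: /proc/self/fd/N is a symlink to the underlying file
        info["target"] = os.readlink("/proc/self/fd/%d" % fd)
    except OSError:
        info["target"] = None
    return info


def can_reopen_for_write(path="/dev/stderr"):
    """Try to open `path` for writing, without creating or truncating it."""
    try:
        fd = os.open(path, os.O_WRONLY)
    except OSError:
        return False
    os.close(fd)
    return True


print(describe_fd(2))
print("can re-open /dev/stderr:", can_reopen_for_write())
```

If owner_uid differs from process_uid and the re-open fails, that would be consistent with stderr being a terminal that belongs to another user, but this is only a diagnostic and not a confirmed explanation of the error.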

Anyone have any ideas of how to work round this?

Many thanks
James

ebolyen · January 31, 2019, 9:07pm

Hi @jamesabbott,

This is a really interesting issue, and I can't say I really have an answer yet, but you mention that Jupyter is started from a qrsh session.

It may be possible that the qrsh session is doing something like an su command, which could cause an error with the file descriptors used as described in this SO answer.

I don't really know that this is the case, but we do have a utility in QIIME 2 that mucks about with file-descriptors. It occurs to me we could try using that to see if "poking it with a stick" helps.

The context manager qiime2.util.redirected_stdio rewrites file descriptors 1 and 2 (stdout and stderr) in the process table so that they point at different files. We could try using this to see if there's an issue with permissions as described in the above answer.

Try something like this:

import qiime2.util

# these can be context managers also, but I didn't feel like indenting
new_stderr = open('/tmp/test.stderr', 'w')  # or wherever
new_stdout = open('/tmp/test.stdout', 'w')

# anything written to stdout/stderr inside the block goes to the files above
with qiime2.util.redirected_stdio(new_stdout, new_stderr):
    phyl = phylogeny.pipelines.align_to_tree_mafft_fasttree(
               sequences=denoised.representative_sequences)

This is more or less what q2cli does to manage noisy programs and --verbose/error-log behavior.

Let me know what happens/explodes...

jamesabbott · February 1, 2019, 12:10pm

Hi Evan,

Thanks for the reply. You are right that there is some 'su'-ing going on with qrsh jobs. An execution daemon runs on each node, and this takes care of launching jobs under the correct uid using (I believe...) su.

I've tried using redirected_stdio as suggested, which barfs in a new and exciting way:

---------------------------------------------------------------------------
UnsupportedOperation                      Traceback (most recent call last)
<ipython-input-6-cd261de1e4d6> in <module>
     12 
---> 13 with redirected_stdio(stderr=new_stderr,stdout=new_stdout):
     14     phyl=phylogeny.pipelines.align_to_tree_mafft_fasttree(sequences=denoised.representative_sequences)

~/miniconda3/envs/qiime2-2019.1/lib/python3.6/contextlib.py in __enter__(self)
     79     def __enter__(self):
     80         try:
---> 81             return next(self.gen)
     82         except StopIteration:
     83             raise RuntimeError("generator didn't yield") from None

~/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/qiime2/util.py in redirected_stdio(stdout, stderr)
     22     with _REDIRECTED_STDIO_LOCK:
     23         if stdout is not None:
---> 24             with _redirected_fd(to=stdout, stdio=sys.stdout):
     25                 if stderr is not None:
     26                     with _redirected_fd(to=stderr, stdio=sys.stderr):

~/miniconda3/envs/qiime2-2019.1/lib/python3.6/contextlib.py in __enter__(self)
     79     def __enter__(self):
     80         try:
---> 81             return next(self.gen)
     82         except StopIteration:
     83             raise RuntimeError("generator didn't yield") from None

~/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/qiime2/util.py in _redirected_fd(to, stdio)
     41         stdio = sys.stdout
     42 
---> 43     stdio_fd = _get_fileno(stdio)
     44     # copy stdio_fd before it is overwritten
     45     # NOTE: `copied` is inheritable on Windows when duplicating a standard

~/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/qiime2/util.py in _get_fileno(file_or_fd)
     62 
     63 def _get_fileno(file_or_fd):
---> 64     fd = getattr(file_or_fd, 'fileno', lambda: file_or_fd)()
     65     if not isinstance(fd, int):
     66         raise ValueError("Expected a file (`.fileno()`) or a file descriptor")

UnsupportedOperation: fileno

Again, this works when run in a standalone python script, but not from within a notebook. As an additional comparison I've tried this on my mac, so gridengine will not be getting in the way, and it works fine in a notebook. Not a great comparison since it's a different os however...
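The traceback above ends in sys.stdout.fileno() raising UnsupportedOperation, which is what a stream does when it has no operating-system file descriptor behind it. The sketch below reproduces that behaviour with io.StringIO as a stand-in for the notebook's replacement stream (an assumption for illustration; the real object in the notebook is whatever the kernel installs as sys.stdout). has_os_level_fd is a hypothetical helper name.

```python
import io
import sys
import tempfile


def has_os_level_fd(stream):
    """Return True if `stream` is backed by a real OS file descriptor."""
    try:
        return isinstance(stream.fileno(), int)
    except (AttributeError, OSError, ValueError):
        # io.UnsupportedOperation is a subclass of both OSError and ValueError
        return False


# An in-memory stream has no descriptor, like the stream in the traceback
print("in-memory stream:", has_os_level_fd(io.StringIO()))

# A real file does have one
with tempfile.TemporaryFile("w") as real_file:
    print("real file:", has_os_level_fd(real_file))

# Run in both a plain script and a notebook cell to compare
print("sys.stdout:", has_os_level_fd(sys.stdout))
```

Running the last line in a plain script and in a notebook cell should show whether the two environments differ in the way the traceback suggests.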

I've also now found a similar issue which has already been reported, but which I somehow missed when searching yesterday: "redirected_stdio does not work within a Jupyter Notebook" (issue #219 in qiime2/qiime2 on GitHub).

The recommendation in this issue of using the %%capture magic command also isn't working in my case. I'm beginning to think that trying to run notebooks on our cluster just adds too many layers of redirection and we may need to rethink our environment a bit.

Many thanks
James

ebolyen · February 11, 2019, 5:44pm

Hi @jamesabbott,

Thanks for the updates, and sorry for the late reply.

I had not seen that issue either. It makes sense why our context manager isn't working here now. If you are still ok with experimenting, you could try this instead (which will rewrite the process table outright instead of trying to be nice, which isn't working for Jupyter).

Note: this crashes IPython, but it doesn't seem to crash the Jupyter notebook... It also doesn't appear to affect things like print(), even though it should... But the call to mafft is a subprocess call, and a subprocess inherits parts of the parent's process table such as the file descriptors, so I think it should still do what we need.

import os

new_stderr = open('/tmp/test.stderr', 'w')
new_stdout = open('/tmp/test.stdout', 'w')

os.dup2(new_stdout.fileno(), 1)
os.dup2(new_stderr.fileno(), 2)
# The process table should see these files as the new stdout/err
# Jupyter bypasses this for some reason, so it doesn't seem to affect it
run.your_method(here)
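The two os.dup2 calls above stay in effect until the descriptors are changed again. If that is a concern, a variant that saves the original descriptors and puts them back afterwards could look like the sketch below. redirect_fds and the file paths are hypothetical names for illustration; this is not how QIIME 2 itself handles it.

```python
import contextlib
import os
import subprocess
import sys
import tempfile


@contextlib.contextmanager
def redirect_fds(stdout_path, stderr_path):
    """Point descriptors 1 and 2 at files, then restore them on exit."""
    sys.stdout.flush()
    sys.stderr.flush()
    saved_out = os.dup(1)  # keep copies of the original descriptors
    saved_err = os.dup(2)
    try:
        with open(stdout_path, "w") as out, open(stderr_path, "w") as err:
            os.dup2(out.fileno(), 1)
            os.dup2(err.fileno(), 2)
            yield
    finally:
        # put the original descriptors back and drop the saved copies
        os.dup2(saved_out, 1)
        os.dup2(saved_err, 2)
        os.close(saved_out)
        os.close(saved_err)


def demo():
    """Run a child process and capture what it writes to descriptors 1 and 2."""
    tmpdir = tempfile.mkdtemp()
    out_path = os.path.join(tmpdir, "test.stdout")
    err_path = os.path.join(tmpdir, "test.stderr")
    child = "import sys; print('to stdout'); sys.stderr.write('to stderr')"
    with redirect_fds(out_path, err_path):
        # the child inherits descriptors 1 and 2 from this process
        subprocess.run([sys.executable, "-c", child], check=True)
    with open(out_path) as out, open(err_path) as err:
        return out.read(), err.read()


# In the notebook, the pipeline call would go inside the with block, e.g.:
# with redirect_fds('/tmp/test.stdout', '/tmp/test.stderr'):
#     run.your_method(here)
```

Because the child process inherits descriptors 1 and 2 as they are at launch time, its output lands in the files even though the parent's Python-level sys.stdout object is untouched.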

That may be the case right now. That said, I would like it if QIIME 2 could handle these cases gracefully, so if you were able to provide details on how to set up a similar environment, I would be happy to make an issue and see if we can't improve the story here. (It's also ok if that's too much work to do; setting up these environments can suck if you don't have automated tooling describing the process.)

jamesabbott · February 13, 2019, 12:29pm

Hi Evan, and thanks for the suggestion.

The good news is that it has done the trick, and allows mafft to execute successfully.

I think replicating the environment would be considered non-trivial. Basically, the cluster is managed using Univa Grid Engine, a commercial version of what was once Sun Grid Engine, which is forked as the open source Open Grid Scheduler: http://gridscheduler.sourceforge.net. In order to get a similar environment, a functioning Grid Engine/Grid Scheduler would need to be installed and configured such that interactive jobs can be submitted to queues using 'qrsh' (which essentially gets a shell on a cluster node but under the management of the scheduler).

To run a jupyter notebook under this environment, after activating my qiime2 conda environment, I start jupyter on a cluster node using:

qrsh -cwd -V -N notebook jupyter notebook --ip $(hostname --fqdn) --no-browser

This also requires that the cluster nodes are routable from your network in order to access the notebook.

I think we will probably look at moving to a centralised jupyterhub installation to bypass this somewhat convoluted approach, which I suspect is not a common use case.

Many thanks
James

ebolyen · February 13, 2019, 5:33pm

Hi @jamesabbott,

That is great to hear! So my interpretation then is it is an issue with qrsh and mafft.

jamesabbott:

I think replicating the environment would be considered non-trivial. Basically, the cluster is managed using Univa Grid Engine, a commercial version of what was once Sun Grid Engine, which is forked as the open source Open Grid Scheduler: http://gridscheduler.sourceforge.net. In order to get a similar environment, a functioning Grid Engine/Grid Scheduler would need to be installed and configured such that interactive jobs can be submitted to queues using ‘qrsh’ (which essentially gets a shell on a cluster node but under the management of the scheduler).

To run a jupyter notebook under this environment, after activating my qiime2 conda environment, I start jupyter on a cluster node using:

qrsh -cwd -V -N notebook jupyter notebook --ip $(hostname --fqdn) --no-browser

This also requires that the cluster nodes are routable from your network in order to access the notebook.

That is indeed involved.

As for what QIIME 2 could do to alleviate this issue:

Easiest option: make qiime2.util.redirected_stdio aware of Jupyter's stderr/stdout files and ignore them appropriately. This should be easy to test since that is just a Jupyter situation. Then when something like this occurs, you can just wrap it in the above context manager and move on. It's not great, but it would solve the problem without permanently changing the process table.
Eventual option: we could plan on rewriting the process table behind the scenes as a matter of course whenever an action is run, but this seems a little risky and we would have to think carefully about it to make sure we don't break something else.
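As a rough illustration of the first option, a descriptor lookup could fall back to the conventional descriptor number when the stream has no fileno of its own. This is only a sketch of the idea under that assumption, not QIIME 2's implementation, and fileno_or_default is a hypothetical name.

```python
import io
import tempfile


def fileno_or_default(stream, default_fd):
    """Return the stream's descriptor, or `default_fd` if it has none."""
    try:
        fd = stream.fileno()
    except (AttributeError, OSError, ValueError):
        # e.g. an in-memory or notebook stream: use the conventional number
        return default_fd
    return fd if isinstance(fd, int) else default_fd


# A stream without a descriptor falls back to 2 (the usual stderr number)
print(fileno_or_default(io.StringIO(), 2))

# A real file keeps its own descriptor
with tempfile.TemporaryFile("w") as real_file:
    print(fileno_or_default(real_file, 2) == real_file.fileno())
```

With a lookup like this, a redirect helper would operate on descriptors 1 and 2 directly when sys.stdout and sys.stderr have been replaced by objects that cannot report a descriptor.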

As for moving to JupyterHub, I think that's a really good idea. JupyterHub is very nice, and creating custom spawners isn't too difficult in the event you need to customize the kernel initialization for some reason.