Hi @tashoo,
I think that is going to be the issue. I initially pored through the Docker log and annotated a few interesting pieces, which I'll leave below for anyone who might run into something similar.
Looking into it a bit more, it looks like Docker already knows about Rosetta, but to tell it that a given container needs the emulation layer you have to pass the `--platform linux/amd64` flag. If you are using Docker Desktop there may be an advanced setting somewhere for this, but I'm not confident. You may need to run the image from the command line to test whether this approach is viable.
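For example (just a sketch; substitute the Galaxy image and the options you actually run with):

```sh
# Pull/run the image as x86_64 so Docker uses the emulation layer.
# <galaxy-image> and <your-usual-options> are placeholders.
docker pull --platform linux/amd64 <galaxy-image>
docker run --platform linux/amd64 <your-usual-options> <galaxy-image>
```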
Hopefully that gives you a direction to start poking, but the main issue here is that the processor is simply a different architecture than the image expects, so it's kind of surprising it worked as well as it did. Perhaps Rosetta is already running under the hood but there's still some incompatibility lurking? I can't really say for sure.
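If you want to confirm what is actually running before digging deeper, a couple of quick checks (the angle-bracket names are placeholders):

```sh
# What architecture was the image built for?
docker image inspect --format '{{.Os}}/{{.Architecture}}' <galaxy-image>

# What does the running container report? x86_64 means the emulation
# layer is in play on Apple silicon; aarch64 means it is running natively.
docker exec <container-name> uname -m
```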
ORIGINAL REPLY:
Thank you for the Docker log; that is really helpful. I'm going to pull out a few pieces that seem interesting, though I don't yet know how significant they are to your problem.
```
==> /home/galaxy/logs/slurmd.log <==
[2022-08-06T00:41:30.214] slurmd version 17.11.2 started
[2022-08-06T00:41:30.217] slurmd started on Sat, 06 Aug 2022 00:41:30 +0000
[2022-08-06T00:41:30.221] CPUs=4 Boards=1 Sockets=1 Cores=4 Threads=1 Memory=7851 TmpDisk=59819 Uptime=1007 CPUSpecList=(null) FeaturesAvail=(null) FeaturesActive=(null)
[2022-08-08T23:24:13.098] error: Domain socket directory /var/spool/slurmd: No such file or directory
[2022-08-08T23:24:13.116] Node reconfigured socket/core boundaries SocketsPerBoard=4:1(hw) CoresPerSocket=1:4(hw)
[2022-08-08T23:24:13.117] Message aggregation disabled
[2022-08-08T23:24:13.122] CPU frequency setting not configured for this node
[2022-08-08T23:24:13.129] slurmd version 17.11.2 started
[2022-08-08T23:24:13.134] slurmd started on Mon, 08 Aug 2022 23:24:13 +0000
[2022-08-08T23:24:13.138] CPUs=4 Boards=1 Sockets=1 Cores=4 Threads=1 Memory=7851 TmpDisk=59819 Uptime=1124 CPUSpecList=(null) FeaturesAvail=(null) FeaturesActive=(null)
```
Some basic info from SLURM (the job scheduler). This all seems perfectly fine. The TmpDisk
might be low, but it would depend on what the units are on that number and I'm not familiar enough with the configuration to suggest a change there anyway.
After that initial setup, we see a string of these:
```
==> /home/galaxy/logs/uwsgi.log <==
Mon Aug 8 23:24:13 2022 - *** uWSGI listen queue of socket "127.0.0.1:4001" (fd: 6) full !!! (28770288/64) ***
Mon Aug 8 23:24:14 2022 - *** uWSGI listen queue of socket "127.0.0.1:4001" (fd: 6) full !!! (28770288/64) ***
Mon Aug 8 23:24:15 2022 - *** uWSGI listen queue of socket "127.0.0.1:4001" (fd: 6) full !!! (28770288/64) ***
Mon Aug 8 23:24:16 2022 - *** uWSGI listen queue of socket "127.0.0.1:4001" (fd: 6) full !!! (28770288/64) ***
Mon Aug 8 23:24:17 2022 - *** uWSGI listen queue of socket "127.0.0.1:4001" (fd: 6) full !!! (28770288/64) ***
Mon Aug 8 23:24:18 2022 - *** uWSGI listen queue of socket "127.0.0.1:4001" (fd: 6) full !!! (28770288/64) ***
Mon Aug 8 23:24:19 2022 - *** uWSGI listen queue of socket "127.0.0.1:4001" (fd: 6) full !!! (28770288/64) ***
Mon Aug 8 23:24:20 2022 - *** uWSGI listen queue of socket "127.0.0.1:4001" (fd: 6) full !!! (28770288/64) ***
Mon Aug 8 23:24:21 2022 - *** uWSGI listen queue of socket "127.0.0.1:4001" (fd: 6) full !!! (28770288/64) ***
Mon Aug 8 23:24:22 2022 - *** uWSGI listen queue of socket "127.0.0.1:4001" (fd: 6) full !!! (28770288/64) ***
Mon Aug 8 23:24:23 2022 - *** uWSGI listen queue of socket "127.0.0.1:4001" (fd: 6) full !!! (28770288/64) ***
Mon Aug 8 23:24:24 2022 - *** uWSGI listen queue of socket "127.0.0.1:4001" (fd: 6) full !!! (28770288/64) ***
```
That definitely seems wrong and might be the source of the issue, although I don't have a solution yet. I'm also not quite certain why the listening end of the uWSGI socket would fill up, but it could explain why the job failed if uWSGI is dropping information that the worker(s) needed.
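For what it's worth, if I'm reading the message right, the `/64` is the socket's listen backlog, and uWSGI's `listen` option controls it, so raising it is one thing to try if the queue really is the bottleneck (I'm not certain it is). A rough sketch; where exactly the uWSGI config lives inside the Galaxy image may differ:

```sh
# uWSGI refuses a backlog larger than the kernel's limit, so check
# that first from inside the container:
cat /proc/sys/net/core/somaxconn

# Then raise uWSGI's backlog, e.g. "listen = 1024" in the [uwsgi]
# section of the config the image uses, or --listen 1024 on the
# command line that starts uWSGI.
```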
```
==> /home/galaxy/logs/handler1.log <==
galaxy.jobs DEBUG 2022-08-08 23:24:42,690 [pN:handler1,p:255,tN:SlurmRunner.work_thread-0] Job wrapper for Job [101] prepared (439.562 ms)
galaxy.jobs.command_factory INFO 2022-08-08 23:24:42,751 [pN:handler1,p:255,tN:SlurmRunner.work_thread-0] Built script [/export/galaxy-central/database/job_working_directory/000/101/tool_script.sh] for tool command [python '/galaxy-central/tools/data_source/upload.py' '/galaxy-central' '/export/galaxy-central/database/job_working_directory/000/101/registry.xml' '/export/galaxy-central/database/job_working_directory/000/101/upload_params.json' '101:/export/galaxy-central/database/job_working_directory/000/101/working/dataset_101_files:/export/galaxy-central/database/files/000/dataset_101.dat']
galaxy.tool_util.deps DEBUG 2022-08-08 23:24:42,876 [pN:handler1,p:255,tN:SlurmRunner.work_thread-0] Using dependency bcftools version 1.5 of type conda
galaxy.jobs.runners DEBUG 2022-08-08 23:24:42,884 [pN:handler1,p:255,tN:SlurmRunner.work_thread-0] (101) command is: mkdir -p working outputs configs
if [ -d _working ]; then
rm -rf working/ outputs/ configs/; cp -R _working working; cp -R _outputs outputs; cp -R _configs configs
else
cp -R working _working; cp -R outputs _outputs; cp -R configs _configs
fi
cd working; /bin/bash /export/galaxy-central/database/job_working_directory/000/101/tool_script.sh > ../outputs/tool_stdout 2> ../outputs/tool_stderr; return_code=$?; cd '/export/galaxy-central/database/job_working_directory/000/101';
[ "$GALAXY_VIRTUAL_ENV" = "None" ] && GALAXY_VIRTUAL_ENV="$_GALAXY_VIRTUAL_ENV"; _galaxy_setup_environment True
export PATH=$PATH:'/export/tool_deps/_conda/envs/__bcftools@1.5/bin' ; python "metadata/set.py"; sh -c "exit $return_code"
galaxy.jobs.runners.drmaa DEBUG 2022-08-08 23:24:42,966 [pN:handler1,p:255,tN:SlurmRunner.work_thread-0] (101) submitting file /export/galaxy-central/database/job_working_directory/000/101/galaxy_101.sh
galaxy.jobs.runners.drmaa DEBUG 2022-08-08 23:24:42,966 [pN:handler1,p:255,tN:SlurmRunner.work_thread-0] (101) native specification is: --ntasks=1 --share
galaxy.jobs.runners.drmaa INFO 2022-08-08 23:24:43,017 [pN:handler1,p:255,tN:SlurmRunner.work_thread-0] (101) queued as 2
galaxy.jobs.runners.drmaa DEBUG 2022-08-08 23:24:43,190 [pN:handler1,p:255,tN:SlurmRunner.monitor_thread] (101/2) state change: job is queued and active
```
Here we see the job has been set up and submitted. There are a lot of scripts involved, but in particular we see `python '/galaxy-central/tools/data_source/upload.py' ...`, which is the upload tool.
```
==> /home/galaxy/logs/handler1.log <==
galaxy.jobs.runners.drmaa DEBUG 2022-08-08 23:24:44,226 [pN:handler1,p:255,tN:SlurmRunner.monitor_thread] (101/2) state change: job finished, but failed
galaxy.jobs.runners.slurm WARNING 2022-08-08 23:24:44,366 [pN:handler1,p:255,tN:SlurmRunner.monitor_thread] (101/2) Job failed due to unknown reasons, job state in SLURM was: FAILED
```
This is just SLURM (via the DRMAA library) noticing that the job failed. It isn't a real explanation, and the failure probably wasn't SLURM's fault, since SLURM is only relaying the message.
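If you can get a shell inside the container, SLURM itself can sometimes say a bit more than what DRMAA relays back. These are standard SLURM commands; the job id `2` comes from the log above, and `sacct` only works if accounting is enabled in the container's SLURM setup:

```sh
# Accounting record for the failed job: state, exit code, node
sacct -j 2 --format=JobID,JobName,State,ExitCode,NodeList

# The slurmd log on the execution node often holds the actual error
tail -n 50 /home/galaxy/logs/slurmd.log
```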
After that we see some cleanup fail, presumably because these files were never created as expected:
```
==> /home/galaxy/logs/handler1.log <==
galaxy.tools.error_reports DEBUG 2022-08-08 23:24:44,760 [pN:handler1,p:255,tN:SlurmRunner.work_thread-1] Bug report plugin <galaxy.tools.error_reports.plugins.sentry.SentryPlugin object at 0x406b57b550> generated response None
galaxy.jobs.runners DEBUG 2022-08-08 23:24:44,805 [pN:handler1,p:255,tN:SlurmRunner.work_thread-1] (101/2) Unable to cleanup /export/galaxy-central/database/job_working_directory/000/101/galaxy_101.sh: [Errno 2] No such file or directory: '/export/galaxy-central/database/job_working_directory/000/101/galaxy_101.sh'
galaxy.jobs.runners DEBUG 2022-08-08 23:24:44,819 [pN:handler1,p:255,tN:SlurmRunner.work_thread-1] (101/2) Unable to cleanup /export/galaxy-central/database/job_working_directory/000/101/galaxy_101.o: [Errno 2] No such file or directory: '/export/galaxy-central/database/job_working_directory/000/101/galaxy_101.o'
galaxy.jobs.runners DEBUG 2022-08-08 23:24:44,836 [pN:handler1,p:255,tN:SlurmRunner.work_thread-1] (101/2) Unable to cleanup /export/galaxy-central/database/job_working_directory/000/101/galaxy_101.e: [Errno 2] No such file or directory: '/export/galaxy-central/database/job_working_directory/000/101/galaxy_101.e'
galaxy.jobs.runners DEBUG 2022-08-08 23:24:44,852 [pN:handler1,p:255,tN:SlurmRunner.work_thread-1] (101/2) Unable to cleanup /export/galaxy-central/database/job_working_directory/000/101/galaxy_101.ec: [Errno 2] No such file or directory: '/export/galaxy-central/database/job_working_directory/000/101/galaxy_101.ec'
```
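One quick way to check that theory is to list the job's working directory (the path comes straight from the log above) and see what actually got written; `<container-name>` is a placeholder:

```sh
# If galaxy_101.sh is missing entirely, the job script was never
# written, i.e. the failure happened before SLURM even ran the tool.
docker exec <container-name> ls -la \
  /export/galaxy-central/database/job_working_directory/000/101/
```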