Options for installing QIIME 2 on a cluster

Hi there,

Our team is using an HPC cluster and is looking to install QIIME 2 without conda/Docker/etc.

Best,
Ben

Hi @Ben_E,

Thanks for reaching out! Can you provide more context on your HPC cluster and the reasoning behind attempting to install QIIME 2 outside of conda or docker?

Hi @lizgehret,

Thanks for the quick reply! Our cluster is running Linux on x86_64 hardware. The main reasons we want to install outside of conda/Docker are to avoid conflicts between different versions of conda and problems when updating packages. Additionally, we share concerns that were previously raised by another user:

  1. Conda distributes binaries that are not necessarily compatible with our software infrastructure, i.e. they may fail to find some libraries because those are not installed in standard locations

  2. Conda installs everything in the user’s home directory, putting a lot of stress on our filesystem (and slowing down the whole cluster)

We wish to limit bloat and unnecessary resource overhead on the cluster. Having our tools installed natively, rather than in an image containing redundant binaries, is therefore ideal for us. I believe our server architect may also have security concerns, but that's just speculation on my part.

Thanks,

Ben

Hi @Ben_E,

Thanks for providing those details!

To give you more context on what a QIIME 2 install entails, I'd recommend taking a look at one of our resolved environment files (here's the Ubuntu environment file for our latest Amplicon release). As you'll notice, the list of required packages is extensive.

You can definitely install all of these packages without a package manager such as conda, but doing so will most likely make things much more difficult to manage on your end. Conda itself has only a small list of dependencies, and the really nice thing about it is that it isolates QIIME 2's 600+ dependencies within a dedicated environment location. This environment remains separate from any other environments and from the rest of your cluster, which allows different package requirements to coexist (e.g. versions of R, Python, etc. that differ from what is installed on the cluster).

Regarding the concern that conda installs everything in the user's home directory: this is actually not true. You can configure conda to install things at a designated location, and many people do not have conda installed in their home directory.

I'd recommend taking a look at what can be configured through the .condarc file - this is where you can specify where conda will look for packages, how environment locations are handled, etc.
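As a minimal sketch of that idea, assuming a bash shell and a shared, non-home filesystem on your cluster, the script below writes a condarc that moves both named environments (envs_dirs) and the package cache (pkgs_dirs) out of users' home directories. SHARED_CONDA is a hypothetical placeholder, and the default under the current directory is only there so the sketch runs anywhere; point it at your cluster's project or software area.

```bash
#!/usr/bin/env bash
# Sketch: keep conda environments and the package cache off $HOME.
# SHARED_CONDA is a hypothetical placeholder; set it to your cluster's
# project/software filesystem instead of the default used here.
set -euo pipefail

SHARED_CONDA="${SHARED_CONDA:-$PWD/shared-conda}"
CONDARC_FILE="$SHARED_CONDA/.condarc"

mkdir -p "$SHARED_CONDA/envs" "$SHARED_CONDA/pkgs"

# envs_dirs: directories searched for named environments; new named
#            environments are created in the first writable entry.
# pkgs_dirs: where downloaded packages are cached (often what fills up
#            home-directory quotas).
cat > "$CONDARC_FILE" <<EOF
envs_dirs:
  - $SHARED_CONDA/envs
pkgs_dirs:
  - $SHARED_CONDA/pkgs
EOF

echo "Wrote $CONDARC_FILE"
echo "Use it with: export CONDARC=$CONDARC_FILE"
```

After exporting CONDARC, running conda config --show envs_dirs pkgs_dirs should list the shared paths. The same keys can also be set per user with conda config --add envs_dirs ... and conda config --add pkgs_dirs ..., which edits that user's own .condarc instead.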

Hope this helps! Cheers :lizard:

Hi @lizgehret,

Thanks for the insight, this helps a ton!

All the best,

Ben

Hey @Ben_E,

@SoilRotifer just shared an example of how their HPC is set up, which you may find useful as a reference:

We are able to set up conda environments on our HPC for our courses, within a shared area with read permissions. For example, I simply tell the students to run the following command (after we get their account set up for conda access):

conda config --append envs_dirs /home/SE/BMIG-6202-MSR

so that they have access to all the conda environments we use for our course. This avoids repeated installs of the same code by hundreds of users. They can always run conda init to toggle between the system conda and their personal conda.
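On the instructor/admin side, preparing such a shared area might look roughly like the sketch below. It assumes bash, conda on the PATH, and a QIIME 2 environment file such as the resolved environment file mentioned earlier; ENV_NAME and ENV_FILE are hypothetical placeholders, and the SHARED_ENVS default only stands in for a path like /home/SE/BMIG-6202-MSR. The important step is the final chmod, which gives everyone read access while keeping write access with the owner.

```bash
#!/usr/bin/env bash
# Sketch: build a conda environment once in a shared directory, then make it
# readable (but not writable) by other users.
set -euo pipefail

SHARED_ENVS="${SHARED_ENVS:-$PWD/BMIG-6202-MSR}"  # e.g. /home/SE/BMIG-6202-MSR in the post
ENV_NAME="${ENV_NAME:-qiime2-amplicon}"           # hypothetical environment name
ENV_FILE="${ENV_FILE:-}"                          # path to a QIIME 2 environment .yml

mkdir -p "$SHARED_ENVS"

if command -v conda >/dev/null 2>&1 && [ -n "$ENV_FILE" ]; then
  # --prefix places the environment inside the shared directory.
  conda env create --prefix "$SHARED_ENVS/$ENV_NAME" --file "$ENV_FILE"
else
  echo "Skipping environment creation (conda or ENV_FILE not available)" >&2
fi

# Owner keeps full access; group/others can read files and enter
# directories (X), but cannot write.
chmod -R u+rwX,go+rX,go-w "$SHARED_ENVS"
```

Students then run the conda config --append envs_dirs command once; after that, conda env list should show the shared environment and conda activate qiime2-amplicon should find it by name. Because students only have read access, they can't accidentally modify the shared environment.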

Here are our HPC instructor instructions, if anyone is interested. In fact, we can run JupyterLab via our HPC, and it is set up to run user-specific or system-wide versions of any conda environment. The user or instructor can enable this by running one or both of the following, depending on their environment:

  • conda install -c anaconda ipykernel
  • conda install -c r r-irkernel

Cheers :lizard:

How do you handle this now? Python 3.11 for everyone, forever? :stuck_out_tongue_winking_eye:

I ask because I've used the Modules system on two HPC systems. They both have conda and QIIME 2 wrapped as modules that you can load with module load qiime2-20yy.mm.

This solves the duplicate-install problem because everyone uses the same module.
There is also a secret second advantage: a fixed module managed by the HPC staff prevents users from modifying their environment, which is what we recommend anyway!

I have no idea how they set this up. Maybe like this blog post? I can put you all in touch with them if you want.
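For reference, with the Lmod flavor of environment modules, wrapping an existing shared conda environment could look roughly like the sketch below. This is an assumption about one possible setup, not how the two HPC systems above actually did it; QIIME_VERSION, ENV_PREFIX and MODULE_DIR are hypothetical placeholders. The generated modulefile simply puts the environment's bin directory first on PATH.

```bash
#!/usr/bin/env bash
# Sketch: generate an Lmod modulefile so users can run
#   module use "$MODULE_DIR"; module load "qiime2-$QIIME_VERSION"
# All paths and the version below are hypothetical placeholders.
set -euo pipefail

QIIME_VERSION="${QIIME_VERSION:-2024.10}"
ENV_PREFIX="${ENV_PREFIX:-/opt/conda/envs/qiime2-amplicon-$QIIME_VERSION}"
MODULE_DIR="${MODULE_DIR:-$PWD/modulefiles}"
MODULE_FILE="$MODULE_DIR/qiime2-$QIIME_VERSION.lua"

mkdir -p "$MODULE_DIR"

# Lmod modulefiles are Lua; whatis() and prepend_path() are Lmod functions.
cat > "$MODULE_FILE" <<EOF
whatis("QIIME 2 $QIIME_VERSION (shared, read-only conda environment)")
prepend_path("PATH", "$ENV_PREFIX/bin")
EOF

echo "Wrote $MODULE_FILE"
```

Prepending PATH alone skips conda's own activation hooks (the activate.d scripts some packages install), so treat this as a starting point and check the Lmod documentation for ways to reproduce a full conda activation. Since the environment directory is owned by the admins, users who load the module can't modify it.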

LOL this is a great point! I'm not exactly sure what the future plan is; I'm just a student trying to help set up our lab's server.

These are great recommendations though, thanks! I'll be passing this info along.

P.S. We ended up just installing QIIME 2 into a conda env.

Happy Holidays,

Ben
