I wanted to ask a general question about the Docker image setup currently used for QIIME2. We've seen issues with it on our local HPC when running the image under Singularity, and the same issue may affect portability of workflows if we want to use this container within a standard workflow system (Nextflow, Cromwell/WDL, etc.) on our cluster or elsewhere.
In short, it appears QIIME2 is installed in the container under /home/qiime2, but this conflicts with the file system layout we use on our cluster, which is pretty common for HPC. More specifically, user- and group-specific directories all live under a /home root; for example, my user space is under /home/a-m/cjfields and our lab space is under /home/groups/hpcbio.
Singularity on many systems will automatically bind the host file system, so when the container tries to access the pre-generated cache under its /home/qiime2 space, it is actually trying to access /home/qiime2 on the host (and gets a permission denied). We can work around this in the short term, to some extent, by preventing the initial host file system binding and then remapping the file system to another location in the container, described very briefly here:
This fixes our specific case and allows the cache and downstream tools (feature-classifier, for example) to be accessed, but it obviously causes issues in the longer term: it also requires remapping the local file paths used as input (which we handle with a wrapper script), and it doesn't solve the problem for others using HPC resources that have a similar file system structure and that don't allow Docker but do support Singularity/Apptainer.
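For concreteness, here is a minimal sketch of what such a wrapper script might look like. This is not our actual wrapper; the image name, bind paths, and function are illustrative assumptions, showing only the `--no-home` / `-B` remapping idea:

```python
#!/usr/bin/env python3
"""Hypothetical wrapper sketch: run a QIIME 2 Singularity image while
avoiding the automatic /home bind. Image name and paths are examples."""


def build_singularity_cmd(image, qiime_args, binds):
    # --no-home stops Singularity from binding the host's /home over
    # the container's /home/qiime2 install.
    cmd = ["singularity", "exec", "--no-home"]
    for host_path, container_path in binds:
        # Remap host data directories to neutral container paths.
        cmd += ["-B", f"{host_path}:{container_path}"]
    cmd += [image, "qiime"] + qiime_args
    return cmd


cmd = build_singularity_cmd(
    "qiime2.sif",
    ["info"],
    [("/home/a-m/cjfields/data", "/data")],
)
# Pass `cmd` to subprocess.run(...) to actually execute it.
print(" ".join(cmd))
```

The downside, as noted above, is that every input path on the host then has to be translated into its remapped container location.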
One solution our IT group proposed would be to install QIIME2 and the cache under another base-level directory that isn't commonly used for user-specific locations, for example /opt. But I am open to any suggestions.
QIIME 2 recently overloaded the term "cache". There are two caches in QIIME 2 now: the more recently introduced artifact cache, and the cli cache that has been around for as long as QIIME 2 itself. The cache causing issues here is the cli cache, which confuses me, because you said this was working on older versions of QIIME 2, but the relevant code hasn't been touched in 8 years!
Unfortunately, there is no easy way to directly specify where the cli cache is created; it managed not to cause issues for the last 8 years, so this was never a priority. If you are using a conda environment, the cache gets put in the var directory of that environment. Otherwise, we use click.get_app_dir to get the directory the cache will be put in (click is the command line library we use for q2cli), and it is this click method that is resolving to your home directory.
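To illustrate, here is a simplified sketch of that decision logic as just described. This is a reconstruction from the description above, not the actual q2cli code; the function name and the ~/.config fallback (what click.get_app_dir typically returns on Linux) are assumptions:

```python
"""Simplified sketch of how the cli cache location is chosen:
prefer $CONDA_PREFIX/var, else fall back to a per-user app dir
under the user's home (the path that collides with HPC /home layouts)."""
import os


def guess_cli_cache_dir(environ):
    conda_prefix = environ.get("CONDA_PREFIX")
    if conda_prefix:
        # Conda environment: cache lives under the env's var directory.
        return os.path.join(conda_prefix, "var", "q2cli")
    # No conda env: fall back to a per-user app dir, roughly what
    # click.get_app_dir("q2cli") resolves to on Linux.
    return os.path.join(os.path.expanduser("~"), ".config", "q2cli")


print(guess_cli_cache_dir({"CONDA_PREFIX": "/opt/conda"}))
print(guess_cli_cache_dir({}))
```

In the container there is no conda prefix to win that first branch, so the fallback under /home is what Singularity's host bind then shadows.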
So, a couple of things. First, can you please tell me the last version of QIIME 2 this process worked with? I'd like to determine what changed, because as far as I can tell neither our code nor the behavior of click.get_app_dir has changed for some time. Additionally, the code that controls where the cache ends up is here.
We can look into adding a way to directly configure where the cli cache is created, but I have no timeline for that; it never came up previously. Right now the best (albeit hacky) way to control where this cache is put is probably to manually set the CONDA_PREFIX envvar. You can try setting it so the cache gets created under /opt or somewhere similar.
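A minimal sketch of that workaround, assuming /opt/qiime2 as the example prefix (any writable, non-/home path would do):

```python
"""Sketch of the CONDA_PREFIX workaround: point it at a neutral base
directory before invoking qiime so the cli cache lands outside /home."""
import os


def env_with_cache_prefix(prefix, base=None):
    # Copy the current (or given) environment and override CONDA_PREFIX
    # so the cli cache is written under <prefix>/var instead of /home.
    env = dict(os.environ if base is None else base)
    env["CONDA_PREFIX"] = prefix
    return env


env = env_with_cache_prefix("/opt/qiime2")
# subprocess.run(["qiime", "info"], env=env)  # then invoke qiime with this env
print(env["CONDA_PREFIX"])
```

In a Singularity invocation the equivalent would be exporting the variable into the container with its env-passing mechanism.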
I should clarify this after checking back on the prior analyses. When I say it "worked" in older Docker versions: it did, but only for simple import/export of data, as mentioned in the forum link above. I went back and retested the same steps using older Docker images converted to Singularity, but when I tried to use q2-feature-classifier I ran into the same permissions issues as before.
My point with the above is that we would likely run into problems when incorporating QIIME2 into a standard workflow system (e.g., WDL, Snakemake, Nextflow) on HPC systems that place user directories under /home, a fairly common practice.