headless/HPC use of QIIME

Cistron · January 16, 2018, 3:44pm

Hello everybody,

First off: Thanks for developing QIIME!

I have a few questions and points regarding headless use of QIIME.

For the last two years I've been using QIIME1, and I only dabbled superficially with QIIME2 a couple of months ago. While the new features are great, one thing that has put me off is the move from log files to interactive outputs.

I understand this move should enable less computing-literate scientists to analyse data. However, I have found (though this might just be my very particular situation at my university) that it is easier for me to get access to HPC than to set up QIIME on a VM or server where I can run it interactively in a notebook. Installation via conda commands also seemed to be quite straightforward.

Yet on HPC, this high level of interactivity makes data processing less convenient: I have to copy the outputs off the cluster and keep another local installation of QIIME just to inspect them.

Is there a plan to make a more HPC/headless-friendly version of QIIME2?

Originally, I asked a question about running QIIME1 headlessly:

is there a flag that could be set during QIIME installation to configure it for headless use? Or are there flags to run certain scripts/programmes headlessly?

I've now come across a post by thermokarst here, which sheds a little light on why matplotlib is configured the way it is.

matplotlib default config varies from installation to installation; it depends on what libraries are available at install time (e.g. qt). Some environments (such as an academic HPC) might have their own matplotlib configs, so we generally steer clear from forcing all of our users to use our config for third party tools. The example I linked to above recommends using Agg as the backend, but there might be other users out there who prefer to use a different offscreen renderer (or even a custom renderer!). Similarly, matplotlib also supports per-project configs, which we also don’t want to interfere with. Anyway, we will support some form of general QIIME 2 config in the future; perhaps it is worth continuing this discussion then, to see if there is some way for us to provide a fallback config for tools like matplotlib. Thanks!

Would it be possible to set the DISPLAY configuration with flags during installation, or for individual scripts?
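One workaround that does not depend on any QIIME flag is to select matplotlib's offscreen Agg backend, so no X server or DISPLAY is needed. In a job script this can be done by exporting the `MPLBACKEND=Agg` environment variable; in Python it can be done before pyplot is first imported. A minimal sketch, assuming a recent matplotlib; `save_test_plot` is a hypothetical helper used only to show that rendering to a file works:

```python
import matplotlib

# Select the non-interactive Agg backend *before* pyplot is imported,
# so no X server or DISPLAY variable is required on the compute node.
matplotlib.use("Agg")

import matplotlib.pyplot as plt  # noqa: E402  (imported after backend selection)


def save_test_plot(path):
    """Render a trivial figure straight to a PNG file (hypothetical helper).

    Returns the name of the active backend so callers can confirm it is Agg.
    """
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3], [1, 4, 9])
    fig.savefig(path, format="png")
    plt.close(fig)  # free the figure; nothing is ever shown on screen
    return matplotlib.get_backend()
```

The environment-variable route has the advantage of also covering scripts you cannot edit, while still leaving per-project matplotlib configs alone, which matches the concern raised in the quoted post.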

Cheers,
Michael

thermokarst · January 16, 2018, 4:04pm

Thanks!

Please correct me if I am misunderstanding, but those are two very different concepts, right? Log files tell you something about the computational process (ancillary data), while the (interactive) outputs tell you something about the results of the computation. In that respect, I think QIIME 1 and QIIME 2 are pretty similar: QIIME 1 still generated things like PDF, HTML, CSV, etc. for output, while also attempting to write logging-type info to log files. This experience was often inconsistent (logging was performed in a variety of ways; outputs were rendered in a variety of formats).

QIIME 2 attempts to codify these ideas and make them more consistent and predictable. Logging is always on, and is always logged to stdout/stderr. Interfaces like the Studio might choose to display that log in a new tab, while q2cli will save it to a tmp file (and, optionally, log to stdout/stderr when using the --verbose flag).

Terminal output in QIIME 2 is a Visualization, generally made up of an index.* file (index.html, index.csv, index.pdf, etc.). Because this is standardized, a variety of interfaces know how to display these files without actually knowing anything about the visualization (or output) in the first place. This is a pretty important concept when it comes to being decentralized!
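Because a Visualization is built around an index.* entry point, a script on a headless machine can find and read it without knowing which plugin produced it. A minimal sketch, assuming a .qzv is a ZIP archive laid out as `<uuid>/data/index.<ext>` (an assumption based on typical archives, not a documented contract); `find_index_files` and `read_index` are hypothetical helpers:

```python
import zipfile


def find_index_files(qzv):
    """Return the index.* entries inside a .qzv archive (hypothetical helper).

    `qzv` may be a path or a binary file-like object. Assumes the layout
    <uuid>/data/index.<ext>; ZIP member names always use '/' separators.
    """
    with zipfile.ZipFile(qzv) as zf:
        hits = []
        for name in zf.namelist():
            parts = name.split("/")
            # Keep only <uuid>/data/index.<ext>, not nested assets like data/js/...
            if len(parts) == 3 and parts[1] == "data" and parts[2].startswith("index."):
                hits.append(name)
        return sorted(hits)


def read_index(qzv, ext):
    """Return the text of the index file with extension `ext`, or None."""
    for entry in find_index_files(qzv):
        if entry.endswith("/index." + ext):
            with zipfile.ZipFile(qzv) as zf:
                return zf.read(entry).decode("utf-8")
    return None
```

A caller could, for instance, prefer `read_index(path, "tsv")` when a plain-text index exists and fall back to copying the archive elsewhere for the HTML one.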

For sure, this process of needing to shuffle files around can be a bit of a pain. We actually set up an NGINX server when we host Q2 workshops, which lets folks snag their data quickly and load it up in q2view (see below for more). To that end, we have some long-term goals for QIIME 2:
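For a single user on a cluster, Python's standard-library HTTP server can play a similar role to the NGINX setup for a quick download. This is a swapped-in technique, not the workshop configuration. A minimal sketch; it binds to localhost by default, so reaching it from a workstation would still need an SSH port forward, and `serve_directory` and `fetch` are hypothetical helpers:

```python
import functools
import http.server
import threading
import urllib.request


def serve_directory(directory, host="127.0.0.1", port=0):
    """Serve `directory` read-only over HTTP in a background thread.

    port=0 asks the OS for any free port; the real one is in server.server_address.
    Stop it with server.shutdown() followed by server.server_close().
    """
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=directory)
    server = http.server.ThreadingHTTPServer((host, port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(server, filename):
    """Download one file from a running server (hypothetical helper)."""
    host, port = server.server_address[:2]
    # Bypass any http_proxy settings so the request goes straight to the server.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(f"http://{host}:{port}/{filename}") as resp:
        return resp.read()
```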

We want to be able to have interfaces connect to remote installations (client/server style). That means you could run QIIME 2 Studio on your local workstation, connected to your institution's HPC cluster. Theoretically, Visualizations could be shipped over the wire with no real overhead.
Visualization files technically support multiple index files, so certain visualizations could be rendered in a way that allows you to request a non-interactive version, where applicable. For example, the viz from demux summarize could render a CSV/TSV of the 7-number summaries instead of the interactive HTML/JS viz, when requested. You could then programmatically parse and interact with these data.
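To illustrate the kind of machine-readable index described in the second point, here is a sketch that computes a seven-number summary of per-sample read counts, writes it as TSV, and parses it back. The percentiles (2, 9, 25, 50, 75, 91, 98) are an assumption about what demux summarize reports, the interpolation follows numpy's default linear rule, and all function names are hypothetical:

```python
import csv
import io
import math

# Assumed percentiles for a seven-number summary.
PERCENTILES = (2, 9, 25, 50, 75, 91, 98)


def percentile(values, p):
    """Linear-interpolation percentile (same rule as numpy's default)."""
    data = sorted(values)
    if not data:
        raise ValueError("percentile of empty data")
    rank = (len(data) - 1) * p / 100
    lo, hi = math.floor(rank), math.ceil(rank)
    return data[lo] + (data[hi] - data[lo]) * (rank - lo)


def seven_number_summary_tsv(counts):
    """Return a TSV string with a header row and one row per percentile."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["percentile", "count"])
    for p in PERCENTILES:
        writer.writerow([f"{p}%", percentile(counts, p)])
    return out.getvalue()


def parse_summary_tsv(text):
    """Parse the TSV back into {percentile label: float}."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["percentile"]: float(row["count"]) for row in reader}
```

The point of the round trip is that a TSV index can be consumed by an ordinary script on the cluster, with no browser involved.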

I do want to point out, though, that while you need to copy your data somewhere more accessible, that location does not need a local installation of QIIME 2 in order to view visualizations. We have a serverless viewer called q2view at https://view.qiime2.org that is capable of displaying your visualizations without a deployment of QIIME 2. It is a static page that does not upload your data anywhere; everything happens locally. You can learn more about how this works in this talk:

I hope my discussion above has shown that these questions are orthogonal to the logs/outputs question, unless I misunderstood!

Anyway, if you have any additional questions, feel free to send them our way! We are currently in the process of writing some more detailed developer docs that will attempt to provide some detailed architecture discussion, among other things. Stay tuned, and happy QIIMEing!

natbutter · April 11, 2019, 11:07am

Hello, any updates on this kind of capability? Thanks

We want to be able to have interfaces connect to remote installations (client/server style). That means you could run QIIME 2 Studio on your local workstation, connected to your institution’s HPC cluster. Theoretically Visualization could be shipped over the wire with no real overhead.

thermokarst · April 15, 2019, 12:39am

We have deployed QIIME 2 on many, many HPC environments at this point. We do not yet have a remote interface, as described above.

Cistron · September 27, 2019, 11:49am

If you are allowed to open a reverse SSH tunnel from the compute node to your login node, and you are allowed to port-forward, then you can simply run Jupyter on the compute node. The Visualization class from the QIIME 2 Python API can then display .qzv files in the notebook.
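A sketch of the commands involved, assembled in Python so the host, user, and port live in one place. The host name, user, port, and `tunnel_commands` are hypothetical; `-N`, `-R` and `-L` are standard OpenSSH options and `--no-browser`/`--port` are Jupyter options, but whether reverse tunnels are permitted at all depends on the cluster's policy:

```python
import shlex


def tunnel_commands(login_host, user, port=8888):
    """Return the commands for the setup above, keyed by where each one runs."""
    fwd = f"{port}:localhost:{port}"
    return {
        # 1. On the compute node: start Jupyter without trying to open a browser.
        "compute_jupyter": ["jupyter", "notebook", "--no-browser", f"--port={port}"],
        # 2. On the compute node: reverse tunnel, so login_host:port reaches this node.
        "compute_tunnel": ["ssh", "-N", "-R", fwd, f"{user}@{login_host}"],
        # 3. On the workstation: local forward to the login node's end of the tunnel.
        "workstation_tunnel": ["ssh", "-N", "-L", fwd, f"{user}@{login_host}"],
    }


if __name__ == "__main__":
    for where, cmd in tunnel_commands("login.example.edu", "alice").items():
        print(f"{where}: {shlex.join(cmd)}")
```

With both tunnels up, http://localhost:8888 on the workstation reaches the notebook on the compute node, where a call such as `Visualization.load("demux-summary.qzv")` (hypothetical file name) from the `qiime2` package gives an object the notebook can display inline.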