Biocontainer for qiime2?

Emelie · February 24, 2021, 8:23pm

Hi there! I have seen that qiime exists as a biocontainer, but that qiime2 is not available. I wonder if there are any plans to make qiime2 available as a biocontainer, or if there's something that speaks against having qiime2 as a biocontainer?
Cheers, Emelie.

thermokarst · February 24, 2021, 8:26pm

Hi @Emelie! I think the old QIIME 1 biocontainer is maintained by Gavin Huttley; I'm not sure if he has plans to set up a QIIME 2 biocontainer. I'm not super familiar with the BioContainers project, but their docs mention:

BioContainers has been built around three main technologies: Conda, Docker and Singularity. The BioContainers Community releases for every bioinformatics software containers in these three technologies or flavours.

We already distribute QIIME 2 as conda packages and docker containers (although no singularity containers at the moment) - do any of those options get you what you need?

Emelie · February 24, 2021, 8:43pm

Thank you for your very rapid reply! It is definitely convenient that there is a conda package and a docker container, but for the project I'm working with it would be ideal to have qiime2 as a biocontainer as well. But our use might be very specific. We are using nextflow to write a pipeline that uses qiime (yes, a pipeline with a pipeline in it), and the next step of this pipeline is to write a module for qiime2, which would ideally use a biocontainer. But if there are no plans to create a biocontainer, I'll look into a workaround. Thanks again!

thermokarst · February 24, 2021, 11:53pm

Have you seen this project, @Emelie?

We aren't associated with this effort, but it looks like they are using plain conda packages to work with QIIME 2 in nextflow. This might give you something to integrate into your efforts. Keep us posted!

Emelie · February 25, 2021, 2:09pm

Hi! I was probably not explicit enough - that's actually the pipeline I'm working with. It is now migrating from the way it is written to the new modular approach in nextflow, which then is connected to the link I posted above - about using biocontainers in modules. Maybe I could ask a related question: is there any reason why qiime2 doesn't exist as a bioconda package? Or does it have to do with the fact that qiime2 is already easy to install, and therefore having it on bioconda has been unnecessary?

thermokarst · February 26, 2021, 12:13am

lol! Sorry about that, I had no idea!

I want to put you in touch with @ebolyen, our resident QIIME 2 architect - I think he is familiar with some of the specifics of the ampliseq pipeline and would like to share some ideas with you!

Emelie · February 26, 2021, 12:30pm

That's my bad, I should probably have been explicit about it.

Sounds great, ideas are always valuable!

ebolyen · March 8, 2021, 6:54pm

Hi @Emelie!

Sorry for the delay on my part. I'm pretty excited by Nextflow's DSL 2 and I think there's a lot of opportunity here.

To talk specifically about biocontainers, I think one of the reasons it hasn't happened yet is that we have a pretty extensive mechanism for testing upstream dependencies to make sure everything works together. This is of course perfectly compatible with the idea of a docker container; however, it is less compatible with bioconda (and hence biocontainers).

As you are almost certainly aware, we negotiate ~400 dependencies in our "default" distribution, which makes us pretty nervous about end-user conda resolution. We have had several occasions where our integration tests (which allow conda to resolve as it pleases) fail as a result of dependency shifts upstream of us. We usually find good workarounds, be it updating our own code or pinning an odd dependency upstream until the situation is resolved elsewhere. Conda has overall done an absolutely astonishing job of making this possible in general, but the ecosystem isn't perfect and we don't want to shift this burden onto our users. So when conda finds a resolution that passes our tests, we "lock in" those versions with an environment file (you can see the many many generated ones here).
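To make the "lock in" idea concrete, here is a minimal sketch of writing an environment file that pins every package from a resolution that passed testing. It is not the QIIME 2 release tooling: the function name, the environment name, the channel list, and the package versions are all assumptions chosen for illustration.

```python
def render_locked_environment(name, channels, resolved):
    """Render a conda environment file that pins every resolved package."""
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    # Sort the pins so the generated file is stable and easy to diff.
    lines += [f"  - {pkg}={version}" for pkg, version in sorted(resolved.items())]
    return "\n".join(lines) + "\n"


# Hypothetical resolution that passed the integration tests
# (package versions are illustrative only).
resolved = {"scikit-bio": "0.5.6", "numpy": "1.19.5", "pandas": "1.2.2"}
locked = render_locked_environment(
    "qiime2-example", ["conda-forge", "bioconda"], resolved
)
print(locked)
```

An environment created from a file like this gets the tested versions instead of whatever the solver would pick on the day of installation.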

Now, I am less familiar with biocontainers, so there may be other approaches for creating them than using bioconda's dependency resolution (which, by the way, is just super cool in and of itself).

Something else to consider is that as we move into the future, there will not be a "single" QIIME 2 distribution. I may have different plugins installed for metabolomics, or phylogenomics, and so any solution needs to be adaptable to this problem.

I have a few ideas based on work I am currently doing for automatically generating Galaxy wrappers from QIIME 2 actions in q2galaxy:

Per-plugin environment extraction.
This is where we crawl a validated environment file to identify what dependencies are needed for a given plugin and then we use some mechanism to pin these specifically. I wrote a gist a long time ago to make this happen. This doesn't get around the need for an environment file, but it does reduce the burden on any individual docker container (especially if you aren't using a lot of plugins). I am still uncertain how to make this fit into the biocontainer ecosystem, but making a standard docker container from this is straightforward.

Automatically generate modules from QIIME 2 actions.
I am unsure how open you are to this, but QIIME 2 has been specifically designed to make generating interfaces dynamically from a given installation possible. This is how both our Python API and command line interface work. They observe the plugins installed and query their capabilities in order to represent them in a way natural for that interface.
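The per-plugin extraction idea can be sketched as a walk over a dependency graph, keeping only the pins that a single plugin reaches. This is not the gist mentioned above: the `requires` graph, the `locked_versions` mapping, and every version string are hypothetical stand-ins for what would be read from a validated environment.

```python
from collections import deque


def plugin_pins(plugin, requires, locked_versions):
    """Return {package: version} for a plugin and its transitive dependencies."""
    seen = {plugin}
    queue = deque([plugin])
    while queue:
        current = queue.popleft()
        for dep in requires.get(current, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    # Keep only the pins this plugin actually needs.
    return {pkg: locked_versions[pkg] for pkg in sorted(seen)}


# Hypothetical direct-dependency graph and validated pins (illustrative only).
requires = {
    "q2-feature-table": ["qiime2", "biom-format"],
    "q2-phylogeny": ["qiime2", "fasttree"],
    "qiime2": ["pandas"],
    "biom-format": ["numpy"],
    "pandas": ["numpy"],
}
locked_versions = {
    "q2-feature-table": "2021.2.0",
    "q2-phylogeny": "2021.2.0",
    "qiime2": "2021.2.0",
    "biom-format": "2.1.10",
    "fasttree": "2.1.10",
    "pandas": "1.2.2",
    "numpy": "1.19.5",
}

pins = plugin_pins("q2-feature-table", requires, locked_versions)
print(pins)
```

In this toy graph the container for `q2-feature-table` never pulls in `fasttree`, which is the kind of reduction the per-plugin approach is after.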

I believe QIIME 2 does a good job of modeling dataflow-oriented processes, and so for the most part, we can make a one-to-one mapping in these systems automatically. This is what I have done with q2galaxy and q2cwl (although the latter needs work at this point). The basic process looks like an interface which renders Galaxy-XML/CWL-yaml/NF-modules, paired with a bespoke driver which interprets the arguments from that pipeline into their respective Python versions for the QIIME 2 SDK. This avoids any coupling to more user-facing UIs, such as our command line or Python interface, which may make decisions that are less compatible with these systems.
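As a rough illustration of the renderer-plus-driver split described above, the sketch below renders a Nextflow DSL 2 process skeleton from a self-described action and converts string arguments back into Python values. It does not use the QIIME 2 SDK and is not how q2galaxy or q2cwl are implemented: `ActionSpec`, `render_process`, `coerce_arguments`, and the `run_action` command are all hypothetical names, and the action shown is a simplified description.

```python
from dataclasses import dataclass


@dataclass
class ActionSpec:
    """Hypothetical, simplified description of one action (not the QIIME 2 SDK)."""
    plugin: str
    name: str
    inputs: list        # names of input artifacts
    parameters: dict    # parameter name -> Python type
    outputs: list       # names of output artifacts


def render_process(spec):
    """Render a Nextflow DSL 2 process skeleton from an action description."""
    process_name = f"{spec.plugin}_{spec.name}".upper()
    lines = [f"process {process_name} {{", "    input:"]
    lines += [f"    path {name}" for name in spec.inputs]
    lines += [f"    val {name}" for name in spec.parameters]
    lines += ["", "    output:"]
    lines += [f'    path "{name}.qza"' for name in spec.outputs]
    # 'run_action' is a placeholder for the bespoke driver, not a real command.
    lines += ["", "    script:", '    """',
              f"    run_action {spec.plugin} {spec.name}", '    """', "}"]
    return "\n".join(lines)


def coerce_arguments(spec, raw):
    """Driver side: turn string arguments from the workflow engine into Python values."""
    return {name: spec.parameters[name](value) for name, value in raw.items()}


spec = ActionSpec(
    plugin="feature_table",
    name="rarefy",
    inputs=["table"],
    parameters={"sampling_depth": int},
    outputs=["rarefied_table"],
)
module_text = render_process(spec)
print(module_text)
print(coerce_arguments(spec, {"sampling_depth": "1000"}))
```

The renderer only needs the action's description, so the same description could feed a Galaxy or CWL renderer, while the driver stays the single place where workflow arguments become Python objects.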

Basically, we can automatically generate the glue. I think this approach could work very well in Nextflow and I would be excited to potentially assist. I think a really compelling story can be made for creating tools with enough self-description that it is possible to make these translation layers automatically, allowing users to work in whatever environment is most applicable for them.

I realize that was a lot, and it only barely addresses the biocontainer situation, but I think there are some really powerful things that could be done with DSL 2 modules and our SDK.

Emelie · March 11, 2021, 7:05am

Hi @ebolyen!

Thank you for a very extensive reply! I realise that you have given this quite some thought, and that there are a lot of things to consider. For the ampliseq pipeline that I'm working on, we decided to move forward with the docker container (and thereby not provide conda support for the qiime-related processes), as that seems to be the most reasonable approach at the moment.

Thank you for your time, and also thanks to thermokarst!

ebolyen · March 11, 2021, 8:35pm

That sounds like a good path forward! Let us know when there is a container we can check out. I imagine some people will find it useful.

angel · March 12, 2021, 5:26am

great, helpful for me. thanks