Database plugin(s)?

Hi all,
Great to see QIIME 2 shaping up! I wanted to start a discussion about a couple of things related to database management for upcoming plugins.

Background: Reference databases (most frequently of nucleotide sequences) are necessary for many reference-based analyses, including open- and closed-reference OTU picking and some forms of taxonomy assignment. These databases may also include auxiliary components such as phylogenetic trees and multiple sequence alignments.

Question(s): How would distributing reference databases work within the plugin system? Would it make sense to build a version-control-like “database manager” plugin that works with a range of mainstream tools?

NINJA-OPS, for instance, doesn’t depend on a particular database, but so far it has conventionally shipped outside of QIIME with the Greengenes 13.8 reference database. Users seeking other databases might wander onto the NINJA-OPS website, discover UNITE, SILVA, etc., and try to set them up with their standalone NINJA-OPS installation. Plugin distribution is a more elegant way to install packages like this: it takes the burden of juggling installations out of users’ hands, and users are not expected to know exactly where and how the plugin framework manages these things under the hood. It would make sense to have a database plugin (either specific to NINJA-OPS or aware of multiple plugins), along with a (centralized?) location to host these often massive databases for automatic installation.

What would be the best way to accomplish this under the current system? Or are there (or should there be) better alternatives than hijacking the plugin system?

Cheerio,
Gabe

Hey @gabe, thanks for the discussion questions!

I think what probably makes the most sense is to just distribute the artifacts (.qza files) directly. Each artifact contains the provenance of how it was generated, so they are pretty self-contained. I don’t think we had really imagined plugins ever being responsible for “data management”, since that is really the business of the framework and its artifacts.
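To illustrate why a .qza is self-contained, here is a minimal sketch using only Python’s standard library. It assumes the commonly seen layout of a .qza as a zip archive with a single top-level directory named after the artifact’s UUID, containing `metadata.yaml`, `data/` and `provenance/`. The toy archive built here (its UUID, file names and contents) is hypothetical and far simpler than a real artifact, and `metadata.yaml` is parsed as flat `key: value` lines rather than full YAML. For real artifacts, `qiime tools peek` is the supported way to inspect them.

```python
import io
import zipfile


def read_qza_metadata(path_or_file):
    """Return (metadata dict, list of provenance entries) from a .qza-style zip.

    Assumes a single top-level <uuid>/ directory holding metadata.yaml,
    data/ and provenance/. metadata.yaml is read as flat 'key: value' lines,
    which avoids a PyYAML dependency but is not a general YAML parser.
    """
    with zipfile.ZipFile(path_or_file) as zf:
        names = zf.namelist()
        # The archive's top-level directory is named after the artifact UUID.
        root = names[0].split("/", 1)[0]
        text = zf.read(f"{root}/metadata.yaml").decode("utf-8")
        metadata = {}
        for line in text.splitlines():
            if ":" in line:
                key, value = line.split(":", 1)
                metadata[key.strip()] = value.strip()
        provenance = [n for n in names if n.startswith(f"{root}/provenance/")]
    return metadata, provenance


def build_toy_qza(uuid="00000000-0000-0000-0000-000000000000"):
    """Build a tiny in-memory archive mimicking the assumed layout (hypothetical)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(
            f"{uuid}/metadata.yaml",
            f"uuid: {uuid}\ntype: FeatureData[Sequence]\n"
            "format: DNASequencesDirectoryFormat\n",
        )
        zf.writestr(f"{uuid}/data/dna-sequences.fasta", ">seq1\nACGT\n")
        zf.writestr(f"{uuid}/provenance/action/action.yaml", "action: import\n")
    buf.seek(0)
    return buf


if __name__ == "__main__":
    meta, prov = read_qza_metadata(build_toy_qza())
    print(meta["type"])
    print(prov)
```

The point of the sketch is that the data, its semantic type and its history travel in one file, so sharing the file is enough to share the reference database.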

q2-feature-classifier is in basically the same position, but it doesn’t attempt to distribute the databases itself. Instead, users either import the reference data themselves or download a pre-trained classifier.

Personally, I think a public repository of .qza files with searchable metadata could be a pretty neat general solution to this problem in the future, but that’s pretty far out (and who pays for it?).
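A repository like that would mostly need an index of each artifact’s recorded metadata plus some searchable labels. The sketch below is purely hypothetical: the `IndexEntry` fields, the `search` function and the toy entries are invented for illustration and do not describe any existing QIIME 2 service. Only the semantic type strings follow QIIME 2’s naming.

```python
from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    """One artifact in a hypothetical shared repository of .qza files."""
    filename: str                           # hypothetical file name
    semantic_type: str                      # as recorded in the artifact's metadata
    tags: set = field(default_factory=set)  # free-form, searchable labels


def search(index, semantic_type=None, tag=None):
    """Return filenames matching the given type and/or tag; None means 'any'."""
    return [
        e.filename
        for e in index
        if (semantic_type is None or e.semantic_type == semantic_type)
        and (tag is None or tag in e.tags)
    ]


# Toy entries for illustration only; not a real catalogue.
INDEX = [
    IndexEntry("greengenes-seqs.qza", "FeatureData[Sequence]", {"greengenes"}),
    IndexEntry("greengenes-tree.qza", "Phylogeny[Rooted]", {"greengenes"}),
    IndexEntry("silva-seqs.qza", "FeatureData[Sequence]", {"silva"}),
    IndexEntry("unite-taxonomy.qza", "FeatureData[Taxonomy]", {"unite"}),
]

if __name__ == "__main__":
    print(search(INDEX, semantic_type="FeatureData[Sequence]"))
    print(search(INDEX, tag="greengenes"))
```

Searching by semantic type is what would let a tool such as NINJA-OPS ask for “any reference sequences” without being tied to one database.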