Artifact Cache Basic Instructions

This tutorial has been replaced with related content in Using QIIME 2, which can be found here.

Original content of this post

This is a preliminary set of instructions and thus far only covers the basics.

Basic Overview:

The QIIME 2 artifact cache allows QIIME 2 users to have finer control over where and how QIIME 2 stores data on disk.

The artifact cache serves two primary purposes:

  1. Controlling where QIIME 2 stores its working files
  2. Avoiding zipping and unzipping QIIME 2 artifacts every time you use them

Users can create and interact with artifact caches via both the Python API and the CLI. This tutorial provides instructions for both interfaces.

An artifact cache is created at a given location on your filesystem. Once created, it can be used to store artifacts as unzipped directories rather than as .qza files. Artifacts in a cache are referred to by a combination of the path to the cache and a user-created key, not by their path on the filesystem.
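To make the path-plus-key addressing concrete, here is a minimal toy model in plain Python. `ToyCache` and its `data/<key>/` layout are hypothetical and are NOT how QIIME 2 lays out a real cache (real cache directories should only be modified through the QIIME 2 APIs). The sketch only illustrates the idea that an artifact is stored once as an unzipped directory and afterwards looked up by (cache path, key) rather than by a file path.

```python
import shutil
import tempfile
from pathlib import Path


class ToyCache:
    """Hypothetical key-addressed store; NOT the qiime2.Cache implementation."""

    def __init__(self, root):
        self.root = Path(root)
        # Toy layout: each unzipped payload lives at <root>/data/<key>/.
        (self.root / "data").mkdir(parents=True, exist_ok=True)

    def save(self, src_dir, key):
        # Copy an already-unzipped directory into the store under `key`,
        # replacing anything previously saved under that key.
        dest = self.root / "data" / key
        if dest.exists():
            shutil.rmtree(dest)
        shutil.copytree(src_dir, dest)
        return dest

    def load(self, key):
        # (cache path, key) -> directory; nothing has to be unzipped.
        dest = self.root / "data" / key
        if not dest.is_dir():
            raise KeyError(key)
        return dest

    def remove(self, key):
        # Delete the payload stored under `key`.
        shutil.rmtree(self.load(key))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "unzipped_artifact"
        src.mkdir()
        (src / "table.tsv").write_text("id\tcount\nA\t3\n")

        cache = ToyCache(Path(tmp) / "my_cache")
        cache.save(src, "table")
        print((cache.load("table") / "table.tsv").read_text())
        cache.remove("table")
```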

Note: users should NOT interact with the cache directory directly. The directory should only be modified via the provided QIIME 2 APIs, with the sole exception of deleting the cache, which can be done by simply removing the directory.

Consider a use case where you have a very large artifact, say an 80 gigabyte database, that you are using as an input to QIIME 2 actions on an HPC. Ideally, you would avoid repeatedly zipping and unzipping this large database into and out of a .qza. You would also want to avoid moving the artifact around the HPC's filesystem just to make sure the workers running the QIIME 2 action have access to it.

These issues can be resolved by putting your artifact in a cache on a part of the HPC's filesystem that all of the workers can access. The artifact will be stored in unzipped format, so the action will not need to unzip it before using it, and it will already be somewhere the workers can reach, so it will not need to be moved around the filesystem before the action can execute.
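The saving comes from extracting once instead of on every use. A .qza file is a zip archive, so the difference can be sketched with the standard-library `zipfile` module. The `materialize` helper below is hypothetical and only illustrates the idea; it is not how QIIME 2 populates a cache. It extracts into a `.partial` sibling directory and then renames it into place, so an interrupted extraction is never mistaken for a complete one.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def materialize(qza_path, shared_dir):
    """Hypothetical helper: return (directory, extracted_now).

    The archive is unzipped into `shared_dir` only on first use; later
    calls reuse the existing directory instead of extracting again.
    """
    target = Path(shared_dir) / Path(qza_path).stem
    if target.is_dir():
        return target, False
    # Extract into a temporary sibling, then rename it into place, so a
    # half-finished extraction is never mistaken for a complete one.
    partial = target.with_name(target.name + ".partial")
    shutil.rmtree(partial, ignore_errors=True)
    partial.mkdir(parents=True)
    with zipfile.ZipFile(qza_path) as zf:
        zf.extractall(partial)
    partial.rename(target)
    return target, True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        qza = Path(tmp) / "big_db.qza"
        with zipfile.ZipFile(qza, "w") as zf:
            zf.writestr("data/seqs.fasta", ">s1\nACGT\n")
        for _ in range(3):
            path, extracted = materialize(qza, Path(tmp) / "shared")
            print(path.name, "extracted" if extracted else "reused")
```

Only the first call pays for the extraction; the sketch does not handle two workers extracting the same archive at the same moment.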

Note: The CLI instructions outside of "Creating a cache" assume the cache you are referencing already exists. The Python instructions assume you have already created a Cache object named cache.

Creating a cache:

CLI

qiime tools cache-create <path>

Python API

from qiime2 import Cache

cache = Cache(<path>)

NOTE: If no cache exists at the given path, this creates one. If a cache already exists there, this is also how you get an object referring to that existing cache.
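The create-or-open behavior can be modeled as below. This is a hypothetical sketch, not the `qiime2.Cache` constructor: it assumes a made-up marker file (`TOY_CACHE_VERSION`) to tell a cache apart from an ordinary directory, and it refuses to adopt an existing directory that lacks the marker.

```python
import tempfile
from pathlib import Path

MARKER = "TOY_CACHE_VERSION"  # hypothetical marker file, not QIIME 2's layout


def open_or_create(path):
    """Return (root, created): create a toy cache if `path` does not exist,
    otherwise reopen it after checking that it really is one."""
    root = Path(path)
    if not root.exists():
        root.mkdir(parents=True)
        (root / MARKER).write_text("1\n")
        return root, True
    if not (root / MARKER).is_file():
        # Refuse to treat an unrelated directory as a cache.
        raise ValueError(f"{root} exists but is not a cache")
    return root, False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(open_or_create(Path(tmp) / "cache"))  # first call creates it
        print(open_or_create(Path(tmp) / "cache"))  # second call reopens it
```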

Storing an artifact in a cache:

This will store an artifact in the specified cache under the specified key.

CLI

qiime tools cache-store --cache <cache-path> --artifact-path <artifact-path> --key <key>

Python API

cache.save(<artifact>, <key>)

Removing an artifact from a cache:

To remove an artifact from a cache, specify its key. The artifact stored under that key is removed.

CLI

qiime tools cache-remove --cache <cache-path> --key <key-to-remove>

Python API

cache.remove(<key>)

Using a cached artifact as input:

CLI

qiime <plugin> <action> --i-input <cache-path>:<key>
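In wrapper scripts it can help to tell cache references apart from ordinary .qza paths. The sketch below splits a `<cache-path>:<key>` string on its last colon. `parse_cache_ref` is a hypothetical helper, it assumes keys never contain a colon, and QIIME 2's own parsing rules may differ.

```python
from pathlib import Path


def parse_cache_ref(ref):
    """Hypothetical helper: split '<cache-path>:<key>' into (Path, key).

    Assumes the key is everything after the LAST colon, so a path that
    itself contains a colon (e.g. 'C:/caches/c1') still splits correctly.
    """
    path, sep, key = ref.rpartition(":")
    if not sep or not path or not key:
        raise ValueError(f"not a cache reference: {ref!r}")
    return Path(path), key


if __name__ == "__main__":
    print(parse_cache_ref("/scratch/my_cache:silva_db"))
```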

Python API

art = cache.load(<key>)
# Then just pass art into your action the same as any other artifact

Storing an output in a cache:

CLI

qiime <plugin> <action> --o-output <cache-path>:<key>

Python API

# Outputs are ordinary artifacts, so store them in the cache the same way as above
cache.save(<artifact>, <key>)
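Because an output written as `<cache-path>:<key>` can be read back by a later action through the same reference, a multi-step workflow can pass intermediate results through the cache without writing intermediate .qza files. A minimal sketch that only builds the command lines (it does not run them) follows. `action_argv` and `cache_ref` are hypothetical helpers, and the plugin, action, and parameter names are placeholders.

```python
import shlex


def cache_ref(cache_path, key):
    # The '<cache-path>:<key>' form accepted by --i-* and --o-* options.
    return f"{cache_path}:{key}"


def action_argv(plugin, action, inputs, outputs, cache_path):
    """Hypothetical helper: build argv for one QIIME 2 action whose inputs
    and outputs are all keys in the same cache.

    `inputs` and `outputs` map parameter names (without the --i-/--o-
    prefix) to cache keys.
    """
    argv = ["qiime", plugin, action]
    for name, key in inputs.items():
        argv += [f"--i-{name}", cache_ref(cache_path, key)]
    for name, key in outputs.items():
        argv += [f"--o-{name}", cache_ref(cache_path, key)]
    return argv


if __name__ == "__main__":
    cache = "/scratch/my_cache"  # placeholder path
    # Step 1 writes key 'step1'; step 2 reads it back by the same reference.
    step1 = action_argv("plugin-a", "action-1", {"input": "raw"}, {"output": "step1"}, cache)
    step2 = action_argv("plugin-b", "action-2", {"input": "step1"}, {"output": "step2"}, cache)
    for argv in (step1, step2):
        print(shlex.join(argv))
```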