Tourmaline: a workflow for rapid and reproducible amplicon sequence analysis using QIIME 2 and Snakemake

Luke_Thompson · September 21, 2021, 2:41pm

Announcing Tourmaline, a fully featured Snakemake workflow for QIIME 2.

Features include:

Portability. Native support for Linux and macOS in addition to Docker containers.
QIIME 2. The core commands of Tourmaline, including those for DADA2 and Deblur denoising, are all QIIME 2 commands; QIIME 2 is one of the most popular amplicon sequence analysis software tools available. You can print all of the QIIME 2 and other shell commands of your workflow before or while running it.
Snakemake. Managing the workflow with Snakemake provides several benefits:
- A single configuration file contains all parameters, so you can see what your workflow is doing and make changes for a subsequent run.
- The directory structure is the same for every Tourmaline run, so you always know where your outputs are.
- On-demand execution means that only the commands needed for output files that do not yet exist are run, saving time and computation when you re-run part of a workflow.
Parameter optimization. The configuration file and standard directory structure make it simple to test and compare different parameter sets to optimize your workflow. Included code helps choose read truncation parameters and identify outliers in representative sequences (ASVs).
Visualizations and reports. Every Tourmaline run produces an HTML report containing a summary of your metadata and outputs, with links to web-viewable QIIME 2 visualization files.
Downstream analysis. Analyze the output of single or multiple Tourmaline runs programmatically, with qiime2R in R or the QIIME 2 Artifact API in Python, using the provided R and Python notebooks or your own code.
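To illustrate the on-demand behavior of Snakemake mentioned above, here is a minimal Python sketch of the Make-style core of that decision: a rule runs if one of its outputs is missing, if one of its inputs is about to be regenerated by an upstream rule, or if an input is newer than its outputs. The `Rule` class, the `needs_run` and `plan` functions, and the file names are hypothetical. Real Snakemake builds its job graph backward from the requested target files and, in recent versions, can also rerun jobs when parameters or code change, so treat this only as a conceptual model.

```python
from __future__ import annotations

import os
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical, simplified stand-in for a Snakemake rule."""
    name: str
    inputs: list[str]
    outputs: list[str]


def needs_run(rule: Rule, scheduled: set[str]) -> bool:
    """Make-style check: should this rule run?"""
    if not rule.outputs or any(not os.path.exists(o) for o in rule.outputs):
        return True  # no outputs declared, or an output file is missing
    if any(i in scheduled for i in rule.inputs):
        return True  # an upstream rule will regenerate one of our inputs
    oldest_output = min(os.path.getmtime(o) for o in rule.outputs)
    # An input modified after the outputs were written makes them stale.
    return any(
        os.path.getmtime(i) > oldest_output
        for i in rule.inputs
        if os.path.exists(i)
    )


def plan(rules: list[Rule]) -> list[str]:
    """Names of the rules to run; `rules` must already be in dependency order."""
    scheduled: set[str] = set()  # files that scheduled rules will (re)write
    to_run = []
    for rule in rules:
        if needs_run(rule, scheduled):
            to_run.append(rule.name)
            scheduled.update(rule.outputs)
    return to_run
```

For a chain such as import (`manifest.tsv` to `demux.qza`), denoise (`demux.qza` to `table.qza`) and report (`table.qza` to `report.html`), only the report rule is planned if the first two outputs already exist and are up to date; deleting `demux.qza` schedules that rule plus everything downstream of it.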

QIIME 2 options supported:

FASTQ sequence import using a manifest file, or from your pre-imported FASTQ .qza file
Denoising with DADA2 (paired-end and single-end) and Deblur (single-end)
Feature classification (taxonomic assignment) with options of naive Bayes, consensus BLAST, and consensus VSEARCH
Feature filtering by taxonomy, sequence length, feature ID, and abundance/prevalence
De novo multiple sequence alignment with MUSCLE, Clustal Omega, or MAFFT (with masking) and tree building with FastTree
Outlier detection with odseq
Interactive taxonomy barplot
Tree visualization using Empress
Alpha diversity, alpha rarefaction, and alpha group significance with four metrics: Faith's phylogenetic diversity, observed features, Shannon diversity, and Pielou's evenness
Beta diversity distances, principal coordinates, Emperor plots, and beta group significance (one metadata column) with four metrics: unweighted and weighted UniFrac, Jaccard distance, and Bray–Curtis distance
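The abundance/prevalence filtering and the non-phylogenetic diversity metrics listed above are computed by QIIME 2 plugins. The toy Python sketch below only illustrates what those quantities measure on plain lists of per-sample counts. It is not how Tourmaline or QIIME 2 compute them internally. The function names and default thresholds are hypothetical, the Shannon log base is a convention (base 2 is used here), and Faith's PD and UniFrac are omitted because they also require a phylogenetic tree.

```python
import math


def filter_features(table, min_total=10, min_prevalence=0.1):
    """Keep features with total count >= min_total that occur in at least
    min_prevalence (a fraction) of samples.

    `table` maps feature ID -> per-sample counts, in the same sample order.
    """
    kept = {}
    for feature, counts in table.items():
        prevalence = sum(c > 0 for c in counts) / len(counts)
        if sum(counts) >= min_total and prevalence >= min_prevalence:
            kept[feature] = counts
    return kept


def observed_features(counts):
    """Number of features with a nonzero count in one sample."""
    return sum(c > 0 for c in counts)


def shannon(counts, base=2):
    """Shannon entropy of one sample's relative abundances."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total, base) for c in counts if c > 0)


def pielou_e(counts):
    """Pielou's evenness: Shannon entropy (natural log) divided by ln(S)."""
    s = observed_features(counts)
    if s < 2:
        return float("nan")  # undefined for fewer than two observed features
    return shannon(counts, base=math.e) / math.log(s)


def bray_curtis(u, v):
    """Abundance-based dissimilarity between two samples (0 = identical)."""
    denom = sum(a + b for a, b in zip(u, v))
    return sum(abs(a - b) for a, b in zip(u, v)) / denom if denom else 0.0


def jaccard(u, v):
    """Presence/absence dissimilarity: 1 - shared / total observed features."""
    a = {i for i, c in enumerate(u) if c > 0}
    b = {i for i, c in enumerate(v) if c > 0}
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0
```

A perfectly even sample such as `[10, 10, 10, 10]` has four observed features, a base-2 Shannon entropy of about 2.0 and a Pielou's evenness of about 1.0. Bray–Curtis responds to changes in abundance, whereas Jaccard only registers which features are present.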

Links:

Please let me know if you have any feedback on the workflow or the preprint!

Thanks!
Luke