Automating QIIME 2 commands with a bash script

Hi all,

I find myself constantly filtering different datasets and then running the same commands on them: first I filter by metadata, then I run the core-metrics command on that subset, and then a couple of beta and alpha diversity tests on the distance matrix outputs for that subset.

I feel like someone must have written a script to automate this. Does anyone know of any posts touching on this? I could not find much.

Edit: just to be clearer, something like the code below:

qiime feature-table filter-samples \
--i-table cleantable.qza \
--m-metadata-file compiled_island_metadata.txt \
--p-where "[Column]='$var1'" \
--o-filtered-table [$var1]table.qza

qiime diversity core-metrics-phylogenetic \
--i-phylogeny rooted-treei3.qza \
--i-table [$var1]table.qza \
--p-sampling-depth 1000 \
--m-metadata-file compiled_island_metadata.txt \
--output-dir core-metrics-results-barber-[$var1]

and so on, where var1 is basically the treatment, and I want to use it as a naming mechanism for all downstream processes. I'm not sure how to insert var1 into the bash script, so I just used brackets as a guess.

Thank you!
Cheers,
Sam

Hello Sam,

Yep, some have been built. These two use Snakemake: :snake:

You could also capture your common commands in a code notebook like Jupyter, then simply make a new copy of your notebook for each project, then modify and rerun as needed.
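As for getting var1 into the filenames: bash expands a variable wherever it appears, and curly braces just mark where the variable name ends, so you don't need the brackets. A minimal sketch of your first command (the treatment value here is just a placeholder):

var1=Oahu   # placeholder treatment value

# the variable expands directly inside the output filename;
# ${var1} keeps bash from reading "var1table" as the variable name
qiime feature-table filter-samples \
--i-table cleantable.qza \
--m-metadata-file compiled_island_metadata.txt \
--p-where "[Column]='$var1'" \
--o-filtered-table "${var1}table.qza"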


Hi Colin,

Thanks for sharing. Snakemake sounds very useful. Will take a look.

Also, I messed around in bash and made a script so I don't have to go back and edit the metadata columns etc. Now I can just run:

bash auto.sh column var1

for a given metadata column and treatment, where auto.sh is something like:

#!/bin/bash
# automation script: filter samples by a metadata column/value pair,
# then run core metrics on the filtered table

column=$1
var1=$2
table=table.qza

qiime feature-table filter-samples \
--i-table cleantable.qza \
--m-metadata-file compiled_island_metadata.txt \
--p-where "[$column]='$var1'" \
--o-filtered-table "$var1$table"

qiime diversity core-metrics-phylogenetic \
--i-phylogeny rooted-treei3.qza \
--i-table "$var1$table" \
--p-sampling-depth 1000 \
--m-metadata-file compiled_island_metadata.txt \
--output-dir "core-metrics-results-barber-$var1"

But you can keep going with all the downstream analyses.
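For example, something like this could follow in the same script (the .qza names are the defaults that core-metrics-phylogenetic writes into its output directory; the column being tested is a placeholder):

dir=core-metrics-results-barber-$var1

# PERMANOVA-style test on the unweighted UniFrac distances for this subset
# ("Site" is a placeholder: some metadata column that still varies within the subset)
qiime diversity beta-group-significance \
--i-distance-matrix "$dir/unweighted_unifrac_distance_matrix.qza" \
--m-metadata-file compiled_island_metadata.txt \
--m-metadata-column Site \
--o-visualization "$dir/unweighted-unifrac-site-significance.qzv"

# Kruskal-Wallis test on Faith's PD across metadata groups for this subset
qiime diversity alpha-group-significance \
--i-alpha-diversity "$dir/faith_pd_vector.qza" \
--m-metadata-file compiled_island_metadata.txt \
--o-visualization "$dir/faith-pd-group-significance.qzv"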


Nice! :+1:

If you want to get started with Snakemake, I highly recommend it. It has a wonderful learning curve: it makes it easy to automate simple tasks, then lets you add complexity gradually to automate complex ones.

There's the excellent Snakemake tutorial, and this 5-step QIIME 2 pipeline I made that trains a sklearn classifier.

EDIT: One of the best parts about Snakemake + Qiime2 is that you can build fully automated pipelines AND you still have the detailed provenance baked into each Qiime2 artifact produced. So even as you distribute results or proceed with downstream analysis outside of your Snakemake pipeline, the full provenance is preserved. :memo: :hourglass: :rewind:


Yeah, this would help me expand this to my entire pipeline for big datasets. Thank you!
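Something like this wrapper loop (the column name and treatment values are just placeholders) would run the whole thing once per treatment:

# placeholder column/values: drive auto.sh across every treatment
for treatment in Oahu Maui Kauai; do
    bash auto.sh Island "$treatment"
done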


To add an alternative implementation: I maintain a Nextflow pipeline for amplicon sequencing analysis, nf-core/ampliseq (https://nf-co.re/ampliseq; on GitHub as nf-core/ampliseq, a 16S rRNA amplicon sequencing analysis workflow using QIIME2). I made a more detailed post here.

Awesome, this is great. Thank you!