Automating QIIME 2 commands with a bash script

Hi all,

I find myself constantly filtering different datasets and then running the same commands on them: first I filter by metadata, then I run the core-metrics command on that subset, and then a couple of beta and alpha diversity tests on the distance matrix outputs for that subset.

I feel like someone must have written a script to automate this. Does anyone know of any posts touching on this? I could not find much.

Edit: just to be clearer, something like the code below:

qiime feature-table filter-samples \
--i-table cleantable.qza \
--m-metadata-file compiled_island_metadata.txt \
--p-where "[Column]='$var1'" \
--o-filtered-table [$var1]table.qza

qiime diversity core-metrics-phylogenetic \
--i-phylogeny rooted-treei3.qza \
--i-table [$var1]table.qza \
--p-sampling-depth 1000 \
--m-metadata-file compiled_island_metadata.txt \
--output-dir core-metrics-results-barber-[$var1]

and so on, where var1 is basically the treatment, and I want to use it as a naming mechanism for all downstream processes. I'm not sure how to insert var1 into the bash script, so I just used brackets as a guess.

Thank you!
Cheers,
Sam

Hello Sam,

Yep, some have been built. These two use Snakemake: :snake:

You could also capture your common commands in a code notebook like Jupyter, then simply make a new copy of your notebook for each project, then modify and rerun as needed.
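As for getting var1 into the filenames: bash expands a variable wherever it appears, and curly braces just mark where the variable name ends, so you don't need the brackets. A minimal sketch of your first command (the treatment value here is just a placeholder):

var1=Oahu   # placeholder treatment value

# the variable expands directly inside the output filename;
# ${var1} keeps bash from reading "var1table" as the variable name
qiime feature-table filter-samples \
--i-table cleantable.qza \
--m-metadata-file compiled_island_metadata.txt \
--p-where "[Column]='$var1'" \
--o-filtered-table "${var1}table.qza"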


Hi Colin,

Thanks for sharing. Snakemake sounds very useful. Will take a look.

Also, I messed around in bash and made a script so I don't have to go back and edit the metadata columns etc. Now I can just run:

bash auto.sh column var1

for a given metadata column and treatment, where auto.sh is something like:

#!/bin/bash
# automation script: filter samples by a metadata column/value pair,
# then run core metrics on the filtered table

column=$1
var1=$2
table=table.qza

qiime feature-table filter-samples \
--i-table cleantable.qza \
--m-metadata-file compiled_island_metadata.txt \
--p-where "[$column]='$var1'" \
--o-filtered-table "$var1$table"

qiime diversity core-metrics-phylogenetic \
--i-phylogeny rooted-treei3.qza \
--i-table "$var1$table" \
--p-sampling-depth 1000 \
--m-metadata-file compiled_island_metadata.txt \
--output-dir "core-metrics-results-barber-$var1"

But you can keep going with all the downstream analyses.
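For example, something like this could follow in the same script (the .qza names are the defaults that core-metrics-phylogenetic writes into its output directory; the column being tested is a placeholder):

dir=core-metrics-results-barber-$var1

# PERMANOVA-style test on the unweighted UniFrac distances for this subset
# ("Site" is a placeholder: some metadata column that still varies within the subset)
qiime diversity beta-group-significance \
--i-distance-matrix "$dir/unweighted_unifrac_distance_matrix.qza" \
--m-metadata-file compiled_island_metadata.txt \
--m-metadata-column Site \
--o-visualization "$dir/unweighted-unifrac-site-significance.qzv"

# Kruskal-Wallis test on Faith's PD across metadata groups for this subset
qiime diversity alpha-group-significance \
--i-alpha-diversity "$dir/faith_pd_vector.qza" \
--m-metadata-file compiled_island_metadata.txt \
--o-visualization "$dir/faith-pd-group-significance.qzv"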


Nice! :+1:

If you want to get started with Snakemake, I highly recommend it. It has a wonderful learning curve: it makes it easy to automate simple tasks, then lets you add complexity gradually to automate complex ones.

There's the excellent Snakemake tutorial, and this 5-step QIIME 2 pipeline I made that trains a sklearn classifier.

EDIT: One of the best parts about Snakemake + Qiime2 is that you can build fully automated pipelines AND you still have the detailed provenance baked into each Qiime2 artifact produced. So even as you distribute results or proceed with downstream analysis outside of your Snakemake pipeline, the full provenance is preserved. :memo: :hourglass: :rewind:


Yeah, this would help me expand this to my entire pipeline for big datasets. Thank you!
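Something like this wrapper loop (the column name and treatment values are just placeholders) would run the whole thing once per treatment:

# placeholder column/values: drive auto.sh across every treatment
for treatment in Oahu Maui Kauai; do
    bash auto.sh Island "$treatment"
done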


To add an alternative implementation: I maintain a Nextflow pipeline for amplicon sequencing analysis, nf-core/ampliseq (https://nf-co.re/ampliseq; on GitHub as nf-core/ampliseq, a 16S rRNA amplicon sequencing analysis workflow using QIIME2). I made a more detailed post here.

Awesome, this is great. Thank you!