I find myself constantly filtering different datasets and then running the same commands on each subset: first I filter by metadata, then I run the core-metrics command on that subset, and then a couple of alpha and beta diversity tests on all the distance matrix outputs for that subset.
I feel like someone must have written a script to automate this already. Does anyone know of any posts touching on this? I couldn't find much.
Edit: just to be clearer, something like the code below:
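(A rough sketch only; the file names, the metadata column, and the sampling depth are just placeholders.)

```bash
#!/bin/bash
# var1 is the treatment I'm subsetting on; I want it in every output name.
var1="treatmentA"

# 1. Filter the feature table down to the samples for this treatment
qiime feature-table filter-samples \
  --i-table table.qza \
  --m-metadata-file metadata.tsv \
  --p-where "[treatment]='${var1}'" \
  --o-filtered-table "${var1}-table.qza"

# 2. Core diversity metrics on that subset
qiime diversity core-metrics-phylogenetic \
  --i-phylogeny rooted-tree.qza \
  --i-table "${var1}-table.qza" \
  --p-sampling-depth 1000 \
  --m-metadata-file metadata.tsv \
  --output-dir "${var1}-core-metrics"

# 3. Alpha and beta group significance tests on the outputs
qiime diversity alpha-group-significance \
  --i-alpha-diversity "${var1}-core-metrics/faith_pd_vector.qza" \
  --m-metadata-file metadata.tsv \
  --o-visualization "${var1}-faith-pd-significance.qzv"

qiime diversity beta-group-significance \
  --i-distance-matrix "${var1}-core-metrics/unweighted_unifrac_distance_matrix.qza" \
  --m-metadata-file metadata.tsv \
  --m-metadata-column some_column \
  --o-visualization "${var1}-unweighted-unifrac-significance.qzv" \
  --p-pairwise
```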
and so on, where var1 is basically the treatment, and I want to use it as a naming mechanism for all downstream processes. I'm not sure how to insert var1 into the bash script, so I just used brackets as a guess.
Yep, some have been built. These two use Snakemake:
You could also capture your common commands in a code notebook like Jupyter, make a new copy of the notebook for each project, and then modify and rerun it as needed.
If you want to get started with Snakemake, I highly recommend it. It has a gentle learning curve: it's easy to automate simple tasks at first, and then you can slowly add complexity to automate more complex workflows.
There's the excellent Snakemake tutorial, and this 5-step Qiime2 pipeline I made that trains a sklearn classifier.
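To give a flavour of it, a single rule that does the per-treatment filtering step might look roughly like this (a sketch only; the file names and metadata column are placeholders, not taken from that pipeline):

```
# One Snakemake rule per step; the {treatment} wildcard plays the role of var1,
# so every output path is automatically named after the subset it belongs to.
rule filter_samples:
    input:
        table="table.qza",
        metadata="metadata.tsv"
    output:
        "results/{treatment}/table.qza"
    shell:
        "qiime feature-table filter-samples "
        "--i-table {input.table} "
        "--m-metadata-file {input.metadata} "
        "--p-where \"[treatment]='{wildcards.treatment}'\" "
        "--o-filtered-table {output}"
```

Downstream rules (core-metrics, significance tests) just take `results/{treatment}/...` files as input, and Snakemake chains them together for every treatment you ask for.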
EDIT: One of the best parts about Snakemake + Qiime2 is that you can build fully automated pipelines AND you still have the detailed provenance baked into each Qiime2 artifact produced. So even as you distribute results or proceed with downstream analysis outside of your Snakemake pipeline, the full provenance is preserved.