q2-SCRuB release

We are proud to (belatedly) announce the release of the q2-SCRuB plugin!

SCRuB is a tool designed to help researchers address the common issue of contamination in microbial studies.

Principal concept

SCRuB is a probabilistic in silico decontamination method that incorporates shared information across multiple samples and controls to precisely identify and remove contamination. It models each sample of interest as a mixture of contamination and non-contamination (“biological”/“real”) sources, and each control sample as a noisy realization of a latent contamination source. It further uses the spatial location of a sample during processing (for example, location on a 96-well plate) to account for leakage of non-control samples into controls. A detailed description can be found here and here.
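To make the mixture idea above concrete, here is a toy simulation in plain Python. It is only an illustration of the generative picture described in this post, not SCRuB's actual implementation: the taxa proportions, the contamination fraction, the read depths, and the function names are all made up for the example, and well-to-well leakage is left out.

```python
import random


def simulate_counts(proportions, depth, rng):
    """Draw `depth` reads from a categorical distribution over taxa."""
    taxa = range(len(proportions))
    reads = rng.choices(taxa, weights=proportions, k=depth)
    return [reads.count(t) for t in taxa]


def mix(biological, contaminant, contam_fraction):
    """Expected composition of a sample: a convex mix of the two sources."""
    return [
        (1 - contam_fraction) * b + contam_fraction * c
        for b, c in zip(biological, contaminant)
    ]


rng = random.Random(0)  # fixed seed so the example is reproducible

# Hypothetical relative abundances over four taxa (each list sums to 1).
biological = [0.70, 0.30, 0.00, 0.00]   # the "real" community
contaminant = [0.00, 0.10, 0.60, 0.30]  # the latent contamination source

# A sample of interest: 20% of its reads come from contamination.
sample_mix = mix(biological, contaminant, contam_fraction=0.2)
sample_counts = simulate_counts(sample_mix, depth=1000, rng=rng)

# A control: a noisy realization of the contamination source alone.
control_counts = simulate_counts(contaminant, depth=200, rng=rng)

print("expected sample composition:", sample_mix)
print("sample counts:", sample_counts)
print("control counts:", control_counts)
```

Decontamination is the reverse problem: given only the count vectors, estimate the contamination source and each sample's contamination fraction, then remove the contaminating reads.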


This package provides an easy-to-use framework for applying SCRuB to your projects. All you need to get started are count matrices (n samples x m taxa) for both your samples and your controls. The locations of samples and controls during processing are optional but recommended. To begin, we recommend working through SCRuB's documentation pages, which include installation steps on the homepage and examples of qiime commands with SCRuB. In addition, we provide the key plugin details below.
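As a rough illustration of why well locations matter, the sketch below converts plate labels into grid coordinates and measures how far apart two wells are. It assumes labels of the form "A1" to "H12" on a standard 8 x 12 plate; the helper names are hypothetical, and SCRuB's own handling of well positions may differ.

```python
import re

ROWS = "ABCDEFGH"  # 8 rows x 12 columns on a standard 96-well plate


def parse_well(well_id):
    """Convert a label such as 'B7' to zero-based (row, column) indices."""
    match = re.fullmatch(r"([A-Ha-h])(\d{1,2})", well_id.strip())
    if match is None:
        raise ValueError(f"not a 96-well label: {well_id!r}")
    row = ROWS.index(match.group(1).upper())
    col = int(match.group(2)) - 1
    if not 0 <= col < 12:
        raise ValueError(f"column out of range in {well_id!r}")
    return row, col


def well_distance(a, b):
    """Euclidean distance between two wells, in units of well spacing."""
    (r1, c1), (r2, c2) = parse_well(a), parse_well(b)
    return ((r1 - r2) ** 2 + (c1 - c2) ** 2) ** 0.5


print(parse_well("A1"))           # (0, 0)
print(well_distance("A1", "A2"))  # 1.0
```

Under this picture, a control sitting next to a high-biomass sample is more exposed to leakage than one several wells away, which is the kind of information the well location column supplies.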



Installation

conda activate qiime2-2023.5
conda install -c conda-forge r-devtools
Rscript -e 'devtools::install_github("shenhav-and-korem-labs/SCRuB"); torch::install_torch()'
pip install git+https://github.com/Shenhav-and-Korem-labs/q2-SCRuB.git

Example data

In this tutorial we use SCRuB to decontaminate a plasma dataset from Poore et al. The data consist of a feature table (table.qza) and a metadata file (metadata.tsv).

First, we make a tutorial directory and download both files to the plasma-data directory:

mkdir SCRuB-example
mkdir SCRuB-example/plasma-data
mkdir SCRuB-example/results
cd SCRuB-example/plasma-data
wget https://github.com/Shenhav-and-Korem-labs/q2-SCRuB/raw/main/ipynb/plasma-data/table.qza
wget https://github.com/Shenhav-and-Korem-labs/q2-SCRuB/raw/main/ipynb/plasma-data/metadata.tsv
cd ..
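Before running the plugin, it can be worth confirming that the metadata file contains the columns this tutorial passes to SCRuB (is_control, sample_type, and well_id). The helper below is a small sketch using only the Python standard library; the function name is hypothetical, the example rows are made up, and it assumes the first non-empty line of the file is the header.

```python
import csv
import io

# Column names used by the qiime command in this tutorial.
REQUIRED = ("is_control", "sample_type", "well_id")


def missing_columns(tsv_text, required=REQUIRED):
    """Return the required column names absent from the TSV header row."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    # Assumption: the header is the first row with any non-blank cell.
    header = next(
        (row for row in reader if any(cell.strip() for cell in row)), []
    )
    present = {cell.strip() for cell in header}
    return [name for name in required if name not in present]


# Made-up metadata for illustration only.
example = (
    "sample-id\tis_control\tsample_type\twell_id\n"
    "s1\tFalse\tplasma\tA1\n"
)
print(missing_columns(example))  # []
```

To check the downloaded file from the SCRuB-example directory, pass its contents instead: missing_columns(open("plasma-data/metadata.tsv").read()). An empty list means all three columns are present.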


To run SCRuB we only need a single command. In this tutorial our control_idx_column parameter is is_control, our sample_type_column is sample_type, and our well_location_column is well_id; control_order is given as a comma-separated list of the control types. Now we are ready to SCRuB away the contamination.

qiime SCRuB SCRuB \
--i-table plasma-data/table.qza \
--m-metadata-file plasma-data/metadata.tsv \
--p-control-idx-column is_control \
--p-sample-type-column sample_type \
--p-well-location-column well_id \
--p-control-order "control blank library prep,control blank DNA extraction" \
--o-scrubbed results/scrubbed.qza

Outputs of the tutorial can be found here.

Extended tutorial

An extended version of this tutorial can be found in our documentation pages.

Issue reporting

Please share any issues or feature requests in our GitHub repo's issues page.


If you use this tool, please cite:
Austin, G.I., Park, H., Meydan, Y. et al. Contamination source modeling with SCRuB improves cancer phenotype prediction from microbiome data. Nat Biotechnol (2023). https://doi.org/10.1038/s41587-023-01696-w