Evident: Community Tutorial
Evident is a tool for calculating the effect sizes of sample groupings in microbiome data and for performing statistical power analysis. With Evident, you can easily explore which columns in your metadata are most strongly associated with differences in microbial communities. These effect sizes can, in turn, be used to perform power analysis across different significance levels and numbers of observations.
We hope that researchers will use Evident to more carefully design their sequencing experiments to maximize power while minimizing wasted resources. You can look at an existing dataset, determine the effect size of the sample groupings, and calculate the number of samples required to reject the null hypothesis at a priori defined values of statistical significance and power given this effect size.
Evident can operate on either
- Univariate data (e.g. alpha diversity, log-ratios, etc.)
- Multivariate data (e.g. beta diversity distance matrix)
See our preprint on bioRxiv for more details. While we provide a QIIME 2 interface (hence this post!), we also provide a standalone Python package that is more fully featured, including an interactive web app for effect size exploration and power analysis (see the README).
Installing
pip install evident
Example Usage - Power Analysis
We can use a QIIME 2 SampleData[AlphaDiversity] Artifact as input. We pass this Artifact with --m-sample-metadata-file, in addition to our sample metadata. This is possible because a SampleData[AlphaDiversity] Artifact can be interpreted as Metadata and merged with all other files passed to --m-sample-metadata-file. We provide --p-data-column with the column containing the diversity values and --p-group-column with the column containing the groupings of interest.
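To make the merge step concrete, here is a hypothetical Python sketch of combining per-sample tables on sample ID. It assumes, as we understand QIIME 2's metadata merging to work, that only sample IDs present in every input are kept (an inner join) and that column names must be unique across inputs. The function name merge_metadata and all example values are made up for illustration.

```python
# Hypothetical sketch of merging per-sample tables on sample ID, in the
# spirit of passing several files to --m-sample-metadata-file.
# Assumption: only sample IDs present in every table are kept (inner join),
# and a column name appearing in more than one table is an error.


def merge_metadata(*tables):
    """Inner-join dicts of {sample_id: {column: value}} on sample ID."""
    shared_ids = set(tables[0])
    for table in tables[1:]:
        shared_ids &= set(table)

    merged = {}
    for sample_id in sorted(shared_ids):
        row = {}
        for table in tables:
            for column, value in table[sample_id].items():
                if column in row:
                    raise ValueError(f"duplicate column: {column}")
                row[column] = value
        merged[sample_id] = row
    return merged


# Made-up example values, for illustration only.
metadata = {
    "S1": {"classification": "group_a"},
    "S2": {"classification": "group_b"},
    "S3": {"classification": "group_a"},
}
faith_pd = {"S1": {"faith_pd": 12.4}, "S2": {"faith_pd": 18.9}}

merged = merge_metadata(metadata, faith_pd)
print(merged)  # S3 is dropped because it has no faith_pd value
```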
We want to evaluate the statistical power at:
- Significance levels of 0.01, 0.05, 0.1
- Total observations from 10 to 100 in intervals of 10
We can pass these values to Evident separated by spaces. Notice that we use the seq UNIX command to generate the sample sizes: seq 10 10 100 prints the numbers from 10 to 100 in steps of 10.
qiime evident univariate-power-analysis \
--m-sample-metadata-file metadata.qza \
--m-sample-metadata-file faith_pd.qza \
--p-data-column faith_pd \
--p-group-column classification \
--p-alpha 0.01 0.05 0.1 \
--p-total-observations $(seq 10 10 100) \
--o-power-analysis-results results.faithpd.qza
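The command above evaluates power for every combination of significance level and total observations. As a rough mental model only, the Python sketch below computes a similar grid for two equal-sized groups. It uses a normal approximation to the two-sample t-test. The effect size d = 0.5 and the function name approx_power are hypothetical. Evident's own calculation starts from the effect size it estimates from your data and may use a different (for example exact, or multi-group) power formula.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def approx_power(effect_size_d, total_observations, alpha):
    """Approximate power of a two-sided, two-sample test with equal group sizes.

    Normal approximation to the t-test: with n = N / 2 samples per group,
    the test statistic is shifted by d * sqrt(n / 2) under the alternative.
    """
    n_per_group = total_observations / 2
    shift = effect_size_d * (n_per_group / 2) ** 0.5
    z_crit = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)
    # Probability of landing in either rejection region under the alternative.
    return STANDARD_NORMAL.cdf(shift - z_crit) + STANDARD_NORMAL.cdf(-shift - z_crit)


effect_size_d = 0.5  # hypothetical effect size, not taken from the tutorial data
alphas = [0.01, 0.05, 0.1]
total_observations = list(range(10, 101, 10))  # same values as $(seq 10 10 100)

results = {
    (alpha, n): approx_power(effect_size_d, n, alpha)
    for alpha in alphas
    for n in total_observations
}
for alpha in alphas:
    row = ", ".join(f"{results[(alpha, n)]:.2f}" for n in total_observations)
    print(f"alpha={alpha}: {row}")
```

In each row, power increases with the number of observations. For a fixed number of observations, a less strict alpha gives higher power.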
We can transform results.faithpd.qza into a Visualization with
qiime evident visualize-results \
--i-results results.faithpd.qza \
--o-visualization results.faithpd.qzv
This creates a table that you can view directly at QIIME 2 View.
Alternatively, you can create a power curve from these results using
qiime evident plot-power-curve \
--i-power-analysis-results results.faithpd.qza \
--p-target-power 0.8 \
--p-style alpha \
--o-visualization curve.qzv
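The --p-target-power value marks the power level you want each curve to reach. For intuition about where a curve crosses that line, the Python sketch below uses the textbook normal-approximation sample-size formula for two equal groups, n per group = 2((z_{1-α/2} + z_{power}) / d)². The function name and the effect size d = 0.5 are hypothetical. This is not necessarily how Evident computes its curves.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def approx_total_observations(effect_size_d, alpha, target_power):
    """Approximate total N (two equal groups) needed to reach target_power.

    Normal-approximation formula for a two-sided, two-sample test:
        n_per_group = 2 * ((z_{1 - alpha/2} + z_{power}) / d) ** 2
    The per-group size is rounded up to a whole number of samples.
    """
    z_alpha = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STANDARD_NORMAL.inv_cdf(target_power)
    n_per_group = math.ceil(2 * ((z_alpha + z_power) / effect_size_d) ** 2)
    return 2 * n_per_group


effect_size_d = 0.5  # hypothetical effect size
for alpha in [0.01, 0.05, 0.1]:
    n = approx_total_observations(effect_size_d, alpha, target_power=0.8)
    print(f"alpha={alpha}: about {n} total observations for 80% power")
```

Under these assumptions, alpha = 0.05 needs about 126 total observations. That is more than the largest value (100) passed to --p-total-observations above, so such a curve would not reach 0.8 within that range.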
Example Usage - Effect Sizes
We can also use Evident to generate effect sizes for multiple categorical groupings at once. The per-category calculations can be parallelized with --p-n-jobs, which is useful on high-performance computing environments. In this example, we calculate the effect size of alpha diversity differences among samples in three categories: classification, sex, and cd_behavior.
For binary categories, Evident calculates Cohen's d. For multi-class (>2) categories, Evident calculates Cohen's f.
qiime evident univariate-effect-size-by-category \
--m-sample-metadata-file metadata.qza \
--m-sample-metadata-file faith_pd.qza \
--p-data-column faith_pd \
--p-n-jobs 3 \
--p-group-columns classification sex cd_behavior \
--o-effect-size-results alpha_effect_sizes.qza
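For intuition about the two statistics Evident reports, the Python sketch below computes two effect sizes:

- Cohen's d: the difference in means divided by the pooled standard deviation.
- Cohen's f: here defined as sqrt(SS_between / SS_within), one common formulation.

Evident's exact estimators may differ in details such as degrees-of-freedom corrections. The helper effect_size_for_column is hypothetical; it simply mirrors the binary versus multi-class rule described above. The group values are made up.

```python
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (
        n_a + n_b - 2
    )
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5


def cohens_f(*groups):
    """Cohen's f as sqrt(SS_between / SS_within), i.e. sqrt(eta^2 / (1 - eta^2))."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / ss_within) ** 0.5


def effect_size_for_column(groups):
    """Hypothetical dispatch: Cohen's d for two groups, Cohen's f for more."""
    if len(groups) == 2:
        return cohens_d(*groups)
    return cohens_f(*groups)


# Made-up Faith's PD values for illustration only.
low = [10.0, 12.0, 11.0, 13.0]
mid = [14.0, 15.0, 13.0, 16.0]
high = [18.0, 17.0, 19.0, 20.0]

print(f"binary column (d): {effect_size_for_column([low, high]):.2f}")
print(f"three-level column (f): {effect_size_for_column([low, mid, high]):.2f}")
```

Cohen's d is signed (negative here because the first group has the lower mean), whereas Cohen's f is always non-negative.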
Additionally, you can use --p-pairwise to calculate all pairwise comparisons for categories with more than two levels. This calculates Cohen's d for every possible pair of levels. For example, if a category had the levels high-fat diet, low-fat diet, and control diet, you could evaluate the effect sizes of alpha diversity differences for the following comparisons:
- High-fat diet vs. low-fat diet
- High-fat diet vs. control diet
- Low-fat diet vs. control diet
qiime evident univariate-effect-size-by-category \
--m-sample-metadata-file metadata.qza \
--m-sample-metadata-file faith_pd.qza \
--p-data-column faith_pd \
--p-pairwise \
--p-n-jobs 3 \
--p-group-columns classification sex cd_behavior \
--o-effect-size-results alpha_effect_sizes_pairwise.qza
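The diet comparisons listed earlier are simply all unordered pairs of levels. The short Python sketch below enumerates them with itertools.combinations, using the hypothetical diet column from that example. It also shows that a column with k levels yields k(k - 1)/2 comparisons, so the number of pairwise effect sizes grows quickly with the number of levels.

```python
from itertools import combinations

# Levels of the hypothetical diet column from the example in the text.
diet_levels = ["high-fat diet", "low-fat diet", "control diet"]

# Every unordered pair of levels; one Cohen's d would be reported per pair.
pairs = list(combinations(diet_levels, 2))
for level_a, level_b in pairs:
    print(f"{level_a} vs. {level_b}")

# A column with k levels yields k * (k - 1) / 2 comparisons.
k = len(diet_levels)
print(f"{k} levels -> {k * (k - 1) // 2} comparisons")
```

For instance, a column with 5 levels would produce 10 pairwise comparisons.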
While we used univariate data in this tutorial, the commands are nearly identical for multivariate distance matrices.
Please see the README on the GitHub repository for more details. Feel free to post in the Community Plugin Support category if you have questions. If your issue is related to the code (e.g. a bug), consider opening an issue on GitHub.