ANCOM tutorial: Moving Pictures of the Human Microbiome dataset

gregcaporaso · August 1, 2017, 6:34pm

This community tutorial has been migrated to our official documentation. Please refer to that tutorial instead.

Disclaimers

The version of ANCOM that is currently available in QIIME 2 is not the most recent version of ANCOM. We hope to remedy this in the future when we find a maintainer for this plugin. (If you're very comfortable with ANCOM, and are interested in contributing to QIIME 2 by keeping an ANCOM visualizer up to date, please get in touch on the QIIME 2 Forum.)

As with any bioinformatics method, you should be aware of the assumptions and limitations of ANCOM before using it. We recommend reviewing the ANCOM paper before using this method.

This tutorial was tested using QIIME 2 2017.7, and is not guaranteed to work with other versions of QIIME 2.

Using ANCOM

ANCOM can be applied to identify features that are differentially abundant (i.e., present in different abundances) across sample groups. ANCOM is implemented in the q2-composition plugin. This tutorial illustrates how to use ANCOM for differential abundance testing. It is designed to be run after completing the QIIME 2 "Moving Pictures" tutorial, using artifacts generated in that tutorial.

Activate your QIIME 2 2017.7 environment and download the following files to run this tutorial:

sample-metadata.tsv (3.6 KB)
taxonomy.qza (48.3 KB)
table.qza (43.6 KB)

ANCOM assumes that few (less than about 25%) of the features are changing between groups. If you expect that more features are changing between your groups, you should not use ANCOM, as it will be more error-prone (an increase in both Type I and Type II errors is possible). Because we expect a lot of features to change in abundance across body sites, in this tutorial we'll filter our full feature table to contain only gut samples. We'll then apply ANCOM to determine which, if any, sequence variants and genera are differentially abundant across the gut samples of our two subjects.
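
To see why this assumption matters, it can help to sketch the idea behind ANCOM's test as described in the ANCOM paper: each feature is compared with every other feature through log-ratios, and a feature is scored by how many of those log-ratios differ between groups. The Python sketch below is not the q2-composition implementation. The function names, the toy counts, and the fixed threshold (a crude stand-in for a proper statistical test) are all assumptions made for illustration.

```python
import math


def mean_log_ratio(samples, i, j):
    """Mean of log(count_i / count_j) over the samples of one group."""
    return sum(math.log(s[i] / s[j]) for s in samples) / len(samples)


def w_like_counts(group_a, group_b, threshold=1.0):
    """For each feature, count how many log-ratios against the other
    features shift between groups by more than `threshold`.

    NOTE: the threshold is a hypothetical stand-in for the statistical
    test that ANCOM applies to each log-ratio.
    """
    n_features = len(group_a[0])
    scores = []
    for i in range(n_features):
        shifted = 0
        for j in range(n_features):
            if i == j:
                continue
            diff = mean_log_ratio(group_a, i, j) - mean_log_ratio(group_b, i, j)
            if abs(diff) > threshold:
                shifted += 1
        scores.append(shifted)
    return scores


# Toy data: rows are samples, columns are features. No zeros are present,
# as if a pseudocount had already been added. Only the last feature
# differs strongly between the two groups.
group_a = [[10, 20, 30, 400], [12, 18, 33, 380]]
group_b = [[11, 19, 31, 40], [9, 21, 29, 42]]

scores = w_like_counts(group_a, group_b)
print(scores)  # [1, 1, 1, 3]
```

With a single changing feature, that feature stands out with the highest count, while the others are flagged only through their ratio with it. If most features shifted between groups, the counts would no longer single out any one of them, which is roughly the situation the 25% guideline is meant to avoid.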

We'll start by creating a feature table that contains only the gut samples. (To learn more about filtering, see the Filtering Data tutorial.)

qiime feature-table filter-samples \
  --i-table table.qza \
  --m-metadata-file sample-metadata.tsv \
  --p-where "BodySite='gut'" \
  --o-filtered-table gut-table.qza
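
Conceptually, the filtering step keeps only the samples whose metadata matches the where clause. The Python sketch below illustrates that idea on toy data. It is not how q2-feature-table is implemented, and the sample IDs, feature IDs, counts, and the `filter_samples` helper are hypothetical.

```python
import csv
import io

# Hypothetical metadata in the same tab-separated layout as sample-metadata.tsv.
metadata_tsv = (
    "#SampleID\tBodySite\tSubject\n"
    "sampleA\tgut\tsubject-1\n"
    "sampleB\ttongue\tsubject-1\n"
    "sampleC\tgut\tsubject-2\n"
)

# Hypothetical feature table: sample ID -> {feature ID: frequency}.
table = {
    "sampleA": {"feat1": 12, "feat2": 0},
    "sampleB": {"feat1": 3, "feat2": 8},
    "sampleC": {"feat1": 0, "feat2": 5},
}


def filter_samples(table, metadata_tsv, column, value):
    """Keep only the samples whose metadata `column` equals `value`."""
    rows = csv.DictReader(io.StringIO(metadata_tsv), delimiter="\t")
    keep = {row["#SampleID"] for row in rows if row[column] == value}
    return {sid: counts for sid, counts in table.items() if sid in keep}


gut_table = filter_samples(table, metadata_tsv, "BodySite", "gut")
print(sorted(gut_table))  # ['sampleA', 'sampleC']
```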

ANCOM operates on a FeatureTable[Composition] QIIME 2 artifact, which is based on per-sample feature frequencies but cannot contain frequencies of zero, because ANCOM works with log-ratios of abundances. To build the composition artifact, provide a FeatureTable[Frequency] artifact to add-pseudocount (an imputation method), which produces the FeatureTable[Composition] artifact.
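
The following Python sketch shows why zeros are a problem and what adding a pseudocount does about it. It is an illustration on toy data, not the q2-composition code. The `add_pseudocount` function here, the feature IDs, and the pseudocount value of 1 are assumptions made for the example.

```python
import math


def add_pseudocount(table, pseudocount=1):
    """Return a copy of the table with `pseudocount` added to every count."""
    return {fid: [c + pseudocount for c in counts] for fid, counts in table.items()}


# Hypothetical frequency table: feature ID -> counts across three samples.
freq_table = {"feat1": [0, 5, 12], "feat2": [3, 0, 0]}

# A log-ratio cannot be computed when the numerator count is zero.
try:
    math.log(freq_table["feat1"][0] / freq_table["feat2"][0])
    log_of_zero_failed = False
except ValueError:
    log_of_zero_failed = True

# After adding the pseudocount, every log-ratio is defined.
comp_table = add_pseudocount(freq_table)
log_ratios = [
    math.log(a / b) for a, b in zip(comp_table["feat1"], comp_table["feat2"])
]
print(comp_table)  # {'feat1': [1, 6, 13], 'feat2': [4, 1, 1]}
```

A zero in the denominator fails too (with a division error rather than a logarithm error), so the pseudocount is added to every entry of the table rather than only to selected features.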

We can then run ANCOM on the Subject category to determine which features differ in abundance across the gut samples of the two subjects.

qiime composition add-pseudocount \
  --i-table gut-table.qza \
  --o-composition-table comp-gut-table.qza

qiime composition ancom \
  --i-table comp-gut-table.qza \
  --m-metadata-file sample-metadata.tsv \
  --m-metadata-category Subject \
  --o-visualization ancom-Subject.qzv

ancom-Subject.qzv (33.4 KB)

Which sequence variants differ in abundance across Subject? In which subject is each most and least abundant? What are the taxonomies of some of these sequence variants? (To answer that last question you'll need to refer to another visualization that was generated in the Moving Pictures tutorial.)

We're also often interested in performing a differential abundance test at a specific taxonomic level. To do this, we can collapse the features in our FeatureTable[Frequency] at the taxonomic level of interest, and then re-run the above steps. We collapse our feature table at the genus level (i.e., level 6 of the Greengenes taxonomy).

qiime taxa collapse \
  --i-table gut-table.qza \
  --i-taxonomy taxonomy.qza \
  --p-level 6 \
  --o-collapsed-table gut-table-l6.qza
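
The collapse step can be pictured as summing the frequencies of all features that share the same taxonomy down to the chosen level. The Python sketch below illustrates this on toy data. It is not the q2-taxa implementation (for instance, it does nothing special for features annotated to fewer levels than requested), and the feature IDs, counts, taxonomy strings, and the `collapse_to_level` helper are hypothetical.

```python
def collapse_to_level(table, taxonomy, level):
    """Sum per-sample counts of features sharing the first `level` ranks."""
    collapsed = {}
    for feature_id, counts in table.items():
        ranks = [rank.strip() for rank in taxonomy[feature_id].split(";")]
        key = ";".join(ranks[:level])
        if key in collapsed:
            collapsed[key] = [a + b for a, b in zip(collapsed[key], counts)]
        else:
            collapsed[key] = list(counts)
    return collapsed


# Hypothetical feature table: feature ID -> counts across two samples.
table = {"seq1": [5, 0], "seq2": [3, 2], "seq3": [0, 7]}

# Hypothetical Greengenes-style annotations (seven ranks, kingdom to species).
taxonomy = {
    "seq1": "k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; "
            "f__Lachnospiraceae; g__Blautia; s__producta",
    "seq2": "k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; "
            "f__Lachnospiraceae; g__Blautia; s__",
    "seq3": "k__Bacteria; p__Bacteroidetes; c__Bacteroidia; o__Bacteroidales; "
            "f__Bacteroidaceae; g__Bacteroides; s__",
}

genus_table = collapse_to_level(table, taxonomy, level=6)
for genus, counts in genus_table.items():
    print(genus.split(";")[-1], counts)
# g__Blautia [8, 2]
# g__Bacteroides [0, 7]
```

Here seq1 and seq2 differ only at the species level, so they are merged into a single genus-level feature, while seq3 remains on its own.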

qiime composition add-pseudocount \
  --i-table gut-table-l6.qza \
  --o-composition-table comp-gut-table-l6.qza

qiime composition ancom \
  --i-table comp-gut-table-l6.qza \
  --m-metadata-file sample-metadata.tsv \
  --m-metadata-category Subject \
  --o-visualization l6-ancom-Subject.qzv

l6-ancom-Subject.qzv (38.0 KB)

Which genera differ in abundance across Subject? In which subject is each most and least abundant?