Differential abundance analysis with q2-aldex2

dgiguer · October 8, 2019, 7:04pm

q2-aldex2

More documentation is available in the plugin library.

ALDEx2 is a differential abundance package that was initially developed for meta-RNA-Seq, but has also performed very well with traditional RNA-Seq, 16S rRNA gene sequencing, and selective growth-type (SELEX) experiments. In principle, ALDEx2 is generalizable and can be applied to nearly any high-throughput sequencing data that yields a table of per-feature counts for each sample (Fernandes et al. 2014). Here we will apply it to the "Moving Pictures" dataset from Caporaso et al. 2011 to look for features that are differentially abundant between the gut samples of two individuals.

This plugin is currently under development and has only been tested with QIIME 2 version 2019.7. If you encounter bugs, please report them as an issue on the GitHub repository and make a post on the QIIME forum (tag @dgiguer). If you would like to request a feature, we encourage you to make a post on the QIIME forum and/or the GitHub repository.

Tutorial

Note: This tutorial assumes you already have an artifact generated from your counts table. See this wiki for more information on how to get your count table and metadata into an appropriate format.

We will start with the artifact generated by the DADA2 pipeline in the Moving Pictures tutorial, table.qza. The metadata can be downloaded in the first step of that tutorial. The first step for ALDEx2 is to filter the table so that only samples from the gut site are compared.

# get the output from DADA2 
wget https://docs.qiime2.org/2019.7/data/tutorials/moving-pictures/table.qza

# get the metadata
wget \
  -O "sample-metadata.tsv" \
  "https://data.qiime2.org/2019.7/tutorials/moving-pictures/sample_metadata.tsv"
	
# filter the samples
qiime feature-table filter-samples \
  --i-table table.qza \
  --m-metadata-file sample-metadata.tsv \
  --p-where "[body-site]='gut'" \
  --o-filtered-table gut-table.qza
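
Before running ALDEx2 it can help to check how many gut samples fall into each group. The sketch below is not part of the plugin: `count_groups` is a hypothetical helper that reads the metadata TSV with awk, finds columns by header name, and counts rows per group. It assumes a tab-separated file whose first line is the header and whose comment or type rows start with `#`.

```shell
# count_groups FILE FILTER_COL FILTER_VAL GROUP_COL
# Hypothetical helper (not part of QIIME 2 or q2-aldex2).
# Prints "<group><TAB><count>" for rows where FILTER_COL equals FILTER_VAL.
count_groups() {
  awk -F'\t' -v fcol="$2" -v fval="$3" -v gcol="$4" '
    NR == 1 {
      # Locate the filter and group columns by header name.
      for (i = 1; i <= NF; i++) {
        if ($i == fcol) f = i
        if ($i == gcol) g = i
      }
      if (!f || !g) { print "column not found" > "/dev/stderr"; exit 1 }
      next
    }
    /^#/ { next }              # skip rows such as "#q2:types"
    $f == fval { n[$g]++ }     # tally matching rows per group
    END { for (k in n) printf "%s\t%d\n", k, n[k] }
  ' "$1" | sort
}
```

For this tutorial the call would be `count_groups sample-metadata.tsv body-site gut subject`, which lists each subject alongside its number of gut samples.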

The next step is to run ALDEx2. The full pipeline is implemented in the aldex2 function (modularity will be added in the near future). The input is a FeatureTable[Frequency] artifact and a metadata file. The metadata file is needed to define the groups being compared; for this tutorial, the groups are identified by the subject column of the metadata file. ALDEx2 automatically adds a prior (see Results here for more technical details of the prior) to remove zeros from the data, and filters out any samples with 0 reads.

qiime aldex2 aldex2 \
	--i-table gut-table.qza \
	--m-metadata-file sample-metadata.tsv \
	--p-condition subject \
	--output-dir gut-test

The output artifact is differentials.qza, which contains a summary of the ALDEx2 output (difference, dispersion, effect, q-score, etc). From this artifact, we can visualize and extract the differentially abundant features. It is important to visualize the size of the difference between conditions (difference) as well as the size of the difference within conditions (dispersion) to capture the full context of the within-group variation. One feature may appear as differentially expressed if it has a very small dispersion and slightly larger difference, while another may have a large difference, but an even larger dispersion. These are both cases where caution should be used when calling differentially expressed features.
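
The two cautionary cases above can be illustrated with a little arithmetic. The ALDEx2 documentation describes the effect size as the median, over Monte Carlo instances, of the between-group difference divided by the larger within-group dispersion. The sketch below only approximates that by dividing one summary difference by one summary dispersion; `effect_ratio` is a hypothetical helper, and the numbers are invented for illustration.

```shell
# effect_ratio DIFFERENCE DISPERSION
# Hypothetical helper: prints difference / dispersion to two decimals,
# a rough stand-in for the ALDEx2 effect size (not how ALDEx2 computes it).
effect_ratio() {
  awk -v d="$1" -v w="$2" 'BEGIN {
    if (w <= 0) { print "dispersion must be positive" > "/dev/stderr"; exit 1 }
    printf "%.2f\n", d / w
  }'
}

# Invented feature A: small difference, very small dispersion
effect_ratio 0.5 0.1   # prints 5.00
# Invented feature B: large difference, even larger dispersion
effect_ratio 4 8       # prints 0.50
```

Feature A has a large ratio even though its absolute difference is tiny, while feature B has a large difference that is swamped by its dispersion. Looking at difference and dispersion together, rather than a single score, exposes both situations.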

The visualizer aldex2 effect-plot takes the differentials.qza artifact as input and creates several plots. More information about these plots can be found in the ALDEx2 vignette.

qiime aldex2 effect-plot \
	--i-table gut-test/differentials.qza \
	--o-visualization gut-test/gut_test

The plots are then viewed using the qiime tools view command:

qiime tools view gut-test/gut_test.qzv

Your browser should open with the following plots:

[Screenshot: gut_test effect plots]

Yay! We have differentially expressed features (coloured in red). We can extract the features and detailed information from the ALDEx2 summary output using extract-differences.

qiime aldex2 extract-differences \
	--i-table gut-test/differentials.qza \
	--o-differentials gut-test/sig_gut \
	--p-sig-threshold 0.1 \
	--p-effect-threshold 0 \
	--p-difference-threshold 0

The tab-separated file of differentially called features can then be exported.

qiime tools export \
	--input-path gut-test/sig_gut.qza \
	--output-path differentially-expressed-features

# view the file
head differentially-expressed-features/differentials.tsv
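
Once exported, the table can be ranked with standard shell tools. The sketch below defines a hypothetical helper, `top_by_column`, that keeps the header and prints the rows with the largest absolute value in a named column. It assumes a tab-separated file with a single header line; the column name `effect` used in the usage note is an assumption based on the ALDEx2 R output, so check the header of your own file first.

```shell
# top_by_column FILE COLUMN [N]
# Hypothetical helper (not part of q2-aldex2).
# Prints the header plus the N rows (default 10) with the largest |COLUMN|.
top_by_column() {
  file=$1; col=$2; n=${3:-10}
  # Find the 1-based index of the named column in the header row.
  idx=$(head -n 1 "$file" | tr '\t' '\n' | grep -n -x -F -- "$col" | head -n 1 | cut -d: -f1)
  [ -n "$idx" ] || { echo "column '$col' not found" >&2; return 1; }
  head -n 1 "$file"
  # Prefix each data row with |value|, sort numerically descending, drop the prefix.
  tail -n +2 "$file" |
    awk -F'\t' -v c="$idx" '
      /^#/ { next }   # skip any comment or type rows
      { v = $c + 0; if (v < 0) v = -v; printf "%.10f\t%s\n", v, $0 }' |
    sort -t "$(printf '\t')" -k1,1nr |
    head -n "$n" |
    cut -f2-
}
```

For example, `top_by_column differentially-expressed-features/differentials.tsv effect 5` would list the five features with the largest absolute effect, assuming the exported file has a column with that name.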

Citations

Fernandes AD, Macklaim JM, Linn TG, Reid G, Gloor GB. ANOVA-Like Differential Expression (ALDEx) Analysis for Mixed Population RNA-Seq. Parkinson J, editor. PLoS ONE. 2013 Jul 2;8(7):e67019–15.

Fernandes AD, Reid JN, Macklaim JM, McMurrough TA, Edgell DR, Gloor GB. Unifying the analysis of high-throughput sequencing datasets: characterizing RNA-seq, 16S rRNA gene sequencing and selective growth experiments by compositional data analysis. Microbiome. BioMed Central; 2014;2(1):15.

Gloor GB, Macklaim JM, Fernandes AD. Displaying Variation in Large Datasets: Plotting a Visual Summary of Effect Sizes. Journal of Computational and Graphical Statistics. 2016 Aug 5;25(3):971–9.

Other Resources

For more information on high-throughput sequencing data as compositions, see the following:

Quinn TP, Erb I, Gloor G, Notredame C, Richardson MF, Crowley TM. A field guide for the compositional analysis of any-omics data. GigaScience. 2019 Sep;8(9):giz107. https://doi.org/10.1093/gigascience/giz107

Quinn TP, Erb I, Richardson MF, Crowley TM. Understanding sequencing data as compositions: an outlook and review. Wren J, editor. Bioinformatics. 2018 Aug 15;34(16):2870–8.

Gloor GB, Macklaim JM, Pawlowsky-Glahn V, Egozcue JJ. Microbiome Datasets Are Compositional: And This Is Not Optional. Front Microbiol. 2017;8:2224.

Gloor GB, Reid G. Compositional analysis: a valid approach to analyze microbiome high-throughput sequencing data. Can J Microbiol. 2016 Aug;62(8):692–703.

The vignette from the Bioconductor package is also a great place for information about ALDEx2 and how it can be used.