About the Alpha and Beta Diversity Analysis Tutorial
This Alpha and Beta Diversity Community Tutorial (run using QIIME 2 2017.12) walks you through analyzing the alpha and beta diversity of a sample dataset. Below you will find a link to a small test dataset to download and use in this tutorial.
Files used in tutorial
The following files, derived from the Moving Pictures tutorial, are used in this document.
Alpha Diversity Analysis
The alpha and alpha-phylogenetic methods compute a user-specified alpha diversity metric for all samples in a feature table.
Phylogenetic alpha diversity metrics (in this case, Faith’s Phylogenetic Diversity) can be run with the following command:
qiime diversity alpha-phylogenetic \
  --i-table table.qza \
  --i-phylogeny rooted-tree.qza \
  --p-metric faith_pd \
  --o-alpha-diversity faith_pd_vector.qza
Non-phylogenetic alpha diversity metrics (in this case, Observed OTUs) can be run with the following command:
qiime diversity alpha \
  --i-table table.qza \
  --p-metric observed_otus \
  --o-alpha-diversity observed_otus_vector.qza
The --i-table input provides the feature table containing the samples for which the alpha diversity metric will be computed. The --i-phylogeny input provides the phylogenetic tree whose tip identifiers correspond to the feature identifiers in the table; it is only used by the alpha-phylogenetic command (i.e., when computing phylogenetic diversity metrics). The --p-metric parameter specifies the alpha diversity metric to be run. The --o-alpha-diversity output specifies the output file.
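To illustrate why only the phylogenetic command needs a tree, here is a minimal Python sketch. It is not QIIME 2's implementation: the toy tree, the sample counts, and the helper names `observed_features` and `faith_pd` are all made up for illustration.

```python
# Hypothetical toy rooted tree: node -> (parent, branch_length).
# The root itself has no entry. Names and lengths are invented.
TREE = {
    "A": ("n1", 1.0),
    "B": ("n1", 2.0),
    "C": ("root", 4.0),
    "n1": ("root", 0.5),
}


def observed_features(counts):
    """Count the features with a non-zero abundance (no tree required)."""
    return sum(1 for c in counts.values() if c > 0)


def faith_pd(counts, tree):
    """Sum the lengths of all branches linking the observed tips to the root."""
    seen = set()  # branches already counted, named by their child node
    total = 0.0
    for tip, count in counts.items():
        if count == 0:
            continue
        node = tip
        # Walk up toward the root, stopping at the first shared branch.
        while node in tree and node not in seen:
            parent, length = tree[node]
            seen.add(node)
            total += length
            node = parent
    return total


sample = {"A": 10, "B": 0, "C": 3}  # feature ID -> count
print(observed_features(sample))  # 2
print(faith_pd(sample, TREE))     # 5.5
```

The feature identifiers in the sample must match the tip names in the tree, which is the same requirement the tutorial states for the table and phylogeny inputs.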
To compute a different alpha diversity metric, change the --p-metric parameter to the one that corresponds to the metric you want to compute. The following list provides information on the available alpha diversity metrics in QIIME 2.

Abundance-based Coverage Estimator (ACE) metric: Calculates the ACE metric
 Estimates species richness using a correction factor
--p-metric: ace
Chao, A. and Lee, S.M. (1992). “Estimating the number of classes via sample coverage”. Journal of the American Statistical Association. (87): 210-217.

Berger-Parker Dominance Index: Calculates Berger-Parker dominance index
 Relative richness of the abundant species
--p-metric: berger_parker_d
Berger, W.H. and Parker, F.L. (1970). “Diversity of planktonic Foraminifera in deep sea sediments”. Science. (168): 1345-1347.

Brillouin’s index: Calculates Brillouin’s index
 Measures the diversity of the species present
 Use when randomness can’t be guaranteed
--p-metric: brillouin_d
Pielou, E. C. (1975). Ecological Diversity. New York, Wiley InterScience.

Chao1 confidence interval: Calculates chao1 confidence interval
 Confidence interval for richness estimator, Chao1
--p-metric: chao1_ci
Colwell, R.K., Mao, C.X., Chang, J. (2004). “Interpolating, extrapolating, and comparing incidence-based species accumulation curves.” Ecology. (85): 2717-2727.

Chao1 index: Calculates Chao1 index
 Estimates diversity from abundant data
 Estimates number of rare taxa missed from undersampling
--p-metric: chao1
Chao, A. (1984). “Nonparametric estimation of the number of classes in a population”.

Dominance measure: Calculates dominance measure
 How equally the taxa are presented
--p-metric: dominance

Effective Number of Species (ENS)/Probability of intra- or interspecific encounter (PIE) metric: Calculates the ENS/PIE metric
 Shows how the absolute number of species, their relative abundances, and their intraspecific clustering affect differences in biodiversity among communities
--p-metric: enspie
Chase, J.M., and Knight, R. (2013). “Scale-dependent effect sizes of ecological drivers on biodiversity: why standardised sampling is not enough”. Ecology Letters. (16): 17-26.

Esty confidence interval: Calculates Esty’s confidence interval
 Confidence interval for how many singletons in total individuals
--p-metric: esty_ci
Esty, W. W. (1983). “A normal limit law for a nonparametric estimator of the coverage of a random sample”. Ann Statist. (11): 905-912.

Faith’s phylogenetic diversity: Calculates Faith’s phylogenetic diversity
 Measure of biodiversity that incorporates phylogenetic difference between species
 Sum of the lengths of branches
--p-metric: faith_pd
Faith, D.P. (1992). “Conservation evaluation and phylogenetic diversity”. Biological Conservation. (61): 1-10.

Fisher’s index: Calculates Fisher’s index
 Relationship between the number of species and the abundance of each species
--p-metric: fisher_alpha
Fisher, R.A., Corbet, A.S. and Williams, C.B. (1943). “The relation between the number of species and the number of individuals in a random sample of an animal population”. Journal of Animal Ecology. (12): 42-58.

Gini index: Calculates Gini index
 Measures species abundance
 Assumes that the sampling is accurate and that additional data would fall on linear gradients between the values of the given data
--p-metric: gini_index
Gini, C. (1912). “Variability and Mutability”. C. Cuppini, Bologna. 156.

Good’s coverage of counts: Calculates Good’s coverage of counts
 Estimates the percentage of the total community that is represented in a sample
--p-metric: goods_coverage
Good, I.J. (1953). “The population frequencies of species and the estimation of population parameters”. Biometrika. 40(3/4): 237-264.

Heip’s evenness measure: Calculates Heip’s evenness measure.
 Removes dependency on species number
--p-metric: heip_e
Heip, C. (1974). “A new index measuring evenness”. J. Mar. Biol. Ass. UK. (54): 555-557.

Kempton-Taylor Q index: Calculates Kempton-Taylor Q index
 Measures diversity based on the distribution of species abundances
 Builds an abundance curve from all species and uses its interquartile range (IQR) to measure diversity
--p-metric: kempton_taylor_q
Kempton, R.A. and Taylor, L.R. (1976). “Models and statistics for species diversity”. Nature. (262): 818-820.

Lladser’s confidence interval: Calculates Lladser’s confidence interval
 Single confidence interval of the conditional uncovered probability
--p-metric: lladser_ci
Lladser, M.E., Gouet, R., Reeder, J. (2011). “Extrapolation of Urn Models via Poissonization: Accurate Measurements of the Microbial Unknown”. PLoS ONE.

Lladser’s point estimate: Calculates Lladser’s point estimate
 Estimates how much of the environment contains unsampled taxa
 Best estimate on a complete sample
--p-metric: lladser_pe
Lladser, M.E., Gouet, R., Reeder, J. (2011). “Extrapolation of Urn Models via Poissonization: Accurate Measurements of the Microbial Unknown”. PLoS ONE.

Margalef’s richness index: Calculates Margalef’s richness index
 Measures species richness in a given area or community
--p-metric: margalef
Magurran, A.E. (2004). “Measuring biological diversity”. Blackwell. 76-77.

McIntosh dominance index D: Calculates McIntosh dominance index D
 Affected by the variation in dominant taxa and less affected by the variation in less abundant or rare taxa
--p-metric: mcintosh_d
McIntosh, R.P. (1967). “An index of diversity and the relation of certain concepts to diversity”. Ecology. (48): 392-404.

McIntosh evenness index E: Calculates McIntosh’s evenness measure E
 How evenly abundant taxa are
--p-metric: mcintosh_e
Heip, C. (1974). “A new index measuring evenness”. J. Mar. Biol. Ass. UK. (54): 555-557.

Menhinick’s richness index: Calculates Menhinick’s richness index
 The ratio of the number of taxa to the square root of the sample size
--p-metric: menhinick
Magurran, A.E. (2004). “Measuring biological diversity”. Blackwell. 76-77.

Michaelis-Menten fit to rarefaction curve of observed OTUs: Calculates Michaelis-Menten fit to rarefaction curve of observed OTUs
 Estimated richness of species pools
--p-metric: michaelis_menten_fit
Raaijmakers, J.G.W. (1987). “Statistical analysis of the Michaelis-Menten equation”. Biometrics. (43): 793-803.

Number of distinct features: Calculates number of distinct OTUs
--p-metric: observed_otus
DeSantis, T.Z., Hugenholtz, P., Larsen, N., Rojas, M., Brodie, E.L., Keller, K., Huber, T., Davis, D., Hu, P., Andersen, G.L. (2006). “Greengenes, a Chimera-Checked 16S rRNA Gene Database and Workbench Compatible with ARB”. Applied and Environmental Microbiology. 72 (7): 5069–5072.

Number of double occurrences: Calculates number of double occurrence OTUs (doubletons)
 OTUs that only occur twice
--p-metric: doubles

Number of observed features, including singles and doubles: Calculates number of observed OTUs, singles, and doubles.
--p-metric: osd
DeSantis, T.Z., Hugenholtz, P., Larsen, N., Rojas, M., Brodie, E.L., Keller, K., Huber, T., Davis, D., Hu, P., Andersen, G.L. (2006). “Greengenes, a Chimera-Checked 16S rRNA Gene Database and Workbench Compatible with ARB”. Applied and Environmental Microbiology. 72 (7): 5069–5072.

Singles: Calculates number of single occurrence OTUs (singletons)
 OTUs that appear only once in a given sample
--p-metric: singles

Pielou’s evenness: Calculates Pielou’s evenness
 Measure of relative evenness of species richness
--p-metric: pielou_e
Pielou, E. (1966). “The measurement of diversity in different types of biological collections”. J. Theor. Biol. (13): 131-144.

Robbins’ estimator: Calculates Robbins’ estimator
 Probability of unobserved outcomes
--p-metric: robbins
Robbins, H.E. (1968). “Estimating the Total Probability of the unobserved outcomes of an experiment”. Ann Math. Statist. 39(1): 256-257.

Shannon’s index: Calculates Shannon’s index
 Calculates richness and diversity using a natural logarithm
 Accounts for both abundance and evenness of the taxa present
--p-metric: shannon
Shannon, C.E. and Weaver, W. (1949). “The mathematical theory of communication”. University of Illinois Press, Champaign, Illinois.

Simpson evenness measure E: Calculates Simpson’s evenness measure E.
 Diversity that accounts for the number of organisms and number of species
--p-metric: simpson_e
Simpson, E.H. (1949). “Measurement of Diversity”. Nature. (163): 688.

Simpson’s index: Calculates Simpson’s index
 Measures the relative abundance of the different species making up the sample richness
--p-metric: simpson
Simpson, E.H. (1949). “Measurement of diversity”. Nature. (163): 688.

Strong’s dominance index (Dw): Calculates Strong’s dominance index
 Measures species abundance unevenness
--p-metric: strong
Strong, W.L. (2002). “Assessing species abundance unevenness within and between plant communities”. Community Ecology. (3): 237-246.
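Several of the metrics above separate richness from evenness. The following Python sketch shows the idea on two made-up samples with the same number of features. It uses textbook formulas and hypothetical helper names, not the QIIME 2 code; the logarithm base is left as a parameter because the base a given QIIME 2 release uses should be checked in its documentation.

```python
import math


def proportions(counts):
    """Relative abundances of the features that were actually observed."""
    total = sum(counts)
    return [c / total for c in counts if c > 0]


def observed(counts):
    """Richness: number of features with a non-zero count."""
    return sum(1 for c in counts if c > 0)


def shannon(counts, base=math.e):
    """Shannon index: -sum(p * log(p)); base is an assumption, see lead-in."""
    return -sum(p * math.log(p, base) for p in proportions(counts))


def simpson(counts):
    """Simpson index computed as 1 - sum(p^2)."""
    return 1.0 - sum(p * p for p in proportions(counts))


def pielou_e(counts):
    """Pielou's evenness: Shannon divided by its maximum, log(richness)."""
    s = observed(counts)
    return shannon(counts) / math.log(s) if s > 1 else float("nan")


even = [25, 25, 25, 25]   # four features, equally abundant
skewed = [97, 1, 1, 1]    # four features, one dominant

for name, counts in (("even", even), ("skewed", skewed)):
    print(name, observed(counts), round(shannon(counts), 3),
          round(simpson(counts), 3), round(pielou_e(counts), 3))
```

Both samples have a richness of 4, but the skewed sample scores much lower on Shannon, Simpson, and Pielou's evenness, which is why richness and evenness metrics are usually reported together.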
Beta Diversity Analysis
The beta and beta-phylogenetic methods compute a user-specified beta diversity metric for all samples in a feature table.
Phylogenetic beta diversity metrics (in this case, Unweighted UniFrac) can be run with the following command:
qiime diversity beta-phylogenetic \
  --i-table table.qza \
  --i-phylogeny rooted-tree.qza \
  --p-metric unweighted_unifrac \
  --o-distance-matrix unweighted_unifrac_distance_matrix.qza
Non-phylogenetic beta diversity metrics (in this case, Bray-Curtis) can be run with the following command:
qiime diversity beta \
  --i-table table.qza \
  --p-metric braycurtis \
  --o-distance-matrix braycurtis_distance_matrix.qza
The --i-table input provides the feature table containing the samples for which the beta diversity metric will be computed. The --i-phylogeny input provides the phylogenetic tree whose tip identifiers correspond to the feature identifiers in the table; it is only used by the beta-phylogenetic command (i.e., when computing phylogenetic diversity metrics). The --p-metric parameter specifies the beta diversity metric to be run. The --o-distance-matrix output specifies the output file.
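Unlike alpha diversity, which yields one value per sample, a beta diversity metric yields one value per pair of samples, collected in a distance matrix. The Python sketch below shows that structure with a textbook Bray-Curtis formula. The table, sample IDs, and helper names are invented for illustration, and returning 0.0 for two empty samples is a choice made here, not necessarily what QIIME 2 does.

```python
from itertools import combinations


def bray_curtis(u, v):
    """Sum of absolute count differences divided by the sum of all counts."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    # Two empty samples: return 0.0 (an assumption of this sketch).
    return num / den if den else 0.0


def distance_matrix(table, metric):
    """Build a symmetric matrix from a table of sample ID -> feature counts.

    All count lists must use the same feature order.
    """
    ids = sorted(table)
    dm = {i: {j: 0.0 for j in ids} for i in ids}
    for i, j in combinations(ids, 2):
        dm[i][j] = dm[j][i] = metric(table[i], table[j])
    return ids, dm


table = {
    "S1": [10, 0, 5],
    "S2": [10, 0, 5],  # identical to S1
    "S3": [0, 8, 2],
}

ids, dm = distance_matrix(table, bray_curtis)
for i in ids:
    print(i, [round(dm[i][j], 3) for j in ids])
```

Identical samples (S1 and S2) are at distance 0, and the diagonal is always 0 because each sample is compared with itself.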
To compute a different beta diversity metric, change the --p-metric parameter to the one that corresponds to the metric you want to compute. The following list provides information on the available beta diversity metrics in QIIME 2.

Bray-Curtis dissimilarity: Calculates Bray-Curtis dissimilarity
 Fraction of overabundant counts
--p-metric: braycurtis
Sørensen, T. (1948). "A method of establishing groups of equal amplitude in plant sociology based on similarity of species content." Kongelige Danske Videnskabernes Selskab 5.1-34: 4-7.

Canberra distance: Calculates Canberra distance
 Overabundance on a feature by feature basis
--p-metric: canberra
Lance, Godfrey L.N. and Williams, W.T. (1967). "A general theory of classificatory sorting strategies II. Clustering systems." The Computer Journal. 10 (3): 271-277.

Chebyshev distance: Calculates Chebyshev distance
 Maximum distance between two samples
--p-metric: chebyshev
Cantrell, C.D. (2000). “Modern Mathematical Methods for Physicists and Engineers”. Cambridge University Press.

Cityblock distance: Calculates Cityblock distance
 Similar to the Euclidean distance but the effect of a large difference in a single dimension is reduced
--p-metric: cityblock
Black, P.E. (2006). “Manhattan distance”. Dictionary of Algorithms and Data Structures.

Correlation coefficient: Measures Correlation coefficient
 Measure of strength and direction of linear relationship between samples
--p-metric: correlation
Galton, F. (1877). "Typical laws of heredity". Nature. 15 (388): 492–495.

Cosine Similarity: Measures Cosine similarity
 Ratio of the amount of common species in a sample to the mean of the two samples
--p-metric: cosine
Ochiai, A. (1957). “Zoogeographical Studies on the Soleoid Fishes Found in Japan and its Neighbouring Regions-II”. Nippon Suisan Gakkaishi. 22(9): 526-530.

Dice measure: Calculates Dice measure
 Statistic used for comparing the similarity of two samples
 Only counts true positives once
--p-metric: dice
Dice, Lee R. (1945). "Measures of the Amount of Ecologic Association Between Species". Ecology. 26 (3): 297–302.

Euclidean distance: Measures Euclidean distance
 Species-by-species distance matrix
--p-metric: euclidean
Legendre, P. and Caceres, M. (2013). “Beta diversity as the variance of community data: dissimilarity coefficients and partitioning.” Ecology Letters. 16(8): 951-963.

Generalized UniFrac: Measures Generalized UniFrac
 Detects a wider range of biological changes compared to unweighted and weighted UniFrac
--p-metric: generalized_unifrac
Chen, J., Bittinger, K., Charlson, E.S., Hoffmann, C., Lewis, J., Wu, G.D., Collman, R.G., Bushman, F.D., Li, H. (2012). “Associating microbiome composition with environmental covariates using generalized UniFrac distances.” Bioinformatics. 28 (16): 2106-2113.

Hamming distance: Measures Hamming distance
 Minimum number of substitutions required to change one group to the other
--p-metric: hamming
Hamming, R.W. (1950). “Error Detecting and Error Correcting Codes”. The Bell System Technical Journal. (29): 147-160.

Jaccard similarity index: Calculates Jaccard similarity index
 Fraction of unique features, regardless of abundance
--p-metric: jaccard
Jaccard, P. (1908). “Nouvelles recherches sur la distribution florale.” Bull. Soc. Vaud. Sci. Nat. (44): 223-270.

Kulczynski dissimilarity index: Measures Kulczynski dissimilarity index
 Describes the dissimilarity between two samples
--p-metric: kulsinski
Kulczynski, S. (1927). “Die Pflanzenassoziationen der Pieninen”. Bulletin International de l’Academie Polonaise des Sciences et des Lettres, Classe des Sciences Mathematiques et Naturelles. 57-203.

Mahalanobis distance: Calculates Mahalanobis distance
 How many standard deviations one sample is away from the mean
 Unitless and scale-invariant
 Takes into account the correlations of the data set
--p-metric: mahalanobis
Mahalanobis, P.C. (1936). "On the generalised distance in statistics". Proceedings of the National Institute of Sciences of India. 2 (1): 49–55.

Matching components: Measures Matching components
 Compares indices under all possible situations
--p-metric: matching
Janson, S., and Vegelius, J. (1981). “Measures of ecological association”. Oecologia. (49): 371–376.

Rogers-Tanimoto distance: Measures Rogers-Tanimoto distance
 Allows the possibility of two samples, which are quite different from each other, to both be similar to a third
--p-metric: rogerstanimoto
Tanimoto, T. (1958). "An Elementary Mathematical theory of Classification and Prediction". New York: Internal IBM Technical Report.

Russell-Rao coefficient: Calculates Russell-Rao coefficients
 Equal weight is given to matches and non-matches
--p-metric: russellrao
Russell, P.F. and Rao, T.R. (1940). “On habitat and association of species of anopheline larvae in south-eastern Madras”. J. Malaria Inst. India. (3): 153-178.

Sokal-Michener coefficient: Measures Sokal-Michener coefficient
 Proportion of matches between samples
--p-metric: sokalmichener
Sokal, R.R. and Michener, C.D. (1958). “A statistical method for evaluating systematic relationships”. Univ. Kans. Sci. Bull. (38): 1409-1438.

Sokal-Sneath Index: Calculates Sokal-Sneath index
 Measure of species turnover
--p-metric: sokalsneath
Sokal, R.R. and Sneath, P.H.A. (1963). “Principles of Numerical Taxonomy”. W. H. Freeman, San Francisco, California.

Species-by-species Euclidean: Measures Species-by-species Euclidean
 Standardized Euclidean distance between two groups
 Each coordinate difference between observations is scaled by dividing by the corresponding element of the standard deviation
--p-metric: seuclidean
Legendre, P. and Caceres, M. (2013). “Beta diversity as the variance of community data: dissimilarity coefficients and partitioning.” Ecology Letters. 16(8): 951-963.

Squared Euclidean: Measures squared Euclidean distance
 Places progressively greater weight on samples that are farther apart
--p-metric: sqeuclidean
Legendre, P. and Caceres, M. (2013). “Beta diversity as the variance of community data: dissimilarity coefficients and partitioning.” Ecology Letters. 16(8): 951-963.

Unweighted UniFrac: Measures unweighted UniFrac
 Measures the fraction of unique branch length
--p-metric: unweighted_unifrac
Lozupone, C. and Knight, R. (2005). "UniFrac: a new phylogenetic method for comparing microbial communities." Applied and Environmental Microbiology. 71 (12): 8228-8235.

Weighted Minkowski metric: Measures Weighted Minkowski metric
 Allows the use of the k-means-type paradigm to cluster large data sets
--p-metric: wminkowski
Chan, Y., Ching, W.K., Ng, M.K., Huang, J.Z. (2004). “An optimization algorithm for clustering using weighted dissimilarity measures”. Pattern Recognition. 37(5): 943-952.

Weighted normalized UniFrac: Measures Weighted normalized UniFrac
 Takes into account abundance
 Normalization adjusts for varying root-to-tip distances
--p-metric: weighted_normalized_unifrac
Lozupone, C. A., Hamady, M., Kelley, S. T., Knight, R. (2007). "Quantitative and qualitative beta diversity measures lead to different insights into factors that structure microbial communities". Applied and Environmental Microbiology. 73(5): 1576–85.

Weighted unnormalized UniFrac: Measures Weighted unnormalized UniFrac
 Takes into account abundance
 Doesn't correct for unequal sampling effort or different evolutionary rates between taxa
--p-metric: weighted_unifrac
Lozupone, C. A., Hamady, M., Kelley, S. T., Knight, R. (2007). "Quantitative and qualitative beta diversity measures lead to different insights into factors that structure microbial communities". Applied and Environmental Microbiology. 73(5): 1576–85.

Yule index: Measures Yule index
 Measures biodiversity
 Determined by the diversity of species and the proportions between the abundance of those species.
--p-metric: yule
Fisher, R.A., Corbet, A.S., Williams, C.B. (1943). “The Relationship Between the Number of Species and the Number of Individuals in a Random Sample of an Animal Population”. J. Animal Ecol. (12): 42-58.
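To make the difference between a presence/absence metric and a phylogenetic one concrete, the Python sketch below compares Jaccard distance with unweighted UniFrac ("the fraction of unique branch length") on a toy example. The tree, samples, and helper names are hypothetical, and the code is a simplified illustration rather than the algorithm QIIME 2 runs.

```python
# Hypothetical toy rooted tree: node -> (parent, branch_length).
TREE = {
    "A": ("n1", 1.0),
    "B": ("n1", 2.0),
    "C": ("root", 4.0),
    "n1": ("root", 0.5),
}


def present(counts):
    """Features observed at least once (abundance is ignored)."""
    return {f for f, c in counts.items() if c > 0}


def jaccard(u, v):
    """1 - shared features / all features seen in either sample."""
    a, b = present(u), present(v)
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def branches(counts, tree):
    """Branches (named by child node) on paths from observed tips to the root."""
    found = set()
    for tip in present(counts):
        node = tip
        while node in tree and node not in found:
            found.add(node)
            node = tree[node][0]
    return found


def branch_length(nodes, tree):
    return sum(tree[n][1] for n in nodes)


def unweighted_unifrac(u, v, tree):
    """Branch length unique to one sample divided by total branch length."""
    bu, bv = branches(u, tree), branches(v, tree)
    total = branch_length(bu | bv, tree)
    return branch_length(bu ^ bv, tree) / total if total else 0.0


s1 = {"A": 5, "B": 0, "C": 1}
s2 = {"A": 0, "B": 7, "C": 2}

print(round(jaccard(s1, s2), 3))                   # 0.667
print(round(unweighted_unifrac(s1, s2, TREE), 3))  # 0.4
```

The two samples differ in features A and B, which are close relatives in the toy tree, so the phylogenetic distance comes out smaller than the Jaccard distance.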
To further analyze your alpha and beta diversity results, return to the QIIME 2 “Moving Pictures” tutorial and continue at the “alpha-group-significance” command.