About the Alpha and Beta Diversity Analysis Tutorial
This Alpha and Beta Diversity Community Tutorial (run using QIIME 2 2017.12) walks you through analyzing the alpha and beta diversity of a sample dataset. Below you will find a link to a small test dataset to download and use in this tutorial.
Files used in tutorial
The following files, derived from the Moving Pictures tutorial, are used in this document.
Alpha Diversity Analysis
The alpha and alpha-phylogenetic methods compute a user-specified alpha diversity metric for all samples in a feature table.
Phylogenetic alpha diversity metrics (in this case, Faith's Phylogenetic Diversity) can be run with the following command:
qiime diversity alpha-phylogenetic \
  --i-table table.qza \
  --i-phylogeny rooted-tree.qza \
  --p-metric faith_pd \
  --o-alpha-diversity faith_pd_vector.qza
Non-phylogenetic alpha diversity metrics (in this case, Observed OTUs) can be run with the following command:
qiime diversity alpha \
  --i-table table.qza \
  --p-metric observed_otus \
  --o-alpha-diversity observed_otus_vector.qza
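When several non-phylogenetic metrics are needed, the same command can be generated once per metric. The following is a minimal sketch, not part of the tutorial: the helper name `alpha_command` and the output naming pattern are assumptions made for illustration, and the script only prints the commands rather than running them.

```python
import shlex

def alpha_command(metric, table="table.qza"):
    """Build the argument list for one `qiime diversity alpha` run.

    Hypothetical helper: the output file is named after the metric.
    """
    return [
        "qiime", "diversity", "alpha",
        "--i-table", table,
        "--p-metric", metric,
        "--o-alpha-diversity", f"{metric}_vector.qza",
    ]

# Print one ready-to-paste command per metric (nothing is executed here).
for metric in ["observed_otus", "shannon", "pielou_e"]:
    print(shlex.join(alpha_command(metric)))
```

Each printed line can be pasted into a shell, or the argument list can be passed to `subprocess.run` if the commands should be run from Python.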
The --i-table input provides the feature table containing the samples for which the alpha diversity metric will be computed. The --i-phylogeny input provides the phylogenetic tree whose tip identifiers correspond to the feature identifiers in the table; it is only used by the alpha-phylogenetic command (i.e., when computing phylogenetic diversity metrics). The --p-metric parameter specifies the alpha diversity metric to be run. The --o-alpha-diversity output specifies the output file.
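To make the roles of the table and the tree concrete, here is a small sketch of what the two kinds of alpha metric compute for each sample. It is an illustration under stated assumptions, not QIIME 2's implementation: the toy table, the toy tree (written as a list of branches with the tips below each branch), and the function names are all made up.

```python
# Toy feature table: sample ID -> {feature ID: count}. All values are made up.
table = {
    "S1": {"f1": 10, "f2": 0, "f3": 5},
    "S2": {"f1": 3, "f2": 4, "f3": 0},
}

# Toy rooted tree as a list of branches: (branch length, tips below the branch).
# Tip names match the feature IDs in the table, as the phylogeny input requires.
branches = [
    (1.0, {"f1"}),
    (1.0, {"f2"}),
    (0.5, {"f1", "f2"}),  # internal branch above f1 and f2
    (2.0, {"f3"}),
]

def observed_features(counts):
    """Number of features with a nonzero count (the idea behind observed_otus)."""
    return sum(1 for c in counts.values() if c > 0)

def faith_pd(counts, branches):
    """Sum the lengths of all branches leading to at least one observed feature."""
    present = {f for f, c in counts.items() if c > 0}
    return sum(length for length, tips in branches if tips & present)

# One value per sample for each metric, like the vectors the commands write out.
alpha = {s: (observed_features(c), faith_pd(c, branches)) for s, c in table.items()}
print(alpha)
```

Both samples contain two features, but S1 spans more branch length because f3 sits on a long branch. That difference is the information the tree adds.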
To compute a different alpha diversity metric, change the --p-metric parameter to the value that corresponds to the metric you want to compute. The following list provides information on the available alpha diversity metrics in QIIME 2.

Abundance-based Coverage Estimator (ACE) metric: Calculates the ACE metric
 Estimates species richness using a correction factor
 --p-metric: ace
 Citation: Chao, A. and Lee, S.M. (1992). "Estimating the number of classes via sample coverage". Journal of the American Statistical Association. (87): 210-217.

Berger-Parker Dominance Index: Calculates the Berger-Parker dominance index
 Relative abundance of the most abundant species
 --p-metric: berger_parker_d
 Citation: Berger, W.H. and Parker, F.L. (1970). "Diversity of planktonic Foraminifera in deep sea sediments". Science. (168): 1345-1347.

Brillouin's index: Calculates Brillouin's index
 Measures the diversity of the species present
 Use when randomness can't be guaranteed
 --p-metric: brillouin_d
 Citation: Pielou, E.C. (1975). Ecological Diversity. New York, Wiley-InterScience.

Chao1 confidence interval: Calculates the Chao1 confidence interval
 Confidence interval for the richness estimator, Chao1
 --p-metric: chao1_ci
 Citation: Colwell, R.K., Mao, C.X., Chang, J. (2004). "Interpolating, extrapolating, and comparing incidence-based species accumulation curves". Ecology. (85): 2717-2727.

Chao1 index: Calculates the Chao1 index
 Estimates diversity from abundance data
 Estimates the number of rare taxa missed because of undersampling
 --p-metric: chao1
 Citation: Chao, A. (1984). "Nonparametric estimation of the number of classes in a population".

Dominance measure: Calculates the dominance measure
 How equally the taxa are represented
 --p-metric: dominance

Effective Number of Species (ENS)/Probability of intra- or interspecific encounter (PIE) metric: Calculates the ENS/PIE metric
 Shows how the absolute number of species, the relative abundances of species, and their intraspecific clustering affect differences in biodiversity among communities
 --p-metric: enspie
 Citation: Chase, J.M. and Knight, T.M. (2013). "Scale-dependent effect sizes of ecological drivers on biodiversity: why standardised sampling is not enough". Ecology Letters. (16): 17-26.

Esty confidence interval: Calculates Esty's confidence interval
 Confidence interval based on the number of singletons among the total number of individuals
 --p-metric: esty_ci
 Citation: Esty, W.W. (1983). "A normal limit law for a nonparametric estimator of the coverage of a random sample". Ann Statist. (11): 905-912.

Faith's phylogenetic diversity: Calculates Faith's phylogenetic diversity
 Measure of biodiversity that incorporates phylogenetic differences between species
 Sum of the branch lengths
 --p-metric: faith_pd
 Citation: Faith, D.P. (1992). "Conservation evaluation and phylogenetic diversity". Biological Conservation. (61): 1-10.

Fisher's index: Calculates Fisher's index
 Relationship between the number of species and the abundance of each species
 --p-metric: fisher_alpha
 Citation: Fisher, R.A., Corbet, A.S. and Williams, C.B. (1943). "The relation between the number of species and the number of individuals in a random sample of an animal population". Journal of Animal Ecology. (12): 42-58.

Gini index: Calculates the Gini index
 Measures inequality in species abundances
 Assumes that the sampling is accurate and that additional data would fall on linear gradients between the values of the given data
 --p-metric: gini_index
 Citation: Gini, C. (1912). "Variability and Mutability". C. Cuppini, Bologna. 156.

Good's coverage of counts: Calculates Good's coverage of counts
 Estimates the percent of an entire species that is represented in a sample
 --p-metric: goods_coverage
 Citation: Good, I.J. (1953). "The population frequencies of species and the estimation of population parameters". Biometrika. 40(3/4): 237-264.

Heip's evenness measure: Calculates Heip's evenness measure
 Removes dependency on species number
 --p-metric: heip_e
 Citation: Heip, C. (1974). "A new index measuring evenness". J. Mar. Biol. Ass. UK. (54): 555-557.

Kempton-Taylor Q index: Calculates the Kempton-Taylor Q index
 Measures diversity based on the distribution of species
 Builds an abundance curve from all species and uses its interquartile range (IQR) to measure diversity
 --p-metric: kempton_taylor_q
 Citation: Kempton, R.A. and Taylor, L.R. (1976). "Models and statistics for species diversity". Nature. (262): 818-820.

Lladser's confidence interval: Calculates Lladser's confidence interval
 Single confidence interval of the conditional uncovered probability
 --p-metric: lladser_ci
 Citation: Lladser, M.E., Gouet, R., Reeder, J. (2011). "Extrapolation of Urn Models via Poissonization: Accurate Measurements of the Microbial Unknown". PLoS.

Lladser's point estimate: Calculates Lladser's point estimate
 Estimates how much of the environment contains unsampled taxa
 Best estimate on a complete sample
 --p-metric: lladser_pe
 Citation: Lladser, M.E., Gouet, R., Reeder, J. (2011). "Extrapolation of Urn Models via Poissonization: Accurate Measurements of the Microbial Unknown". PLoS.

Margalef's richness index: Calculates Margalef's richness index
 Measures species richness in a given area or community
 --p-metric: margalef
 Citation: Magurran, A.E. (2004). "Measuring biological diversity". Blackwell. 76-77.

McIntosh dominance index D: Calculates the McIntosh dominance index D
 Affected by the variation in dominant taxa and less affected by the variation in less abundant or rare taxa
 --p-metric: mcintosh_d
 Citation: McIntosh, R.P. (1967). "An index of diversity and the relation of certain concepts to diversity". Ecology. (48): 392-404.

McIntosh evenness index E: Calculates McIntosh's evenness measure E
 How evenly abundant taxa are
 --p-metric: mcintosh_e
 Citation: Heip, C. (1974). "A new index measuring evenness". J. Mar. Biol. Ass. UK. (54): 555-557.

Menhinick's richness index: Calculates Menhinick's richness index
 The ratio of the number of taxa to the square root of the sample size
 --p-metric: menhinick
 Citation: Magurran, A.E. (2004). "Measuring biological diversity". Blackwell. 76-77.

Michaelis-Menten fit to rarefaction curve of observed OTUs: Calculates the Michaelis-Menten fit to the rarefaction curve of observed OTUs
 Estimated richness of species pools
 --p-metric: michaelis_menten_fit
 Citation: Raaijmakers, J.G.W. (1987). "Statistical analysis of the Michaelis-Menten equation". Biometrics. (43): 793-803.

Number of distinct features: Calculates the number of distinct OTUs
 --p-metric: observed_otus
 Citation: DeSantis, T.Z., Hugenholtz, P., Larsen, N., Rojas, M., Brodie, E.L., Keller, K., Huber, T., Davis, D., Hu, P., Andersen, G.L. (2006). "Greengenes, a Chimera-Checked 16S rRNA Gene Database and Workbench Compatible with ARB". Applied and Environmental Microbiology. (72): 5069-5072.

Number of double occurrences: Calculates the number of double occurrence OTUs (doubletons)
 OTUs that only occur twice
 --p-metric: doubles

Number of observed features, including singles and doubles: Calculates the number of observed OTUs, singles, and doubles
 --p-metric: osd
 Citation: DeSantis, T.Z., Hugenholtz, P., Larsen, N., Rojas, M., Brodie, E.L., Keller, K., Huber, T., Davis, D., Hu, P., Andersen, G.L. (2006). "Greengenes, a Chimera-Checked 16S rRNA Gene Database and Workbench Compatible with ARB". Applied and Environmental Microbiology. 72(7): 5069-5072.

Singles: Calculates the number of single occurrence OTUs (singletons)
 OTUs that appear only once in a given sample
 --p-metric: singles

Pielou's evenness: Calculates Pielou's evenness
 Measure of relative evenness of species richness
 --p-metric: pielou_e
 Citation: Pielou, E. (1966). "The measurement of diversity in different types of biological collections". J. Theor. Biol. (13): 131-144.

Robbins' estimator: Calculates Robbins' estimator
 Probability of unobserved outcomes
 --p-metric: robbins
 Citation: Robbins, H.E. (1968). "Estimating the Total Probability of the unobserved outcomes of an experiment". Ann Math. Statist. 39(1): 256-257.

Shannon's index: Calculates Shannon's index
 Calculates richness and diversity using a natural logarithm
 Accounts for both abundance and evenness of the taxa present
 --p-metric: shannon
 Citation: Shannon, C.E. and Weaver, W. (1949). "The mathematical theory of communication". University of Illinois Press, Champaign, Illinois.

Simpson evenness measure E: Calculates Simpson's evenness measure E
 Diversity measure that accounts for the number of organisms and the number of species
 --p-metric: simpson_e
 Citation: Simpson, E.H. (1949). "Measurement of diversity". Nature. (163): 688.

Simpson's index: Calculates Simpson's index
 Measures the relative abundance of the different species making up the sample richness
 --p-metric: simpson
 Citation: Simpson, E.H. (1949). "Measurement of diversity". Nature. (163): 688.

Strong's dominance index (Dw): Calculates Strong's dominance index
 Measures species abundance unevenness
 --p-metric: strong
 Citation: Strong, W.L. (2002). "Assessing species abundance unevenness within and between plant communities". Community Ecology. (3): 237-246.
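As a worked illustration of a few of the metrics listed above, the sketch below computes Shannon's index, Simpson's index, and Pielou's evenness for two toy samples. It is a simplified sketch rather than the code QIIME 2 runs: the function names and counts are made up, the natural logarithm is an assumption (implementations may use a different base), and Simpson's index is taken in its 1 - sum(p^2) form.

```python
import math

def relative_abundances(counts):
    """Convert raw counts to proportions, ignoring features with zero count."""
    total = sum(counts)
    return [c / total for c in counts if c > 0]

def shannon(counts, base=math.e):
    """Shannon index H = -sum(p * log(p)); the log base is a free choice."""
    return -sum(p * math.log(p, base) for p in relative_abundances(counts))

def simpson(counts):
    """Simpson's index as 1 - sum(p^2), i.e. one minus the dominance."""
    return 1.0 - sum(p * p for p in relative_abundances(counts))

def pielou_e(counts):
    """Pielou's evenness: Shannon index divided by its maximum, log(S).

    Undefined (division by zero) when only one feature is observed.
    """
    richness = sum(1 for c in counts if c > 0)
    return shannon(counts) / math.log(richness)

even = [25, 25, 25, 25]   # four equally abundant features
skewed = [97, 1, 1, 1]    # same richness, one dominant feature

for name, counts in [("even", even), ("skewed", skewed)]:
    print(name, shannon(counts), simpson(counts), pielou_e(counts))
```

Both samples have the same richness (four observed features), so a richness metric such as observed_otus cannot tell them apart. The abundance-aware metrics drop sharply for the skewed sample.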
Beta Diversity Analysis
The beta and beta-phylogenetic methods compute a user-specified beta diversity metric for all samples in a feature table.
Phylogenetic beta diversity metrics (in this case, Unweighted UniFrac) can be run with the following command:
qiime diversity beta-phylogenetic \
  --i-table table.qza \
  --i-phylogeny rooted-tree.qza \
  --p-metric unweighted_unifrac \
  --o-distance-matrix unweighted_unifrac_distance_matrix.qza
Non-phylogenetic beta diversity metrics (in this case, Bray-Curtis) can be run with the following command:
qiime diversity beta \
  --i-table table.qza \
  --p-metric braycurtis \
  --o-distance-matrix braycurtis_distance_matrix.qza
The --i-table input provides the feature table containing the samples for which the beta diversity metric will be computed. The --i-phylogeny input provides the phylogenetic tree whose tip identifiers correspond to the feature identifiers in the table; it is only used by the beta-phylogenetic command (i.e., when computing phylogenetic diversity metrics). The --p-metric parameter specifies the beta diversity metric to be run. The --o-distance-matrix output specifies the output file.
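Unlike an alpha metric, which gives one value per sample, a beta metric gives one value per pair of samples, which is why the output is a distance matrix. The sketch below shows this for Bray-Curtis on a toy table. The counts and the function name are made up for illustration, and this is not the code QIIME 2 runs.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum(|u_i - v_i|) / sum(u_i + v_i)."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator

# Toy table: each sample is a list of counts over the same three features.
samples = {
    "S1": [10, 0, 5],
    "S2": [3, 4, 0],
    "S3": [10, 0, 5],  # identical to S1 on purpose
}

ids = sorted(samples)
# Square, symmetric matrix with zeros on the diagonal.
matrix = [[bray_curtis(samples[a], samples[b]) for b in ids] for a in ids]

for sample_id, row in zip(ids, matrix):
    print(sample_id, [round(d, 3) for d in row])
```

S1 and S3 have identical counts, so their distance is 0. S1 and S2 differ by 16 counts out of 22 in total, giving a distance of 16/22.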
To compute a different beta diversity metric, change the --p-metric parameter to the value that corresponds to the metric you want to compute. The following list provides information on the available beta diversity metrics in QIIME 2.

Bray-Curtis dissimilarity: Calculates the Bray-Curtis dissimilarity
 Fraction of overabundant counts
 --p-metric: braycurtis
 Citation: Sorenson, T. (1948). "A method of establishing groups of equal amplitude in plant sociology based on similarity of species content". Kongelige Danske Videnskabernes Selskab. 5.1-34: 4-7.

Canberra distance: Calculates the Canberra distance
 Overabundance on a feature-by-feature basis
 --p-metric: canberra
 Citation: Lance, Godfrey L.N. and Williams, W.T. (1967). "A general theory of classificatory sorting strategies II. Clustering systems". The Computer Journal. 10(3): 271-277.

Chebyshev distance: Calculates the Chebyshev distance
 Maximum distance between two samples
 --p-metric: chebyshev
 Citation: Cantrell, C.D. (2000). "Modern Mathematical Methods for Physicists and Engineers". Cambridge University Press.

Cityblock distance: Calculates the Cityblock (Manhattan) distance
 Similar to the Euclidean distance but the effect of a large difference in a single dimension is reduced
 --p-metric: cityblock
 Citation: Black, P.E. (2006). "Manhattan distance". Dictionary of Algorithms and Data Structures.

Correlation coefficient: Measures the correlation coefficient
 Measure of the strength and direction of the linear relationship between samples
 --p-metric: correlation
 Citation: Galton, F. (1877). "Typical laws of heredity". Nature. 15(388): 492-495.

Cosine similarity: Measures cosine similarity
 Ratio of the amount of common species in a sample to the mean of the two samples
 --p-metric: cosine
 Citation: Ochiai, A. (1957). "Zoogeographical Studies on the Soleoid Fishes Found in Japan and its Neighbouring Regions-II". Nippon Suisan Gakkaishi. 22(9): 526-530.

Dice measure: Calculates the Dice measure
 Statistic used for comparing the similarity of two samples
 Only counts true positives once
 --p-metric: dice
 Citation: Dice, L.R. (1945). "Measures of the Amount of Ecologic Association Between Species". Ecology. 26(3): 297-302.

Euclidean distance: Measures the Euclidean distance
 Species-by-species distance matrix
 --p-metric: euclidean
 Citation: Legendre, P. and Caceres, M. (2013). "Beta diversity as the variance of community data: dissimilarity coefficients and partitioning". Ecology Letters. 16(8): 951-963.

Generalized UniFrac: Measures Generalized UniFrac
 Detects a wider range of biological changes compared to unweighted and weighted UniFrac
 --p-metric: generalized_unifrac
 Citation: Chen, J., Bittinger, K., Charlson, E.S., Hoffmann, C., Lewis, J., Wu, G.D., Collman, R.G., Bushman, F.D., Li, H. (2012). "Associating microbiome composition with environmental covariates using generalized UniFrac distances". Bioinformatics. 28(16): 2106-2113.

Hamming distance: Measures the Hamming distance
 Minimum number of substitutions required to change one group to the other
 --p-metric: hamming
 Citation: Hamming, R.W. (1950). "Error Detecting and Error Correcting Codes". The Bell System Technical Journal. (29): 147-160.

Jaccard similarity index: Calculates the Jaccard similarity index
 Fraction of unique features, regardless of abundance
 --p-metric: jaccard
 Citation: Jaccard, P. (1908). "Nouvelles recherches sur la distribution florale". Bull. Soc. Vaud. Sci. Nat. (44): 223-270.

Kulczynski dissimilarity index: Measures the Kulczynski dissimilarity index
 Describes the dissimilarity between two samples
 --p-metric: kulsinski
 Citation: Kulczynski, S. (1927). "Die Pflanzenassoziationen der Pieninen". Bulletin International de l'Academie Polonaise des Sciences et des Lettres, Classe des Sciences Mathematiques et Naturelles. 57-203.

Mahalanobis distance: Calculates the Mahalanobis distance
 How many standard deviations one sample is away from the mean
 Unitless and scale-invariant
 Takes into account the correlations of the data set
 --p-metric: mahalanobis
 Citation: Mahalanobis, P.C. (1936). "On the generalised distance in statistics". Proceedings of the National Institute of Sciences of India. 2(1): 49-55.

Matching components: Measures matching components
 Compares indices under all possible situations
 --p-metric: matching
 Citation: Janson, S. and Vegelius, J. (1981). "Measures of ecological association". Oecologia. (49): 371-376.

Rogers-Tanimoto distance: Measures the Rogers-Tanimoto distance
 Allows the possibility of two samples, which are quite different from each other, to both be similar to a third
 --p-metric: rogerstanimoto
 Citation: Tanimoto, T. (1958). "An Elementary Mathematical theory of Classification and Prediction". New York: Internal IBM Technical Report.

Russell-Rao coefficient: Calculates the Russell-Rao coefficient
 Equal weight is given to matches and non-matches
 --p-metric: russellrao
 Citation: Russell, P.F. and Rao, T.R. (1940). "On habitat and association of species of anopheline larvae in south-eastern Madras". J. Malaria Inst. India. (3): 153-178.

Sokal-Michener coefficient: Measures the Sokal-Michener coefficient
 Proportion of matches between samples
 --p-metric: sokalmichener
 Citation: Sokal, R.R. and Michener, C.D. (1958). "A statistical method for evaluating systematic relationships". Univ. Kans. Sci. Bull. (38): 1409-1438.

Sokal-Sneath index: Calculates the Sokal-Sneath index
 Measure of species turnover
 --p-metric: sokalsneath
 Citation: Sokal, R.R. and Sneath, P.H.A. (1963). "Principles of Numerical Taxonomy". W. H. Freeman, San Francisco, California.

Species-by-species Euclidean: Measures species-by-species Euclidean distance
 Standardized Euclidean distance between two groups
 Each coordinate difference between observations is scaled by dividing by the corresponding element of the standard deviation
 --p-metric: seuclidean
 Citation: Legendre, P. and Caceres, M. (2013). "Beta diversity as the variance of community data: dissimilarity coefficients and partitioning". Ecology Letters. 16(8): 951-963.

Squared Euclidean: Measures squared Euclidean distance
 Places progressively greater weight on samples that are farther apart
 --p-metric: sqeuclidean
 Citation: Legendre, P. and Caceres, M. (2013). "Beta diversity as the variance of community data: dissimilarity coefficients and partitioning". Ecology Letters. 16(8): 951-963.

Unweighted UniFrac: Measures unweighted UniFrac
 Measures the fraction of unique branch length
 --p-metric: unweighted_unifrac
 Citation: Lozupone, C. and Knight, R. (2005). "UniFrac: a new phylogenetic method for comparing microbial communities". Applied and Environmental Microbiology. 71(12): 8228-8235.

Weighted Minkowski metric: Measures the weighted Minkowski metric
 Allows the use of the k-means-type paradigm to cluster large data sets
 --p-metric: wminkowski
 Citation: Chan, Y., Ching, W.K., Ng, M.K., Huang, J.Z. (2004). "An optimization algorithm for clustering using weighted dissimilarity measures". Pattern Recognition. 37(5): 943-952.

Weighted normalized UniFrac: Measures weighted normalized UniFrac
 Takes into account abundance
 Normalization adjusts for varying root-to-tip distances
 --p-metric: weighted_normalized_unifrac
 Citation: Lozupone, C.A., Hamady, M., Kelley, S.T., Knight, R. (2007). "Quantitative and qualitative beta diversity measures lead to different insights into factors that structure microbial communities". Applied and Environmental Microbiology. 73(5): 1576-85.

Weighted unnormalized UniFrac: Measures weighted unnormalized UniFrac
 Takes into account abundance
 Doesn't correct for unequal sampling effort or different evolutionary rates between taxa
 --p-metric: weighted_unifrac
 Citation: Lozupone, C.A., Hamady, M., Kelley, S.T., Knight, R. (2007). "Quantitative and qualitative beta diversity measures lead to different insights into factors that structure microbial communities". Applied and Environmental Microbiology. 73(5): 1576-85.

Yule index: Measures the Yule index
 Measures biodiversity
 Determined by the diversity of species and the proportions between the abundance of those species
 --p-metric: yule
 Citation: Fisher, R.A., Corbet, A.S., Williams, C.B. (1943). "The Relationship Between the Number of Species and the Number of Individuals in a Random Sample of an Animal Population". J. Animal Ecol. (12): 42-58.
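To contrast a non-phylogenetic metric with a phylogenetic one from the list above, the sketch below computes a Jaccard distance and an unweighted UniFrac distance for the same pair of toy samples. The tree (written as a list of branches with the tips below each branch), the feature sets, and the function names are assumptions made for illustration. This is a simplified sketch, not the algorithm QIIME 2 uses internally.

```python
def jaccard_distance(present_a, present_b):
    """1 - (shared features / features seen in either sample); abundance is ignored."""
    return 1.0 - len(present_a & present_b) / len(present_a | present_b)

def unweighted_unifrac(present_a, present_b, branches):
    """Branch length unique to one sample divided by all observed branch length."""
    unique = shared = 0.0
    for length, tips in branches:
        in_a = bool(tips & present_a)
        in_b = bool(tips & present_b)
        if in_a != in_b:
            unique += length      # branch leads to only one of the samples
        elif in_a:
            shared += length      # branch leads to both samples
    return unique / (unique + shared)

# Toy rooted tree: (branch length, tips below the branch). Values are made up.
branches = [
    (1.0, {"f1"}),
    (1.0, {"f2"}),
    (0.5, {"f1", "f2"}),  # internal branch above f1 and f2
    (2.5, {"f3"}),
]

sample_a = {"f1", "f3"}   # features present in sample A
sample_b = {"f1", "f2"}   # features present in sample B

print(jaccard_distance(sample_a, sample_b))
print(unweighted_unifrac(sample_a, sample_b, branches))
```

Jaccard sees one shared feature out of three, giving 2/3. UniFrac also weighs how far apart the unshared features are on the tree: 3.5 of the 5.0 units of observed branch length are unique to one sample, giving 0.7.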
To further analyze your alpha and beta diversity results, return to the QIIME 2 "Moving Pictures" tutorial and continue at the alpha-group-significance command.