# About the Alpha and Beta Diversity Analysis Tutorial

This Alpha and Beta Diversity Community Tutorial (run using QIIME 2 2017.12) walks you through analyzing the alpha and beta diversity of a sample dataset. Below you will find a link to a small test dataset to download and use in this tutorial.

## Files used in tutorial

The following files, derived from the *Moving Pictures tutorial*, are used in this document: the feature table `table.qza` and the rooted phylogenetic tree `rooted-tree.qza`.

## Alpha Diversity Analysis

The `alpha` and `alpha-phylogenetic` methods compute a user-specified alpha diversity metric for all samples in a feature table.

Phylogenetic alpha diversity metrics (in this case, Faith's Phylogenetic Diversity) can be run with the following command:

```
qiime diversity alpha-phylogenetic \
--i-table table.qza \
--i-phylogeny rooted-tree.qza \
--p-metric faith_pd \
--o-alpha-diversity faith_pd_vector.qza
```

Non-phylogenetic alpha diversity metrics (in this case, Observed OTUs) can be run with the following command:

```
qiime diversity alpha \
--i-table table.qza \
--p-metric observed_otus \
--o-alpha-diversity observed_otus_vector.qza
```

The `--i-table` input provides the feature table containing the samples for which the alpha diversity metric will be computed. The `--i-phylogeny` input provides the phylogenetic tree containing the tip identifiers that correspond to the feature identifiers in the table, and is only used for the `alpha-phylogenetic` command (i.e., when computing phylogenetic diversity metrics). The `--p-metric` parameter specifies the alpha diversity metric to be run. The `--o-alpha-diversity` output specifies the output file.
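
The rule that `--i-phylogeny` is passed only to `alpha-phylogenetic` can be sketched as a small shell helper. This is a dry-run illustration, not part of QIIME 2: the function name `alpha_command` is hypothetical, the file names are the ones used in the commands above, and `faith_pd` is assumed to be the only phylogenetic metric of interest. The helper only prints the command it would run.

```shell
#!/usr/bin/env bash
# Sketch: build the alpha diversity command for a given metric.
# Assumptions: faith_pd is the only phylogenetic metric handled here, and
# table.qza / rooted-tree.qza are the input files from this tutorial.

alpha_command() {
  metric="$1"
  if [ "$metric" = "faith_pd" ]; then
    # Phylogenetic metric: the tree must be supplied via --i-phylogeny.
    echo "qiime diversity alpha-phylogenetic --i-table table.qza --i-phylogeny rooted-tree.qza --p-metric ${metric} --o-alpha-diversity ${metric}_vector.qza"
  else
    # Non-phylogenetic metric: the feature table alone is enough.
    echo "qiime diversity alpha --i-table table.qza --p-metric ${metric} --o-alpha-diversity ${metric}_vector.qza"
  fi
}

# Dry run: print the commands instead of executing them.
alpha_command faith_pd
alpha_command observed_otus
```

Printing rather than executing makes it easy to check the command before running it; piping the output to `sh` would execute it in an environment where QIIME 2 is installed.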

To compute a different alpha diversity metric, change the `--p-metric` parameter to the value that corresponds to the metric you want to compute. The following list provides information on the available alpha diversity metrics in QIIME 2.

**Abundance-based Coverage Estimator (ACE) metric**: Calculates the ACE metric
- Estimates species richness using a correction factor
- `--p-metric`: `ace`
- *Chao, A. and Lee, S.M. (1992). "Estimating the number of classes via sample coverage". Journal of the American Statistical Association. (87): 210-217.*

**Berger-Parker Dominance Index**: Calculates Berger-Parker dominance index
- Relative richness of the abundant species
- `--p-metric`: `berger_parker_d`
- *Berger, W.H. and Parker, F.L. (1970). "Diversity of planktonic Foraminifera in deep sea sediments". Science. (168): 1345-1347.*

**Brillouin's index**: Calculates Brillouin's index
- Measures the diversity of the species present
- Use when randomness can't be guaranteed
- `--p-metric`: `brillouin_d`
- *Pielou, E. C. (1975). Ecological Diversity. New York, Wiley InterScience.*

**Chao1 confidence interval**: Calculates Chao1 confidence interval
- Confidence interval for the richness estimator Chao1
- `--p-metric`: `chao1_ci`
- *Colwell, R.K., Mao, C.X., Chang, J. (2004). "Interpolating, extrapolating, and comparing incidence-based species accumulation curves." Ecology. (85): 2717-2727.*

**Chao1 index**: Calculates Chao1 index
- Estimates diversity from abundant data
- Estimates number of rare taxa missed from undersampling
- `--p-metric`: `chao1`
- *Chao, A. (1984). "Non-parametric estimation of the number of classes in a population".*

**Dominance measure**: Calculates dominance measure
- How equally the taxa are presented
- `--p-metric`: `dominance`

**Effective Number of Species (ENS)/Probability of intra- or interspecific encounter (PIE) metric**: Calculates Effective Number of Species (ENS)/Probability of intra- or interspecific encounter (PIE) metric
- Shows how absolute amount of species, relative abundances of species, and their intraspecific clustering affect differences in biodiversity among communities
- `--p-metric`: `enspie`
- *Chase, J.M., and Knight, R. (2013). "Scale-dependent effect sizes of ecological drivers on biodiversity: why standardised sampling is not enough". Ecology Letters (16): 17-26.*

**Esty confidence interval**: Calculates Esty's confidence interval
- Confidence interval for how many singletons in total individuals
- `--p-metric`: `esty_ci`
- *Esty, W. W. (1983). "A normal limit law for a nonparametric estimator of the coverage of a random sample". Ann Statist. (11): 905-912.*

**Faith's phylogenetic diversity**: Calculates Faith's phylogenetic diversity
- Measure of biodiversity that incorporates phylogenetic difference between species
- Sum of length of branches
- `--p-metric`: `faith_pd`
- *Faith, D.P. (1992). "Conservation evaluation and phylogenetic diversity". Biological Conservation. (61): 1-10.*

**Fisher's index**: Calculates Fisher's index
- Relationship between the number of species and the abundance of each species
- `--p-metric`: `fisher_alpha`
- *Fisher, R.A., Corbet, A.S. and Williams, C.B. (1943). "The relation between the number of species and the number of individuals in a random sample of an animal population". Journal of Animal Ecology. (12): 42-58.*

**Gini index**: Calculates Gini index
- Measures species abundance
- Assumes that the sampling is accurate and that additional data would fall on linear gradients between the values of the given data
- `--p-metric`: `gini_index`
- *Gini, C. (1912). "Variability and Mutability". C. Cuppini, Bologna. 156.*

**Good's coverage of counts**: Calculates Good's coverage of counts
- Estimates the percent of an entire species that is represented in a sample
- `--p-metric`: `goods_coverage`
- *Good, I.J. (1953). "The population frequencies of species and the estimation of population parameters". Biometrika. 40(3/4): 237-264.*

**Heip's evenness measure**: Calculates Heip's evenness measure
- Removes dependency on species number
- `--p-metric`: `heip_e`
- *Heip, C. (1974). "A new index measuring evenness". J. Mar. Biol. Ass. UK. (54): 555-557.*

**Kempton-Taylor Q index**: Calculates Kempton-Taylor Q index
- Measures diversity based on the distribution of species
- Builds an abundance curve from all species and uses the interquartile range (IQR) to measure diversity
- `--p-metric`: `kempton_taylor_q`
- *Kempton, R.A. and Taylor, L.R. (1976). "Models and statistics for species diversity". Nature (262): 818-820.*

**Lladser's confidence interval**: Calculates Lladser's confidence interval
- Single confidence interval of the conditional uncovered probability
- `--p-metric`: `lladser_ci`
- *Lladser, M.E., Gouet, R., Reeder, J. (2011). "Extrapolation of Urn Models via Poissonization: Accurate Measurements of the Microbial Unknown". PLoS.*

**Lladser's point estimate**: Calculates Lladser's point estimate
- Estimates how much of the environment contains unsampled taxa
- Best estimate on a complete sample
- `--p-metric`: `lladser_pe`
- *Lladser, M.E., Gouet, R., Reeder, J. (2011). "Extrapolation of Urn Models via Poissonization: Accurate Measurements of the Microbial Unknown". PLoS.*

**Margalef's richness index**: Calculates Margalef's richness index
- Measures species richness in a given area or community
- `--p-metric`: `margalef`
- *Magurran, A.E. (2004). "Measuring biological diversity". Blackwell. 76-77.*

**McIntosh dominance index D**: Calculates McIntosh dominance index D
- Affected by the variation in dominant taxa and less affected by the variation in less abundant or rare taxa
- `--p-metric`: `mcintosh_d`
- *McIntosh, R.P. (1967). "An index of diversity and the relation of certain concepts to diversity". Ecology (48): 392-404.*

**McIntosh evenness index E**: Calculates McIntosh's evenness measure E
- How evenly abundant taxa are
- `--p-metric`: `mcintosh_e`
- *Heip, C. (1974). "A new index measuring evenness". J. Mar. Biol. Ass. UK. (54): 555-557.*

**Menhinick's richness index**: Calculates Menhinick's richness index
- The ratio of the number of taxa to the square root of the sample size
- `--p-metric`: `menhinick`
- *Magurran, A.E. (2004). "Measuring biological diversity". Blackwell. 76-77.*

**Michaelis-Menten fit to rarefaction curve of observed OTUs**: Calculates Michaelis-Menten fit to rarefaction curve of observed OTUs
- Estimated richness of species pools
- `--p-metric`: `michaelis_menten_fit`
- *Raaijmakers, J.G.W. (1987). "Statistical analysis of the Michaelis-Menten equation". Biometrics. (43): 793-803.*

**Number of distinct features**: Calculates number of distinct OTUs
- `--p-metric`: `observed_otus`
- *DeSantis, T.Z., Hugenholtz, P., Larsen, N., Rojas, M., Brodie, E.L., Keller, K., Huber, T., Davis, D., Hu, P., Andersen, G.L. (2006). "Greengenes, a Chimera-Checked 16S rRNA Gene Database and Workbench Compatible with ARB". Applied and Environmental Microbiology (72): 5069-5072.*

**Number of double occurrences**: Calculates number of double occurrence OTUs (doubletons)
- OTUs that only occur twice
- `--p-metric`: `doubles`

**Number of observed features, including singles and doubles**: Calculates number of observed OTUs, singles, and doubles
- `--p-metric`: `osd`
- *DeSantis, T.Z., Hugenholtz, P., Larsen, N., Rojas, M., Brodie, E.L., Keller, K., Huber, T., Davis, D., Hu, P., Andersen, G.L. (2006). "Greengenes, a Chimera-Checked 16S rRNA Gene Database and Workbench Compatible with ARB". Applied and Environmental Microbiology. 72 (7): 5069-5072.*

**Singles**: Calculates number of single occurrence OTUs (singletons)
- OTUs that appear only once in a given sample
- `--p-metric`: `singles`

**Pielou's evenness**: Calculates Pielou's evenness
- Measure of relative evenness of species richness
- `--p-metric`: `pielou_e`
- *Pielou, E. (1966). "The measurement of diversity in different types of biological collections". J. Theor. Biol. (13): 131-144.*

**Robbins' estimator**: Calculates Robbins' estimator
- Probability of unobserved outcomes
- `--p-metric`: `robbins`
- *Robbins, H.E. (1968). "Estimating the Total Probability of the unobserved outcomes of an experiment". Ann Math. Statist. 39(1): 256-257.*

**Shannon's index**: Calculates Shannon's index
- Calculates richness and diversity using a natural logarithm
- Accounts for both abundance and evenness of the taxa present
- `--p-metric`: `shannon`
- *Shannon, C.E. and Weaver, W. (1949). "The mathematical theory of communication". University of Illinois Press, Champaign, Illinois.*

**Simpson evenness measure E**: Calculates Simpson's evenness measure E
- Diversity that accounts for the number of organisms and number of species
- `--p-metric`: `simpson_e`
- *Simpson, E.H. (1949). "Measurement of diversity". Nature. (163): 688.*

**Simpson's index**: Calculates Simpson's index
- Measures the relative abundance of the different species making up the sample richness
- `--p-metric`: `simpson`
- *Simpson, E.H. (1949). "Measurement of diversity". Nature. (163): 688.*

**Strong's dominance index (Dw)**: Calculates Strong's dominance index
- Measures species abundance unevenness
- `--p-metric`: `strong`
- *Strong, W.L. (2002). "Assessing species abundance unevenness within and between plant communities". Community Ecology (3): 237-246.*
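
To make a few of the count-based definitions above concrete, here is a minimal worked example on a toy sample. It is a sketch under stated assumptions, not how QIIME 2 computes these values internally: the counts are invented, the function name `alpha_toy` is hypothetical, and "sample size" in Menhinick's index is taken to mean the total number of observations in the sample.

```shell
#!/usr/bin/env bash
# Toy feature counts for one hypothetical sample (not from the tutorial data).
counts="4 1 1 2 0 8"

alpha_toy() {
  echo "$1" | awk '{
    for (i = 1; i <= NF; i++) {
      total += $i                 # N: total number of observations
      if ($i > 0)  observed++     # observed_otus: features seen at least once
      if ($i == 1) singles++      # singles: features seen exactly once
      if ($i == 2) doubles++      # doubles: features seen exactly twice
    }
    # Menhinick: number of observed features divided by sqrt(N)
    printf "observed_otus=%d singles=%d doubles=%d menhinick=%.2f\n", observed, singles, doubles, observed / sqrt(total)
  }'
}

alpha_toy "$counts"
# prints: observed_otus=5 singles=2 doubles=1 menhinick=1.25
```

Here five of the six features are present, the total count is 16, and Menhinick's index is 5 / sqrt(16) = 1.25.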

## Beta Diversity Analysis

The `beta` and `beta-phylogenetic` methods compute a user-specified beta diversity metric for all samples in a feature table.

Phylogenetic beta diversity metrics (in this case, Unweighted UniFrac) can be run with the following command:

```
qiime diversity beta-phylogenetic \
--i-table table.qza \
--i-phylogeny rooted-tree.qza \
--p-metric unweighted_unifrac \
--o-distance-matrix unweighted_unifrac_distance_matrix.qza
```

Non-phylogenetic beta diversity metrics (in this case, Bray-Curtis) can be run with the following command:

```
qiime diversity beta \
--i-table table.qza \
--p-metric braycurtis \
--o-distance-matrix braycurtis_distance_matrix.qza
```

The `--i-table` input provides the feature table containing the samples for which the beta diversity metric will be computed. The `--i-phylogeny` input provides the phylogenetic tree containing the tip identifiers that correspond to the feature identifiers in the table, and is only used for the `beta-phylogenetic` command (i.e., when computing phylogenetic diversity metrics). The `--p-metric` parameter specifies the beta diversity metric to be run. The `--o-distance-matrix` output specifies the output file.
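
When several beta diversity metrics are computed in a row, it helps to derive each output name from the metric so that one distance matrix never overwrites another. The following dry-run sketch illustrates that idea; the function name `beta_commands` and the `<metric>_distance_matrix.qza` naming scheme are assumptions made for this example, and the selection of metrics is arbitrary.

```shell
#!/usr/bin/env bash
# Sketch: one `qiime diversity beta` command per non-phylogenetic metric.
# Assumption: table.qza is the feature table used throughout this tutorial.

beta_commands() {
  for metric in "$@"; do
    # Name the output after the metric so results stay distinguishable.
    echo "qiime diversity beta --i-table table.qza --p-metric ${metric} --o-distance-matrix ${metric}_distance_matrix.qza"
  done
}

# Dry run: print the commands instead of executing them.
beta_commands braycurtis jaccard euclidean
```

The printed lines can be inspected first and then piped to `sh` in an environment where QIIME 2 is installed.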

To compute a different beta diversity metric, change the `--p-metric` parameter to the value that corresponds to the metric you want to compute. The following list provides information on the available beta diversity metrics in QIIME 2.

**Bray-Curtis dissimilarity**: Calculates Bray-Curtis dissimilarity
- Fraction of overabundant counts
- `--p-metric`: `braycurtis`
- *Sørensen, T. (1948). "A method of establishing groups of equal amplitude in plant sociology based on similarity of species content." Kongelige Danske Videnskabernes Selskab 5.1-34: 4-7.*

**Canberra distance**: Calculates Canberra distance
- Overabundance on a feature by feature basis
- `--p-metric`: `canberra`
- *Lance, Godfrey L.N. and Williams, W.T. (1967). "A general theory of classificatory sorting strategies II. Clustering systems." The Computer Journal 10 (3): 271-277.*

**Chebyshev distance**: Calculates Chebyshev distance
- Maximum distance between two samples
- `--p-metric`: `chebyshev`
- *Cantrell, C.D. (2000). "Modern Mathematical Methods for Physicists and Engineers". Cambridge University Press.*

**City-block distance**: Calculates City-block distance
- Similar to the Euclidean distance but the effect of a large difference in a single dimension is reduced
- `--p-metric`: `cityblock`
- *Paul, E.B. (2006). "Manhattan distance". Dictionary of Algorithms and Data Structures.*

**Correlation coefficient**: Measures Correlation coefficient
- Measure of strength and direction of linear relationship between samples
- `--p-metric`: `correlation`
- *Galton, F. (1877). "Typical laws of heredity". Nature. 15 (388): 492-495.*

**Cosine Similarity**: Measures Cosine similarity
- Ratio of the amount of common species in a sample to the mean of the two samples
- `--p-metric`: `cosine`
- *Ochiai, A. (1957). "Zoogeographical Studies on the Soleoid Fishes Found in Japan and its Neighbouring Regions-II". Nippon Suisan Gakkaishi. 22(9): 526-530.*

**Dice measure**: Calculates Dice measure
- Statistic used for comparing the similarity of two samples
- Only counts true positives once
- `--p-metric`: `dice`
- *Dice, Lee R. (1945). "Measures of the Amount of Ecologic Association Between Species". Ecology. 26 (3): 297-302.*

**Euclidean distance**: Measures Euclidean distance
- Species-by-species distance matrix
- `--p-metric`: `euclidean`
- *Legendre, P. and Caceres, M. (2013). "Beta diversity as the variance of community data: dissimilarity coefficients and partitioning." Ecology Letters. 16(8): 951-963.*

**Generalized UniFrac**: Measures Generalized UniFrac
- Detects a wider range of biological changes compared to unweighted and weighted UniFrac
- `--p-metric`: `generalized_unifrac`
- *Chen, F., Bittinger, K., Charlson, E.S., Hoffmann, C., Lewis, J., Wu, G. D., Collman, R.G., Bushman, R.D., Li, H. (2012). "Associating microbiome composition with environmental covariates using generalized UniFrac distances." Bioinformatics. 28 (16): 2106-2113.*

**Hamming distance**: Measures Hamming distance
- Minimum number of substitutions required to change one group to the other
- `--p-metric`: `hamming`
- *Hamming, R.W. (1950). "Error Detecting and Error Correcting Codes". The Bell System Technical Journal. (29): 147-160.*

**Jaccard similarity index**: Calculates Jaccard similarity index
- Fraction of unique features, regardless of abundance
- `--p-metric`: `jaccard`
- *Jaccard, P. (1908). "Nouvelles recherches sur la distribution florale." Bull. Soc. Vaud. Sci. Nat. (44): 223-270.*

**Kulczynski dissimilarity index**: Measures Kulczynski dissimilarity index
- Describes the dissimilarity between two samples
- `--p-metric`: `kulsinski`
- *Kulczynski, S. (1927). "Die Pflanzenassoziationen der Pieninen". Bulletin International de l'Academie Polonaise des Sciences et des Lettres, Classe des Sciences Mathematiques et Naturelles. 57-203.*

**Mahalanobis distance**: Calculates Mahalanobis distance
- How many standard deviations one sample is away from the mean
- Unitless and scale-invariant
- Takes into account the correlations of the data set
- `--p-metric`: `mahalanobis`
- *Mahalanobis, P.C. (1936). "On the generalised distance in statistics". Proceedings of the National Institute of Sciences of India. 2 (1): 49-55.*

**Matching components**: Measures Matching components
- Compares indices under all possible situations
- `--p-metric`: `matching`
- *Janson, S., and Vegelius, J. (1981). "Measures of ecological association". Oecologia. (49): 371-376.*

**Rogers-Tanimoto distance**: Measures Rogers-Tanimoto distance
- Allows the possibility of two samples, which are quite different from each other, to both be similar to a third
- `--p-metric`: `rogerstanimoto`
- *Tanimoto, T. (1958). "An Elementary Mathematical theory of Classification and Prediction". New York: Internal IBM Technical Report.*

**Russell-Rao coefficient**: Calculates Russell-Rao coefficient
- Equal weight is given to matches and non-matches
- `--p-metric`: `russellrao`
- *Russell, P.F. and Rao, T.R. (1940). "On habitat and association of species of anopheline larvae in south-eastern Madras". J. Malaria Inst. India. (3): 153-178.*

**Sokal-Michener coefficient**: Measures Sokal-Michener coefficient
- Proportion of matches between samples
- `--p-metric`: `sokalmichener`
- *Sokal, R.R. and Michener, C.D. (1958). "A statistical method for evaluating systematic relationships". Univ. Kans. Sci. Bull. (38): 1409-1438.*

**Sokal-Sneath Index**: Calculates Sokal-Sneath index
- Measure of species turnover
- `--p-metric`: `sokalsneath`
- *Sokal, R.R. and Sneath, P.H.A. (1963). "Principles of Numerical Taxonomy". W. H. Freeman, San Francisco, California.*

**Species-by-species Euclidean**: Measures Species-by-species Euclidean
- Standardized Euclidean distance between two groups
- Each coordinate difference between observations is scaled by dividing by the corresponding element of the standard deviation
- `--p-metric`: `seuclidean`
- *Legendre, P. and Caceres, M. (2013). "Beta diversity as the variance of community data: dissimilarity coefficients and partitioning." Ecology Letters. 16(8): 951-963.*

**Squared Euclidean**: Measures squared Euclidean distance
- Places progressively greater weight on samples that are farther apart
- `--p-metric`: `sqeuclidean`
- *Legendre, P. and Caceres, M. (2013). "Beta diversity as the variance of community data: dissimilarity coefficients and partitioning." Ecology Letters. 16(8): 951-963.*

**Unweighted UniFrac**: Measures unweighted UniFrac
- Measures the fraction of unique branch length
- `--p-metric`: `unweighted_unifrac`
- *Lozupone, C. and Knight, R. (2005). "UniFrac: a new phylogenetic method for comparing microbial communities." Applied and Environmental Microbiology 71 (12): 8228-8235.*

**Weighted Minkowski metric**: Measures Weighted Minkowski metric
- Allows the use of the k-means-type paradigm to cluster large data sets
- `--p-metric`: `wminkowski`
- *Chan, Y., Ching, W.K., Ng, M.K., Huang, J.Z. (2004). "An optimization algorithm for clustering using weighted dissimilarity measures". Pattern Recognition. 37(5): 943-952.*

**Weighted normalized UniFrac**: Measures Weighted normalized UniFrac
- Takes into account abundance
- Normalization adjusts for varying root-to-tip distances
- `--p-metric`: `weighted_normalized_unifrac`
- *Lozupone, C. A., Hamady, M., Kelley, S. T., Knight, R. (2007). "Quantitative and qualitative beta diversity measures lead to different insights into factors that structure microbial communities". Applied and Environmental Microbiology. 73(5): 1576-85.*

**Weighted unnormalized UniFrac**: Measures Weighted unnormalized UniFrac
- Takes into account abundance
- Doesn't correct for unequal sampling effort or different evolutionary rates between taxa
- `--p-metric`: `weighted_unifrac`
- *Lozupone, C. A., Hamady, M., Kelley, S. T., Knight, R. (2007). "Quantitative and qualitative beta diversity measures lead to different insights into factors that structure microbial communities". Applied and Environmental Microbiology. 73(5): 1576-85.*

**Yule index**: Measures Yule index
- Measures biodiversity
- Determined by the diversity of species and the proportions between the abundance of those species
- `--p-metric`: `yule`
- *Fisher, R.A., Corbet, A.S., Williams, C.B. (1943). "The Relationship Between the Number of Species and the Number of Individuals in a Random Sample of an Animal Population". J. Animal Ecol. (12): 42-58.*
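
As a worked illustration of how a few of the distances above compare two samples, the sketch below computes Bray-Curtis, Jaccard, city-block, and Chebyshev values for a toy pair. The counts are invented and the function name `beta_toy` is hypothetical. Jaccard is treated here as a presence/absence distance (one minus shared features over features present in either sample), which is an assumption consistent with the "regardless of abundance" description above.

```shell
#!/usr/bin/env bash
# Two hypothetical samples measured over the same four features (toy data).
sample_a="5 0 3 0"
sample_b="3 2 3 0"

beta_toy() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    NR == 1 { for (i = 1; i <= NF; i++) a[i] = $i; n = NF }
    NR == 2 {
      for (i = 1; i <= n; i++) {
        d = a[i] - $i; if (d < 0) d = -d
        absdiff += d                       # city-block: sum of |a - b|
        total += a[i] + $i                 # Bray-Curtis denominator: sum of (a + b)
        if (d > maxdiff) maxdiff = d       # Chebyshev: largest |a - b|
        if (a[i] > 0 || $i > 0) either++   # feature present in at least one sample
        if (a[i] > 0 && $i > 0) both++     # feature present in both samples
      }
      printf "braycurtis=%.4f jaccard=%.4f cityblock=%d chebyshev=%d\n", absdiff / total, 1 - both / either, absdiff, maxdiff
    }'
}

beta_toy "$sample_a" "$sample_b"
# prints: braycurtis=0.2500 jaccard=0.3333 cityblock=4 chebyshev=2
```

The two samples differ by 4 counts out of 16 in total (Bray-Curtis 0.25) and share 2 of the 3 features present in either sample (Jaccard distance 1/3), which shows how abundance-based and presence/absence metrics can rank the same pair differently.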

To further analyze your alpha and beta diversity results, return to the QIIME 2 "Moving Pictures" tutorial and continue at the `alpha-group-significance` command.