About the Alpha and Beta Diversity Analysis Tutorial
This Alpha and Beta Diversity Community Tutorial (run using QIIME 2 version 2017.12) walks you through analyzing the alpha and beta diversity of a sample dataset. Below you will find a link to a small test dataset to download and use in this tutorial.
Files used in tutorial
The following files, derived from the Moving Pictures tutorial, are used in this document.
Alpha Diversity Analysis
The alpha and alpha-phylogenetic methods compute a user-specified alpha diversity metric for all samples in a feature table.
Phylogenetic alpha diversity metrics (in this case, Faith's Phylogenetic Diversity) can be run with the following command:
qiime diversity alpha-phylogenetic \
--i-table table.qza \
--i-phylogeny rooted-tree.qza \
--p-metric faith_pd \
--o-alpha-diversity faith_pd_vector.qza
Non-phylogenetic alpha diversity metrics (in this case, Observed OTUs) can be run with the following command:
qiime diversity alpha \
--i-table table.qza \
--p-metric observed_otus \
--o-alpha-diversity observed_otus_vector.qza
The --i-table input provides the feature table containing the samples for which the alpha diversity metric will be computed. The --i-phylogeny input provides the phylogenetic tree containing the tip identifiers that correspond to the feature identifiers in the table, and is only used for the alpha-phylogenetic command (i.e., when computing phylogenetic diversity metrics). The --p-metric parameter specifies the alpha diversity metric to be run. The --o-alpha-diversity output specifies the output file.
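To make the role of the tree concrete, here is a minimal Python sketch of the idea behind Faith's Phylogenetic Diversity: the summed length of all branches connecting a sample's observed features to the root. The toy tree, its branch lengths, and the function name are made up for illustration; this is not the QIIME 2 implementation, and its handling of the root may differ.

```python
# Toy rooted tree stored as child -> (parent, branch_length).
# All node names and lengths are hypothetical.
TREE = {
    "A": ("n1", 1.0),
    "B": ("n1", 2.0),
    "n1": ("root", 0.5),
    "C": ("root", 3.0),
}


def faith_pd(counts, tree=TREE, root="root"):
    """Sum branch lengths on the paths from observed tips to the root."""
    # Every feature ID in the table must match a tip ID in the tree.
    missing = [f for f in counts if f not in tree]
    if missing:
        raise KeyError(f"features missing from the tree: {missing}")

    seen = set()  # branches already counted, keyed by their child node
    total = 0.0
    for feature, count in counts.items():
        if count <= 0:
            continue  # unobserved features contribute nothing
        node = feature
        # Walk towards the root, stopping at the first shared branch.
        while node != root and node not in seen:
            parent, length = tree[node]
            seen.add(node)
            total += length
            node = parent
    return total


sample = {"A": 10, "B": 0, "C": 4}
print(faith_pd(sample))  # 1.0 + 0.5 + 3.0 = 4.5
```

A feature ID that has no matching tip raises an error here, which mirrors why the tree's tip identifiers have to correspond to the table's feature identifiers.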
To compute a different alpha diversity metric, change the value of the `--p-metric` parameter to the one that corresponds to the metric you want to compute. The following list provides information on the available alpha diversity metrics in QIIME 2.
- Abundance-based Coverage Estimator (ACE) metric: Calculates the ACE metric
- Estimates species richness using a correction factor
- --p-metric: ace
- Chao, A. and Lee, S.M. (1992). "Estimating the number of classes via sample coverage". Journal of the American Statistical Association. (87): 210-217.
- Berger-Parker Dominance Index: Calculates Berger-Parker dominance index
- Relative richness of the abundant species
- --p-metric: berger_parker_d
- Berger, W.H. and Parker, F.L. (1970). "Diversity of planktonic Foraminifera in deep sea sediments". Science. (168): 1345-1347.
- Brillouin's index: Calculates Brillouin's index
- Measures the diversity of the species present
- Use when random sampling of the community can't be guaranteed
- --p-metric: brillouin_d
- Pielou, E. C. (1975). Ecological Diversity. New York, Wiley InterScience.
- Chao1 confidence interval: Calculates the Chao1 confidence interval
- Confidence interval for the Chao1 richness estimator
- --p-metric: chao1_ci
- Colwell, R.K., Mao, C.X., Chang, J. (2004). "Interpolating, extrapolating, and comparing incidence-based species accumulation curves". Ecology. (85): 2717-2727.
- Chao1 index: Calculates Chao1 index
- Estimates diversity from abundance data
- Estimates the number of rare taxa missed due to undersampling
- --p-metric: chao1
- Chao, A. (1984). "Non-parametric estimation of the number of classes in a population".
- Dominance measure: Calculates dominance measure
- How equally the taxa are represented
- --p-metric: dominance
- Effective Number of Species (ENS)/Probability of intra- or interspecific encounter (PIE) metric: Calculates the ENS/PIE metric
- Shows how the absolute number of species, the relative abundances of species, and their intraspecific clustering affect differences in biodiversity among communities
- --p-metric: enspie
- Chase, J.M., and Knight, R. (2013). "Scale-dependent effect sizes of ecological drivers on biodiversity: why standardised sampling is not enough". Ecology Letters (16): 17-26.
- Esty's confidence interval: Calculates Esty's confidence interval
- Confidence interval for the number of singletons among the total individuals
- --p-metric: esty_ci
- Esty, W. W. (1983). "A normal limit law for a nonparametric estimator of the coverage of a random sample". Ann Statist. (11): 905-912.
- Faith's phylogenetic diversity: Calculates Faith's phylogenetic diversity
- Measure of biodiversity that incorporates phylogenetic differences between species
- Sum of the lengths of the branches
- --p-metric: faith_pd
- Faith, D.P. (1992). "Conservation evaluation and phylogenetic diversity". Biological Conservation. (61): 1-10.
- Fisher's index: Calculates Fisher's index
- Relationship between the number of species and the abundance of each species
- --p-metric: fisher_alpha
- Fisher, R.A., Corbet, A.S. and Williams, C.B. (1943). "The relation between the number of species and the number of individuals in a random sample of an animal population". Journal of Animal Ecology. (12): 42-58.
- Gini index: Calculates Gini index
- Measures species abundance
- Assumes that the sampling is accurate and that additional data would fall on linear gradients between the values of the given data
- --p-metric: gini_index
- Gini, C. (1912). "Variability and Mutability". C. Cuppini, Bologna. 156.
- Good's coverage of counts: Calculates Good's coverage of counts
- Estimates the proportion of the whole community that is represented in a sample
- --p-metric: goods_coverage
- Good, I.J. (1953). "The population frequencies of species and the estimation of population parameters". Biometrika. 40(3/4): 237-264.
- Heip's evenness measure: Calculates Heip's evenness measure
- Removes dependency on species number
- --p-metric: heip_e
- Heip, C. (1974). "A new index measuring evenness". J. Mar. Biol. Ass. UK. (54): 555-557.
- Kempton-Taylor Q index: Calculates Kempton-Taylor Q index
- Measures diversity based on the distribution of species abundances
- Builds an abundance curve from all species and uses its interquartile range (IQR) to measure diversity
- --p-metric: kempton_taylor_q
- Kempton, R.A. and Taylor, L.R. (1976). "Models and statistics for species diversity". Nature (262): 818-820.
- Lladser's confidence interval: Calculates Lladser's confidence interval
- Single confidence interval of the conditional uncovered probability
- --p-metric: lladser_ci
- Lladser, M.E., Gouet, R., Reeder, J. (2011). "Extrapolation of Urn Models via Poissonization: Accurate Measurements of the Microbial Unknown". PLoS.
- Lladser's point estimate: Calculates Lladser's point estimate
- Estimates how much of the environment contains unsampled taxa
- Best estimate on a complete sample
- --p-metric: lladser_pe
- Lladser, M.E., Gouet, R., Reeder, J. (2011). "Extrapolation of Urn Models via Poissonization: Accurate Measurements of the Microbial Unknown". PLoS.
- Margalef's richness index: Calculates Margalef's richness index
- Measures species richness in a given area or community
- --p-metric: margalef
- Magurran, A.E. (2004). "Measuring biological diversity". Blackwell. 76-77.
- McIntosh dominance index D: Calculates McIntosh dominance index D
- Affected by the variation in dominant taxa and less affected by the variation in less abundant or rare taxa
- --p-metric: mcintosh_d
- McIntosh, R.P. (1967). "An index of diversity and the relation of certain concepts to diversity". Ecology (48): 392-404.
- McIntosh evenness index E: Calculates McIntosh's evenness measure E
- How evenly abundant taxa are
- --p-metric: mcintosh_e
- Heip, C. (1974). "A new index measuring evenness". J. Mar. Biol. Ass. UK. (54): 555-557.
- Menhinick's richness index: Calculates Menhinick's richness index
- The ratio of the number of taxa to the square root of the sample size
- --p-metric: menhinick
- Magurran, A.E. (2004). "Measuring biological diversity". Blackwell. 76-77.
- Michaelis-Menten fit to rarefaction curve of observed OTUs: Calculates Michaelis-Menten fit to rarefaction curve of observed OTUs.
- Estimated richness of species pools
- --p-metric: michaelis_menten_fit
- Raaijmakers, J.G.W. (1987). "Statistical analysis of the Michaelis-Menten equation". Biometrics. (43): 793-803.
- Number of distinct features: Calculates number of distinct OTUs
- --p-metric: observed_otus
- DeSantis, T.Z., Hugenholtz, P., Larsen, N., Rojas, M., Brodie, E.L., Keller, K., Huber, T., Davis, D., Hu, P., Andersen, G.L. (2006). "Greengenes, a Chimera-Checked 16S rRNA Gene Database and Workbench Compatible with ARB". Applied and Environmental Microbiology (72): 5069-5072.
- Number of double occurrences: Calculates number of double occurrence OTUs (doubletons)
- OTUs that only occur twice
- --p-metric: doubles
- Number of observed features, including singles and doubles: Calculates number of observed OTUs, singles, and doubles.
- --p-metric: osd
- DeSantis, T.Z., Hugenholtz, P., Larsen, N., Rojas, M., Brodie, E.L., Keller, K., Huber, T., Davis, D., Hu, P., Andersen, G.L. (2006). "Greengenes, a Chimera-Checked 16S rRNA Gene Database and Workbench Compatible with ARB". Applied and Environmental Microbiology. 72 (7): 5069-5072.
- Singles: Calculates number of single occurrence OTUs (singletons)
- OTUs that appear only once in a given sample
- --p-metric: singles
- Pielou's evenness: Calculates Pielou's evenness
- Measure of relative evenness of species richness
- --p-metric: pielou_e
- Pielou, E. (1966). "The measurement of diversity in different types of biological collections". J. Theor. Biol. (13): 131-144.
- Robbins' estimator: Calculates Robbins' estimator
- Probability of unobserved outcomes
- --p-metric: robbins
- Robbins, H.E. (1968). "Estimating the Total Probability of the unobserved outcomes of an experiment". Ann Math. Statist. 39(1): 256-257.
- Shannon's index: Calculates Shannon's index
- Calculates richness and diversity using a natural logarithm
- Accounts for both abundance and evenness of the taxa present
- --p-metric: shannon
- Shannon, C.E. and Weaver, W. (1949). "The mathematical theory of communication". University of Illinois Press, Champaign, Illinois.
- Simpson evenness measure E: Calculates Simpson's evenness measure E
- Diversity measure that accounts for the number of organisms and the number of species
- --p-metric: simpson_e
- Simpson, E.H. (1949). "Measurement of Diversity". Nature. (163): 688.
- Simpson's index: Calculates Simpson's index
- Measures the relative abundance of the different species making up the sample richness
- --p-metric: simpson
- Simpson, E.H. (1949). "Measurement of diversity". Nature. (163): 688.
- Strong's dominance index (Dw): Calculates Strong's dominance index
- Measures species abundance unevenness
- --p-metric: strong
- Strong, W.L. (2002). "Assessing species abundance unevenness within and between plant communities". Community Ecology (3): 237-246.
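As a rough illustration of what several of the metrics above measure, the following Python sketch computes a few of them from one sample's count vector using their textbook formulas. The function names reuse the `--p-metric` identifiers for readability only; the logarithm base, the bias-corrected form of Chao1, and the edge-case handling are assumptions and may not match what QIIME 2 computes.

```python
import math


def observed_otus(counts):
    """Number of features with a non-zero count."""
    return sum(1 for c in counts if c > 0)


def shannon(counts, base=math.e):
    """Shannon entropy, -sum(p * log(p)); the base is a free choice."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p, base) for p in props)


def simpson(counts):
    """Simpson's index in the form 1 - sum(p^2)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def pielou_e(counts):
    """Shannon entropy divided by its maximum, log(number of observed features)."""
    s = observed_otus(counts)
    if s < 2:
        return float("nan")  # evenness is undefined for a single feature
    return shannon(counts) / math.log(s)


def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))."""
    singles = sum(1 for c in counts if c == 1)
    doubles = sum(1 for c in counts if c == 2)
    return observed_otus(counts) + singles * (singles - 1) / (2 * (doubles + 1))


def goods_coverage(counts):
    """Good's coverage: 1 - (singletons / total individuals)."""
    singles = sum(1 for c in counts if c == 1)
    return 1.0 - singles / sum(counts)


counts = [10, 5, 2, 2, 1, 1, 1, 0]  # made-up counts for one sample
print(observed_otus(counts))  # 7
print(chao1(counts))          # 7 + 3*2 / (2*3) = 8.0
print(round(shannon(counts), 3), round(simpson(counts), 3))
print(round(pielou_e(counts), 3), round(goods_coverage(counts), 3))
```

Richness estimators such as Chao1 depend only on the singleton and doubleton counts, whereas Shannon, Simpson, and Pielou depend on the whole abundance distribution, which is why the metrics can rank the same samples differently.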
Beta Diversity Analysis
The beta and beta-phylogenetic methods compute a user-specified beta diversity metric for all samples in a feature table.
Phylogenetic beta diversity metrics (in this case, Unweighted UniFrac) can be run with the following command:
qiime diversity beta-phylogenetic \
--i-table table.qza \
--i-phylogeny rooted-tree.qza \
--p-metric unweighted_unifrac \
--o-distance-matrix unweighted_unifrac_distance_matrix.qza
Non-phylogenetic beta diversity metrics (in this case, Bray-Curtis) can be run with the following command:
qiime diversity beta \
--i-table table.qza \
--p-metric braycurtis \
--o-distance-matrix braycurtis_distance_matrix.qza
The --i-table input provides the feature table containing the samples for which the beta diversity metric will be computed. The --i-phylogeny input provides the phylogenetic tree containing the tip identifiers that correspond to the feature identifiers in the table, and is only used for the beta-phylogenetic command (i.e., when computing phylogenetic diversity metrics). The --p-metric parameter specifies the beta diversity metric to be run. The --o-distance-matrix output specifies the output file.
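Unlike an alpha diversity vector, which holds one value per sample, the beta diversity output is a distance matrix with one value per pair of samples. The Python sketch below shows that structure on a made-up feature table, using city-block distance as the example metric. The table contents and helper names are hypothetical, and this is not how QIIME 2 computes or stores the matrix.

```python
from itertools import combinations

# Hypothetical feature table: sample ID -> {feature ID: count}.
TABLE = {
    "S1": {"f1": 4, "f2": 0, "f3": 6},
    "S2": {"f1": 1, "f2": 3, "f3": 6},
    "S3": {"f1": 0, "f2": 5, "f3": 0},
}


def cityblock(u, v):
    """Sum of absolute count differences over all features."""
    features = set(u) | set(v)
    return sum(abs(u.get(f, 0) - v.get(f, 0)) for f in features)


def distance_matrix(table, metric):
    """Apply `metric` to every pair of samples and fill a symmetric matrix."""
    ids = sorted(table)
    matrix = {a: {b: 0.0 for b in ids} for a in ids}  # zero diagonal
    for a, b in combinations(ids, 2):
        d = float(metric(table[a], table[b]))
        matrix[a][b] = d
        matrix[b][a] = d  # distances are symmetric
    return ids, matrix


ids, dm = distance_matrix(TABLE, cityblock)
for a in ids:
    print(a, [dm[a][b] for b in ids])
# S1 [0.0, 6.0, 15.0]
# S2 [6.0, 0.0, 9.0]
# S3 [15.0, 9.0, 0.0]
```

Swapping in a different metric function changes the values but not the shape of the result, which is the sense in which `--p-metric` selects the metric while the output remains a single distance matrix.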
To compute a different beta diversity metric, change the value of the `--p-metric` parameter to the one that corresponds to the metric you want to compute. The following list provides information on the available beta diversity metrics in QIIME 2.
- Bray-Curtis dissimilarity: Calculates Bray-Curtis dissimilarity
- Fraction of overabundant counts
- --p-metric: braycurtis
- Sørensen, T. (1948). "A method of establishing groups of equal amplitude in plant sociology based on similarity of species content". Kongelige Danske Videnskabernes Selskab 5.1-34: 4-7.
- Canberra distance: Calculates Canberra distance
- Overabundance on a feature by feature basis
- --p-metric: canberra
- Lance, G.N. and Williams, W.T. (1967). "A general theory of classificatory sorting strategies II. Clustering systems". The Computer Journal 10 (3): 271-277.
- Chebyshev distance: Calculates Chebyshev distance
- Maximum distance between two samples
- --p-metric: chebyshev
- Cantrell, C.D. (2000). "Modern Mathematical Methods for Physicists and Engineers". Cambridge University Press.
- City-block distance: Calculates City-block distance
- Similar to the Euclidean distance but the effect of a large difference in a single dimension is reduced
- --p-metric: cityblock
- Black, P.E. (2006). "Manhattan distance". Dictionary of Algorithms and Data Structures.
- Correlation coefficient: Measures Correlation coefficient
- Measure of strength and direction of linear relationship between samples
- --p-metric: correlation
- Galton, F. (1877). "Typical laws of heredity". Nature. 15 (388): 492-495.
- Cosine Similarity: Measures Cosine similarity
- Ratio of the amount of common species in a sample to the mean of the two samples
- --p-metric: cosine
- Ochiai, A. (1957). "Zoogeographical Studies on the Soleoid Fishes Found in Japan and its Neighbouring Regions-II". Nippon Suisan Gakkaishi. 22(9): 526-530.
- Dice measure: Calculates Dice measure
- Statistic used for comparing the similarity of two samples
- Only counts true positives once
- --p-metric: dice
- Dice, Lee R. (1945). "Measures of the Amount of Ecologic Association Between Species". Ecology. 26 (3): 297-302.
- Euclidean distance: Measures Euclidean distance
- Species-by-species distance matrix
- --p-metric: euclidean
- Legendre, P. and Caceres, M. (2013). "Beta diversity as the variance of community data: dissimilarity coefficients and partitioning". Ecology Letters. 16(8): 951-963.
- Generalized UniFrac: Measures Generalized UniFrac
- Detects a wider range of biological changes compared to unweighted and weighted UniFrac
- --p-metric: generalized_unifrac
- Chen, J., Bittinger, K., Charlson, E.S., Hoffmann, C., Lewis, J., Wu, G.D., Collman, R.G., Bushman, F.D., Li, H. (2012). "Associating microbiome composition with environmental covariates using generalized UniFrac distances". Bioinformatics. 28 (16): 2106-2113.
- Hamming distance: Measures Hamming distance
- Minimum number of substitutions required to change one group to the other
- --p-metric: hamming
- Hamming, R.W. (1950). "Error Detecting and Error Correcting Codes". The Bell System Technical Journal. (29): 147-160.
- Jaccard similarity index: Calculates Jaccard similarity index
- Fraction of unique features, regardless of abundance
- --p-metric: jaccard
- Jaccard, P. (1908). "Nouvelles recherches sur la distribution florale". Bull. Soc. Vaud. Sci. Nat. (44): 223-270.
- Kulczynski dissimilarity index: Measures Kulczynski dissimilarity index
- Describes the dissimilarity between two samples
- --p-metric: kulsinski
- Kulczynski, S. (1927). "Die Pflanzenassoziationen der Pieninen". Bulletin International de l'Academie Polonaise des Sciences et des Lettres, Classe des Sciences Mathematiques et Naturelles. 57-203.
- Mahalanobis distance: Calculates Mahalanobis distance
- How many standard deviations one sample is away from the mean
- Unitless and scale-invariant
- Takes into account the correlations of the data set
- --p-metric: mahalanobis
- Mahalanobis, P.C. (1936). "On the generalised distance in statistics". Proceedings of the National Institute of Sciences of India. 2 (1): 49-55.
- Matching components: Measures Matching components
- Compares indices under all possible situations
- --p-metric: matching
- Janson, S., and Vegelius, J. (1981). "Measures of ecological association". Oecologia. (49): 371-376.
- Rogers-Tanimoto distance: Measures Rogers-Tanimoto distance
- Allows the possibility of two samples, which are quite different from each other, to both be similar to a third
- --p-metric: rogerstanimoto
- Tanimoto, T. (1958). "An Elementary Mathematical theory of Classification and Prediction". New York: Internal IBM Technical Report.
- Russell-Rao coefficient: Calculates Russell-Rao coefficient
- Equal weight is given to matches and non-matches
- --p-metric: russellrao
- Russell, P.F. and Rao, T.R. (1940). "On habitat and association of species of anopheline larvae in south-eastern Madras". J. Malaria Inst. India. (3): 153-178.
- Sokal-Michener coefficient: Measures Sokal-Michener coefficient
- Proportion of matches between samples
- --p-metric: sokalmichener
- Sokal, R.R. and Michener, C.D. (1958). "A statistical method for evaluating systematic relationships". Univ. Kans. Sci. Bull. (38): 1409-1438.
- Sokal-Sneath Index: Calculates Sokal-Sneath index
- Measure of species turnover
- --p-metric: sokalsneath
- Sokal, R.R. and Sneath, P.H.A. (1963). "Principles of Numerical Taxonomy". W. H. Freeman, San Francisco, California.
- Species-by-species Euclidean: Measures Species-by-species Euclidean
- Standardized Euclidean distance between two groups
- Each coordinate difference between observations is scaled by dividing by the corresponding element of the standard deviation
- --p-metric: seuclidean
- Legendre, P. and Caceres, M. (2013). "Beta diversity as the variance of community data: dissimilarity coefficients and partitioning". Ecology Letters. 16(8): 951-963.
- Squared Euclidean: Measures squared Euclidean distance
- Place progressively greater weight on samples that are farther apart
- --p-metric: sqeuclidean
- Legendre, P. and Caceres, M. (2013). "Beta diversity as the variance of community data: dissimilarity coefficients and partitioning". Ecology Letters. 16(8): 951-963.
- Unweighted UniFrac: Measures unweighted UniFrac
- Measures the fraction of unique branch length
- --p-metric: unweighted_unifrac
- Lozupone, C. and Knight, R. (2005). "UniFrac: a new phylogenetic method for comparing microbial communities". Applied and Environmental Microbiology 71 (12): 8228-8235.
- Weighted Minkowski metric: Measures Weighted Minkowski metric
- Allows the use of the k-means-type paradigm to cluster large data sets
- --p-metric: wminkowski
- Chan, Y., Ching, W.K., Ng, M.K., Huang, J.Z. (2004). "An optimization algorithm for clustering using weighted dissimilarity measures". Pattern Recognition. 37(5): 943-952.
- Weighted normalized UniFrac: Measures Weighted normalized UniFrac
- Takes into account abundance
- Normalization adjusts for varying root-to-tip distances.
- --p-metric: weighted_normalized_unifrac
- Lozupone, C. A., Hamady, M., Kelley, S. T., Knight, R. (2007). "Quantitative and qualitative beta diversity measures lead to different insights into factors that structure microbial communities". Applied and Environmental Microbiology. 73(5): 1576-85.
- Weighted unnormalized UniFrac: Measures Weighted unnormalized UniFrac
- Takes into account abundance
- Doesn't correct for unequal sampling effort or different evolutionary rates between taxa
- --p-metric: weighted_unifrac
- Lozupone, C. A., Hamady, M., Kelley, S. T., Knight, R. (2007). "Quantitative and qualitative beta diversity measures lead to different insights into factors that structure microbial communities". Applied and Environmental Microbiology. 73(5): 1576-85.
- Yule index: Measures Yule index
- Measures biodiversity
- Determined by the diversity of species and the proportions between the abundance of those species.
- --p-metric: yule
- Fisher, R.A., Corbet, A.S., Williams, C.B. (1943). "The Relationship Between the Number of Species and the Number of Individuals in a Random Sample of an Animal Population". J. Animal Ecol. (12): 42-58.
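To show how abundance-based and presence/absence-based metrics from the list above can disagree, here is a small Python sketch that computes Bray-Curtis, Jaccard, and Euclidean distances for two made-up samples using their standard formulas. The function names mirror the `--p-metric` identifiers for readability only, and the sketch is not the QIIME 2 implementation.

```python
import math


def braycurtis(u, v):
    """Sum of absolute differences divided by the sum of all counts."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator


def jaccard(u, v):
    """Presence/absence distance: 1 - (shared features / features in either)."""
    present_u = {i for i, a in enumerate(u) if a > 0}
    present_v = {i for i, b in enumerate(v) if b > 0}
    union = present_u | present_v
    if not union:
        return 0.0  # two empty samples are treated as identical here
    return 1.0 - len(present_u & present_v) / len(union)


def euclidean(u, v):
    """Straight-line distance between the two count vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Two made-up samples over the same four features, in the same order.
s1 = [10, 0, 5, 5]
s2 = [1, 0, 5, 14]

print(braycurtis(s1, s2))          # 18 / 40 = 0.45
print(jaccard(s1, s2))             # same features present, so 0.0
print(round(euclidean(s1, s2), 3))
```

The two samples contain exactly the same features, so Jaccard reports no difference, while Bray-Curtis and Euclidean both respond to the shift in abundances. This is the practical reason to pick the metric that matches the question being asked.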
To further analyze the results of your alpha and beta diversity analyses, return to the QIIME 2 "Moving Pictures" tutorial and continue at the alpha-group-significance command.