How are diversity measures calculated in QIIME?

LVC · November 14, 2018, 3:25pm

Dear QIIME developers,

I have calculated alpha diversity measures (Shannon, Simpson, observed_species, chao1) in QIIME1 and was wondering how they are actually calculated. I will be moving to QIIME2 and assume the calculations are the same there. Although QIIME1 is no longer maintained, I would still like to understand how its calculations were made, so that I can interpret published literature that used QIIME1 to generate results.

Alpha diversity can be expressed as the effective number of species, where each species is considered to be distinct from all other species. At order q = 0 all species are weighted the same regardless of abundance. As q increases, more weight is placed on abundant species, and at q = infinity only the most abundant species is taken into account.
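For reference, the usual definition of the Hill number of order q can be written as below. The notation is an assumption for illustration and does not come from the QIIME documentation: p_i is the relative abundance of species i and S is the number of species observed.

```latex
{}^{q}D = \left( \sum_{i=1}^{S} p_i^{\,q} \right)^{1/(1-q)}, \quad q \neq 1
\qquad
{}^{1}D = \lim_{q \to 1} {}^{q}D = \exp\!\left( -\sum_{i=1}^{S} p_i \ln p_i \right)
\qquad
{}^{\infty}D = \frac{1}{\max_i p_i}
```

At q = 0 every p_i^0 equals 1, so the sum is simply S (richness).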

Alpha diversity, however, can also be influenced by the definition of a species. What does QIIME define as a species for inclusion? For example, some groups are unassigned at the class taxonomic level but are still considered to be different from everything else. How does QIIME deal with unassigned groups, and how does it include them?

Secondly, are more closely related species weighted together in QIIME? For example, consider 2 populations:

Population 1 contains a beetle, a horse, a shark and a duck. All are different, so the effective number of species at q = 0 (richness) is 4.
Population 2 contains a beetle, a horse, a donkey and a duck. However, a horse and a donkey are closely related and are not considered to be 2 fully distinct species here, so a weighting is placed on the horse and donkey, and for this population the value at q = 0 might be 3.5, for example.
Is QIIME weighting species in this way at all when calculating diversity measures?

I would also like to be able to calculate Hill numbers at all values of the order q. Is it possible to do this in QIIME2? Currently I convert the Shannon entropy values to Shannon diversity and so on, but I think that only gives me q = 0, q = 1, q = 2 and q = infinity.
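Outside QIIME, Hill numbers for any order q can be computed directly from one sample's counts. The following is a minimal sketch using only the Python standard library. The function name `hill_number` and the example counts are made up for illustration, and this is not a QIIME or scikit-bio function.

```python
import math

def hill_number(counts, q):
    """Effective number of species of order q for one sample's counts."""
    total = sum(counts)
    # Relative abundances of the features that were actually observed.
    p = [c / total for c in counts if c > 0]
    if math.isinf(q):
        # q = infinity: only the most abundant feature matters.
        return 1.0 / max(p)
    if math.isclose(q, 1.0):
        # Limit q -> 1: exponential of Shannon entropy (natural log).
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))

counts = [50, 30, 15, 5, 0]  # made-up counts for one sample
for q in (0, 0.5, 1, 2, math.inf):
    print(q, round(hill_number(counts, q), 3))
```

For a perfectly even sample every order gives the same value (the richness), which is a quick sanity check. A sample with zero total counts would raise a division error and needs to be handled separately.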

Finally, in QIIME1 the observed_otus and observed_species data are identical. However, when I look at the biom table (converted into a .tsv for visualisation) there appear to be several OTUs for each observed species. Is there a correction on the observed_otus calculation?

With many thanks in advance,
Lauren.

LVC · November 15, 2018, 1:11pm

As a continuation from my question above:

When I convert Shannon entropy values calculated in QIIME into Shannon diversity values they are higher than the observed_species values for that sample.
For example:
Shannon entropy = 7.241567
Shannon diversity = EXP (Shannon entropy)
EXP(7.241567) = 1396.28 > Observed Species = 678.5
But Shannon Diversity should always be < or = Species Richness (which = Observed Species here?).
Having read up on the documentation I therefore think Shannon entropy is being calculated differently here and that the conversion should instead be:
Shannon diversity = 2^(Shannon entropy)
= 2^(7.241567)
= 151.33 < Observed Species of 678.5 which is a reasonable answer.
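The arithmetic above can be checked with a few lines of Python. This sketch assumes, as the post does, that the reported entropy is in log base 2 (bits). The variable names are only for illustration.

```python
import math

h_bits = 7.241567  # Shannon entropy from the post, assumed to be in log base 2

as_if_natural = math.exp(h_bits)  # treats the value as nats: about 1396.28
as_base_two = 2 ** h_bits         # treats the value as bits: about 151.33

# Equivalent route: convert bits to nats first, then exponentiate.
h_nats = h_bits * math.log(2)
via_nats = math.exp(h_nats)

print(round(as_if_natural, 2), round(as_base_two, 2), round(via_nats, 2))
```

The exponentiation has to use the same base as the logarithm that produced the entropy. Only then is the result bounded above by the richness.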

Am I correct in my thinking here?

Documentation:
Shannon entropy: http://scikit-bio.org/docs/latest/generated/generated/skbio.diversity.alpha.shannon.html#skbio.diversity.alpha.shannon

Observed OTUs: http://scikit-bio.org/docs/latest/generated/generated/skbio.diversity.alpha.observed_otus.html#skbio.diversity.alpha.observed_otus

I am unable to find the page for Observed_OTUs but since my OTU count and Observed Species count are identical my guess is Observed Species is the calculation of the number of distinct species observed.

Nicholas_Bokulich · November 15, 2018, 2:26pm

Hi @LVC,

I cannot really comment on QIIME 1 — I was not involved in QIIME 1 development, and you may want to post to the QIIME 1 forum to get QIIME 1-specific answers.

That said, I expect QIIME 1 probably used the same metric calculations that QIIME 2 does.

The "species" definition depends entirely on the input. We call this metric "observed OTUs" in QIIME 2 to be somewhat clearer about this. Unless alpha diversity is explicitly calculated on species (e.g., a feature table that has been annotated and collapsed with species-level taxonomy), the "species" are the unique sequences observed (i.e., OTUs or ASVs, however unique seqs are defined).

Relatedness between species is not considered explicitly in richness (i.e., "observed OTUs") calculations. However, the "Faith's PD" metric measures phylogenetic richness (as the branch length covered by a given sample).
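To illustrate the difference using the beetle/horse/donkey example from the question, here is a minimal sketch of the idea behind Faith's PD. The tree, its branch lengths, and the function name `faith_pd` are all hypothetical, and the sketch assumes the common convention of counting branches down to the root. QIIME 2 itself works from a rooted phylogenetic tree of the observed sequences.

```python
# Hypothetical toy tree: child -> (parent, branch length). Lengths are invented.
TREE = {
    "beetle": ("root", 5.0),
    "vertebrates": ("root", 1.0),
    "shark": ("vertebrates", 4.0),
    "tetrapods": ("vertebrates", 1.0),
    "duck": ("tetrapods", 3.0),
    "equids": ("tetrapods", 2.5),
    "horse": ("equids", 0.5),
    "donkey": ("equids", 0.5),
}

def faith_pd(observed, tree=TREE):
    """Sum the lengths of all branches on the paths from observed tips to the root."""
    covered = set()
    for tip in observed:
        node = tip
        while node in tree:    # stops at the root, which has no parent entry
            covered.add(node)  # each branch is identified by its child node
            node = tree[node][0]
    return sum(tree[node][1] for node in covered)

pop1 = ["beetle", "horse", "shark", "duck"]
pop2 = ["beetle", "horse", "donkey", "duck"]
print(len(pop1), faith_pd(pop1))  # richness 4, PD 17.0
print(len(pop2), faith_pd(pop2))  # richness 4, PD 13.5
```

Both populations have the same richness, but the second covers less branch length because the horse and donkey share most of their path to the root.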

We do not have a method for calculating Hill numbers in QIIME 2 (this would be a great addition if you would like to contribute!). However, reading this paper:

Parameter a determines special cases of Hill number, for example, N0 as number of taxa, N1 as exponential Shannon index, and N2 as reciprocal Simpson index

So the Shannon and Simpson indices can be calculated separately in QIIME 2 and then converted to N1 and N2.
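One caveat for the Simpson conversion: the "reciprocal Simpson index" is the reciprocal of the sum of squared relative abundances. The short sketch below assumes the reported Simpson value is in the 1 - sum(p_i^2) form, which is worth checking against the documentation of whichever tool produced it. The function name is hypothetical.

```python
def hill_order_2_from_simpson(simpson_index):
    """Convert Simpson's index given as 1 - sum(p_i^2) into the order-2 Hill number."""
    return 1.0 / (1.0 - simpson_index)

# Made-up check: four equally abundant features give 1 - 4 * 0.25**2 = 0.75.
print(hill_order_2_from_simpson(0.75))  # 4.0
```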

That sounds correct: observed_otus and observed_species would be identical. OTUs are just unique sequences and do not necessarily correspond to distinct species. They can be different strains of a species, 16S variants from within the same cell (multi-copy heterogeneity), or sequence errors. Alpha diversity gets pretty messy when looking at multi-copy microbial marker genes.

Yes, you are correct. Shannon is often calculated with log base 10 or log base 2 (or base e), so the conversion to Shannon diversity must use the same base as the entropy calculation. See this thread for a little more discussion.

I hope that helps!

LVC · November 15, 2018, 4:34pm

Thanks @Nicholas_Bokulich for your help!

If I work out how to do the Hill numbers I will get back to you.
