Alpha diversity metrics for unequal sample size groups

When I submitted a manuscript recently, a reviewer criticised my use of Chao1 and observed OTUs because my sample groups were of uneven size, and recommended that I reanalyse with Shannon, which I did. This was fungal ITS data, so I was using non-phylogenetic measures.

More recently, somebody with very different group sizes used Shannon and Chao1, but only showed Chao1 because it was significant and Shannon wasn’t. I suggested this may be related to sample size and that they should address it, but was told that they couldn’t see a problem with Shannon or Chao1, as they are simply metrics of diversity that can be applied to any population at any phylogenetic level.

Is Chao1 more liable to bias due to sample size differences, and is Shannon better? If so, could somebody please point me in the direction of relevant literature?



Hi @Cat ,

This is a great question and I’m sure people will have some different opinions, but here’s a quick version of mine.

First and most importantly, these are different measures of alpha diversity, so there is no reason to assume that one should be significant just because the other is. What you choose should be based on your biological question.
Chao1 is a species richness estimator which, unlike observed species richness, tries to estimate unobserved taxa by assuming a Poisson distribution of the data. The estimates are quite sensitive to how your data have been processed (including rarefying depth) and how you’ve filtered rare taxa (think singletons/doubletons). I never quite understood why it is used with typical microbiome data, because typical workflows include rarefying and removing rare species: you essentially first remove those rare taxa, and then with Chao1 you try to re-estimate them back in?
There are also other assumptions, such as all species having an equal chance of being observed, which usually doesn’t hold in microbiome surveys.
Shannon diversity, on the other hand, is an index that takes into account the abundance of the species, weighting in rare species, but doesn’t try to estimate unobserved diversity. It is a lot more stable than Chao1 when it comes to rarefying depth, and you can usually see a plateau in rarefaction curves with very few sequences. This makes it, in my opinion, more resistant to processing bias and thus a bit more reliable.
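
To make the "remove rare taxa, then re-estimate them" point concrete, here is a minimal Python sketch. The counts are made up for illustration, the function names are my own, and I am assuming the bias-corrected form of Chao1 (observed richness plus F1(F1-1) / (2(F2+1)), where F1 and F2 are the numbers of singletons and doubletons); your pipeline may use a different variant.

```python
import math


def observed_richness(counts):
    """Number of taxa with at least one read."""
    return sum(1 for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1 (assumed variant): S_obs + F1*(F1-1) / (2*(F2+1))."""
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    return observed_richness(counts) + f1 * (f1 - 1) / (2 * (f2 + 1))


def shannon(counts):
    """Shannon index in nats (natural log)."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical read counts for one sample (not real data).
counts = [500, 300, 120, 40, 20, 8, 2, 2, 1, 1, 1, 1, 1, 1]
# The same sample after a "drop singletons" filtering step.
filtered = [c for c in counts if c > 1]

for label, sample in (("unfiltered", counts), ("singletons removed", filtered)):
    print(label, observed_richness(sample), chao1(sample), round(shannon(sample), 3))
```

With these toy counts, dropping singletons takes Chao1 from 19 down to the observed richness of 8, because there is nothing left for the estimator to work with, while Shannon barely moves.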

TL;DR: they answer different questions. Shannon is rather robust, while Chao1 (in my opinion) doesn’t make a whole lot of sense with microbiome pipelines.

If you want to estimate unobserved richness, I would recommend using Amy Willis’ breakaway package instead of Chao1; there is a q2 plugin for it. Her group has also done some pretty awesome work on the topic, and I recommend reading some of it, for example here, and here.

Hope this helps.


Hi @Cat ,
I think I read your question too fast and answered as if you were asking about different sampling depths, when you clearly asked about uneven sample sizes across groups (thanks @jwdebelius for the heads up!).
So, just to say: uneven group sizes won’t have an effect on the observed metric itself, but they may affect the reliability of the statistical test you apply to it. I find that within-group variance is usually higher with Chao1 (and richness) than with Shannon, so if you already have a small sample size, that can really reduce your power to detect a real biological difference. But as with any statistical test, you should check that your model’s assumptions hold rather than rely on generalizations like mine. I prefer checking mine visually with diagnostic plots.
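
As a rough illustration of how group size and variance interact in the test rather than in the metric, here is a small Python sketch. It uses Welch's unequal-variance t-test purely as an example (it is my choice for illustration, not something prescribed above), and the richness values are invented.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical richness values: a large, fairly tight group and a small, noisy one.
large_group = [52, 61, 47, 58, 66, 49, 55, 63, 50, 59, 54, 60]
small_group = [70, 41, 88]

t, df = welch_t(large_group, small_group)
print(f"t = {t:.2f}, effective df = {df:.1f}")
```

The effective degrees of freedom end up close to what the small, high-variance group alone provides, which is one way to see why a noisy metric in a small group costs you power even though the metric values themselves are unaffected by group size.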


Hi @Cat,

I agree with @Mehrbod_Estaki that your ability to handle uneven sample sizes depends less on your metric and more on your statistical test (although I often find my test and my metric are related, since some metrics are more normal than others). My rule of thumb tends to be about 10% subgroup. It’s not a great rule of thumb, but it’s what’s worked for me in the past.

…But, if we’re talking about metrics and depth, I like richness metrics. (My preference is observed features, because I tend to work with methods that don’t leave singletons in my data or don’t have meaningful singletons.) I’ve worked on multiple studies where the issue is richness and the stochastic loss of organisms is the signal. Shannon is definitely more robust to depth issues because it down-weights those lower-abundance organisms, and it tends to saturate pretty quickly. But… because it down-weights those organisms, it can make for a smaller effect size. (I published a recent paper where Shannon had half the explanatory power of observed features.)

My solution recently has been to use multivariate regression on my richness metrics and adjust for sequencing depth or log sequencing depth. They tend to be normal or close enough to fudge it. (Thanks central limit theorem!) It’s not always perfect, particularly if I have outliers, but it can help decrease some of those depth-related effects without actually requiring me to figure out how to propagate error across multivariate tests. (I took calc and did propagation of error analysis in college, but it’s also not something I want to do again every week.)
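
A minimal sketch of that idea, regressing richness on group while adjusting for log sequencing depth, might look like the following in plain Python. The data are simulated without noise so the true group effect (15) is known, the `ols` helper is a hypothetical stand-in written for this example, and in practice you would more likely reach for a statistics package than hand-rolled normal equations.

```python
import math
from statistics import mean


def ols(X, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Simulated, noise-free toy data: group 1 is smaller AND sequenced deeper.
group = [0, 0, 0, 0, 0, 1, 1, 1]
depth = [2000, 4000, 5000, 8000, 10000, 20000, 40000, 50000]
richness = [20 + 15 * g + 10 * math.log10(d) for g, d in zip(group, depth)]

# Design matrix: intercept, group indicator, log10 sequencing depth.
X = [[1.0, g, math.log10(d)] for g, d in zip(group, depth)]
intercept, group_effect, depth_effect = ols(X, richness)

# Unadjusted comparison: difference in mean richness between groups.
naive = (mean(r for r, g in zip(richness, group) if g == 1)
         - mean(r for r, g in zip(richness, group) if g == 0))

print(f"naive difference: {naive:.1f}, depth-adjusted group effect: {group_effect:.1f}")
```

In this toy setup the unadjusted difference is inflated because the smaller group happens to be sequenced more deeply, and including log depth as a covariate recovers the simulated group effect.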

As far as literature on sequencing depth and what the different metrics emphasize, you might look for Hill numbers. I don’t have a paper off the top of my head, but that framework more formally addresses what Bod and I are trying to say while waving our hands.
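
For anyone who hasn’t met Hill numbers before, here is a short Python sketch of the standard definition; the function name and the example counts are my own. The order q controls how much weight abundant taxa get: q = 0 is plain richness, q = 1 is the exponential of Shannon, and q = 2 is the inverse Simpson index.

```python
import math


def hill_number(counts, q):
    """Hill number (effective number of taxa) of order q."""
    total = sum(counts)
    p = [c / total for c in counts if c > 0]
    if q == 1:
        # The general formula is undefined at q = 1; its limit is exp(Shannon).
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1 / (1 - q))


# Hypothetical counts: a few dominant taxa plus a tail of rare ones.
uneven = [500, 300, 120, 40, 20, 8, 2, 2, 1, 1, 1, 1, 1, 1]

for q in (0, 1, 2):
    print(f"q = {q}: {hill_number(uneven, q):.2f}")
```

As q increases, the rare tail counts for less and less, which is the formal version of "richness is sensitive to rare taxa and depth, Shannon much less so".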



@jwdebelius @Mehrbod_Estaki - Thank you both for your insights. I have lots of information to process for my own future bioinformatics pipelines!

So can I assume that different-sized sample groups on their own are not a reason to discard Chao1 as an alpha diversity metric, but that there are plenty of other reasons why it may not be the best choice of metric?