Sample size calculation for microbiome studies

Hello everyone,

I am trying to calculate the sample size for a microbiome study (in humans), and I was wondering what tools you have used.
I’ve found R packages like HMP or micropower for sample size and power calculation.

Any help or suggestion will be greatly appreciated :slight_smile:


Hi @Miriam_Aguilar_Lopez,

As far as I know, those are the published models/packages. Neither is ideal: micropower works best if you assume you'll use its distance metric (and you probably won't), and HMP assumes a Dirichlet-multinomial model, which has fallen out of favor in analysis. IMO, micropower's metric assumption tends to make it underestimate power.
But, if you need a number because you need a number, those are the way to go. If, on the other hand, you can get out of being told you must provide a number, and instead frame it as an exploratory study or something similar, then you may be better off there.

There are two additional aspects to consider here. First, humans as free-living organisms likely do have inherently sparse microbiomes: you and I don't share all our taxa, and that's okay! Second, based on that, assume you're dealing with something more complicated than a standard GWAS study, and then consider the study size required to identify reproducible loci in GWAS. It's not impossible, but it's also much larger than a standard microbiome study. (Meaning your feature-based analysis will likely be underpowered, and that's okay. But also that meta-analysis can be painful, and it's not always obvious what's happening.)

So, I think your best bet will be to focus on beta diversity, be realistic about what your plausible difference in effect sizes might be, and compare against other fairly large disease studies, maybe looking at something like the Flemish Gut, American Gut, or the recent Chinese paper comparing regional and disease effects.

I would assume that you're going to need at least 50-100 samples per group, even for a pilot, plus allow yourself a 5-10% dropout and amplification-failure rate. That isn't actually power-calculation advice, but after about five years of human gut case-control studies, it's my rule of thumb. I have rarely regretted analyzing a dataset with at least 50 samples per group (at least not for statistical reasons), but I regularly curse myself, my collaborators, the world, Monte Carlo simulations, and whatever supernatural power happens to have caught my fancy and be passing through when I work with fewer.
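The dropout adjustment above is simple arithmetic: inflate the target per-group size so that, after the expected loss, you still hit the analyzable number. A minimal sketch (the function name is made up, and the numbers are just the rule-of-thumb figures from above):

```python
import math

def enrollment_target(n_per_group: int, dropout_rate: float) -> int:
    """Samples to enroll per group so that, after the expected dropout
    and amplification-failure rate, about n_per_group remain."""
    return math.ceil(n_per_group / (1.0 - dropout_rate))

# Rule of thumb: 50-100 analyzable samples per group, 5-10% loss.
print(enrollment_target(50, 0.05))   # lower bound -> 53 per group
print(enrollment_target(100, 0.10))  # upper bound -> 112 per group
```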



Hello @jwdebelius,

This is a great explanation :nerd_face:!
I will definitely consider all this for the study proposal.

Thank you for replying!


Some of the ways I justified sample size have been:

  1. Changes to the lower airway microbiome after microbiota challenges, and the relative-abundance changes predicted by Day #x. For example, I predict my WT group will have x% Strep, and I anticipate that with my challenge it will have 15% vs. 0.1% in n=5 mice.
  2. It could also be immune changes: we run flow cytometry on lung homogenate from mice, so we can estimate markers of inflammation through changes in CD4+ and CD8+ activation due to “dysbiosis”.

Yeah, I mean, if you can justify it with something else, that's usually more concrete. It's just that you don't always get what you want. I think the first is complicated because it makes a lot of assumptions. AFAIK, the ANCOM developers haven't figured out a power calculation here either. But pretty much every power-calculation statement I've seen and trusted has been one of three things: "this was a pilot and then we went back and tried subsampling" (essentially a post hoc power calculation, but not the "we didn't get the expected significance" kind of post hoc calculation), a statement that the authors don't know what the heck they're doing and are trying their best, or a calculation justified some other way.
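The "pilot and then subsampling" idea can be sketched as a Monte Carlo: repeatedly draw smaller subsets from the pilot data and count how often your test of choice rejects. A minimal, dependency-free sketch, assuming a per-sample summary statistic (e.g. a diversity value) and a plain t-statistic with a normal cutoff; the pilot data, function names, and all numbers here are illustrative assumptions, not any package's actual method:

```python
import random
from math import sqrt
from statistics import mean, stdev

def subsample_power(group_a, group_b, n, n_iter=500, seed=42):
    """Estimate power at per-group size n by repeatedly subsampling
    a pilot dataset and counting how often a Welch-style t-statistic
    exceeds the normal cutoff for alpha ~= 0.05 (two-sided)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_iter):
        a = rng.sample(group_a, n)
        b = rng.sample(group_b, n)
        se = sqrt(stdev(a) ** 2 / n + stdev(b) ** 2 / n)
        if abs(mean(a) - mean(b)) / se > 1.96:  # normal approximation
            hits += 1
    return hits / n_iter

# Fake "pilot" of 80 samples per group with a moderate group shift.
rng = random.Random(0)
pilot_a = [rng.gauss(0.0, 1.0) for _ in range(80)]
pilot_b = [rng.gauss(0.8, 1.0) for _ in range(80)]

for n in (10, 25, 50):
    print(n, subsample_power(pilot_a, pilot_b, n))
```

The estimate should climb toward 1 as n grows; in practice you'd swap in whatever test you actually plan to run (and note that heavy overlap between subsamples makes the iterations correlated, so this is a rough curve, not an exact one).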