Hi @ojholland,

Welcome!

But, also, whoa boy, this is a loaded question because it sounds like a power question, and... power is a complex problem, particularly in microbiome research. (Also one I happen to really like.) As such, I've moved it over into a general discussion topic rather than user support, because it's a lot more theoretical.

This sounds a lot like something you might want to tackle with beta diversity. It's not a measure of purely unique ASVs/OTUs/whatever, but it is a measure of shared or unshared features. Jaccard would address this question nicely: the Jaccard index is the number of shared OTUs over the total number of OTUs seen in either sample, and the Jaccard distance is one minus that.
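To make the Jaccard idea concrete, here's a minimal sketch in plain Python. The function name and the ASV labels are made up for illustration; in practice you'd let QIIME 2 compute this from your feature table rather than rolling your own.

```python
def jaccard(sample_a, sample_b):
    """Return (index, distance) for two collections of observed feature IDs.

    Presence/absence only: abundances are ignored.
    """
    a, b = set(sample_a), set(sample_b)
    union = a | b
    if not union:
        raise ValueError("both samples are empty, so Jaccard is undefined")
    index = len(a & b) / len(union)  # shared features / all features seen
    return index, 1.0 - index


# Hypothetical feature IDs: 2 shared features out of 5 total.
gut = {"ASV1", "ASV2", "ASV3", "ASV4"}
skin = {"ASV3", "ASV4", "ASV5"}
index, distance = jaccard(gut, skin)
print(f"index={index:.2f}, distance={distance:.2f}")
```

Note that two samples with identical feature sets get a distance of 0 no matter how different their abundances are, which is why this fits a shared/unshared question.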

You've got a potential problem here: setting up a dataset to measure this is hard because microbiome data is inherently sparse, particularly in free-living organisms. Additionally, what you observe is affected by sequencing depth and the number of samples. If I've got a feature that shows up in 10% of my samples at a relative abundance of 1/5000 sequences and I sequence 10 samples to 1000 sequences/sample, I may not see the feature... or I might only see it in 1 sample. If I've got 100 samples that I've sequenced to 10000 sequences/sample, I may actually see the feature in, like, 5 or 10 samples. For this to work, your experimental parameters have to be pretty fixed.
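Here's a rough back-of-the-envelope version of that arithmetic. It's a sketch under some strong assumptions of mine: each read is an independent draw, the feature sits at exactly the same relative abundance in every sample that carries it, and there's no sequencing error or filtering. The function names are just for illustration.

```python
def p_detect(rel_abundance, depth):
    """Chance of seeing at least one read of a feature in a sample that has it."""
    return 1.0 - (1.0 - rel_abundance) ** depth


def expected_samples_with_feature(n_samples, prevalence, rel_abundance, depth):
    """Expected number of samples in which the feature is actually observed."""
    return n_samples * prevalence * p_detect(rel_abundance, depth)


# Scenario 1: 10 samples, 1000 sequences/sample
small = expected_samples_with_feature(10, 0.10, 1 / 5000, 1000)
# Scenario 2: 100 samples, 10000 sequences/sample
large = expected_samples_with_feature(100, 0.10, 1 / 5000, 10000)

print(f"small study: ~{small:.2f} samples with the feature observed")
print(f"large study: ~{large:.2f} samples with the feature observed")
```

Under those assumptions the small design expects well under one sample with the feature, and the large design expects somewhere under ten, which is the same ballpark as the numbers above.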

From a power calculation standpoint, Kelly et al. developed a method to address statistical power with beta diversity. Your actual mileage may vary (it's only really implemented for one metric), and IMO, it tends to underestimate power for real experiments, but if you need a power calculation, that's my recommendation.

Worth noting that this is implemented in R, not QIIME 2, but that qiime2-R is a brilliant package which we are all lucky to have and will get your data over nicely.

If you just want to capture alpha diversity, you can (mostly) model it with a standard power calculation, with a non-parametric penalty. In my experience, it's asymptotically normal for unweighted metrics, but best to penalize anyway.
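As a minimal sketch of what I mean, here's a two-group sample size calculation using the normal approximation, followed by a penalty. The 15% inflation is a common rule of thumb for rank-based tests and is my assumption here, not a fixed rule; the effect size (Cohen's d) is something you'd have to estimate from pilot data or a comparable study.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80, penalty=0.0):
    """Samples per group for a two-sided, two-group comparison of means.

    Normal approximation: n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2,
    then inflated by `penalty` (e.g. 0.15 adds 15%) for a non-parametric test.
    """
    z = NormalDist().inv_cdf
    n = 2.0 * (z(1.0 - alpha / 2.0) + z(power)) ** 2 / effect_size ** 2
    return ceil(n * (1.0 + penalty))


# Assumed medium effect (d = 0.5) on an alpha diversity metric.
print(n_per_group(0.5))                # parametric approximation
print(n_per_group(0.5, penalty=0.15))  # with the assumed 15% penalty
```

The normal approximation runs a sample or so lower than an exact t-based calculation at small n, so treat the result as a floor rather than a target.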

In terms of feature-based power... there's not a great formal power calculation, so just assume that you will commit type II error. Current thought is that OTU counts follow a power law distribution, and again, what you see will be a function of depth, sample size, and technique. ANCOM/Phylofactor/PhILR/Gneiss don't really have power calculations because the partitions are non-independent, among a whole bunch of other problems. But, also, just base your assumption on GWAS studies, which require hundreds of thousands of samples to detect SNPs in an at least somewhat common genetic background (although my resident geneticist would be laughing at this oversimplification).

Best,

Justine