This is a complex question, and I don't think there's a clear answer. I can give you a lower limit of 5 per group for the Kruskal-Wallis-based test, because Kruskal-Wallis starts to break down below that sample size, but I can also tell you that even at that size, you're unlikely to see a W-value over zero.
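To make that floor concrete, here's a quick check with scipy on made-up, perfectly separated groups: below roughly 5 per group, even the best possible rank separation can't reach p < 0.05 with Kruskal-Wallis.

```python
from scipy.stats import kruskal

# Two perfectly separated groups, n = 2 per group: even complete rank
# separation cannot produce p < 0.05 with so few samples.
h2, p2 = kruskal([1, 2], [3, 4])
print(f"n=2 per group: H={h2:.2f}, p={p2:.3f}")   # p ≈ 0.12

# The same perfect separation with n = 5 per group clears the bar.
h5, p5 = kruskal([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
print(f"n=5 per group: H={h5:.2f}, p={p5:.3f}")   # p ≈ 0.009
```

And that's the *best case*; real features with overlap need more samples still.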
In general, the sample size you need depends on two major factors: the effect size for each individual feature and the number of features you're testing.
I'm not sure there's a good way to measure the relative effect size of different features. Essentially, though, your effect size has to be big enough to produce a p-value low enough that it remains significant after FDR correction. If your sample size is too small for your effect size, you may actually see W=0 for many or all of your features, because nothing reaches the significance threshold after correction.
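To see how the correction eats into per-feature significance, here's a toy sketch with hypothetical p-values, using statsmodels' Benjamini-Hochberg correction as a stand-in for whatever FDR procedure your tool applies:

```python
from statsmodels.stats.multitest import multipletests

# Hypothetical raw p-values: one real signal (p = 0.001) buried among
# 100 null features.
raw_p = [0.001] + [0.2, 0.4, 0.6, 0.8] * 25   # 101 tests total

# Tested alongside 100 other features, p = 0.001 no longer survives FDR.
reject, p_adj, _, _ = multipletests(raw_p, alpha=0.05, method="fdr_bh")
print("survives among 101 features:", reject[0], f"(q = {p_adj[0]:.3f})")

# The identical p-value tested among only 10 features pays a smaller penalty.
reject10, p_adj10, _, _ = multipletests(raw_p[:10], alpha=0.05, method="fdr_bh")
print("survives among 10 features:", reject10[0], f"(q = {p_adj10[0]:.3f})")
```

Same data for that feature, same effect size; the only thing that changed is how many other tests came along for the ride.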
That correction is where the second factor comes in. The more features you have, the more tests you conduct, and the more extreme a difference has to be to be detectable over the noise. If you filter your data to remove low-abundance/low-prevalence features, you increase your power for the remaining features. I tend to be in the camp that anything present in only one person is noise; I can't do statistical tests on it. Where to filter beyond that is debatable; I tend to use an empirical threshold based on what has worked for the type of tests I want to run. In general, though, the rule here is that with fewer features, you can detect smaller effect sizes in the features you do test. Even so, you're still running an underpowered analysis, potentially missing interesting/important things because they're low abundance and/or low prevalence.
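As a sketch of what that kind of filter looks like (assuming a pandas feature table with samples as rows and features as columns; the thresholds here are made up and are exactly the part you'd tune empirically):

```python
import pandas as pd

# Hypothetical count table: samples as rows, features as columns.
table = pd.DataFrame(
    {"feat_A": [10, 12, 8, 15, 9],   # present in every sample
     "feat_B": [0, 3, 0, 4, 2],      # present in 3/5 samples
     "feat_C": [0, 0, 0, 1, 0]},     # present in one person -> noise
    index=[f"sample_{i}" for i in range(5)],
)

min_prevalence = 2   # feature must appear in at least 2 samples
min_total = 5        # and carry at least 5 total counts

prevalence = (table > 0).sum(axis=0)
abundance = table.sum(axis=0)
keep = (prevalence >= min_prevalence) & (abundance >= min_total)
filtered = table.loc[:, keep]
print(list(filtered.columns))   # feat_C is dropped: fewer tests, more power
```

Dropping `feat_C` before testing means one fewer test in the FDR correction, at the cost of never being able to say anything about it.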
One semi-controversial way to check whether you have enough of a global effect for testing is to look at beta diversity. If you see a significant effect in beta diversity, particularly with a compositional metric like Aitchison distance, you're more likely to see an effect at a smaller sample size. This isn't foolproof (it's still prone to type II error), but my experience has been that it's a good indicator of whether it's possible to get a signal out of ANCOM. (Incidentally, I have lots of cases where I can find a signal in beta diversity with a smallish sample size, but my data set is too small to pick out individual features.) I also find ANCOM conservative, so you may want to look at other differential abundance techniques, like ALDEx2, which may be better at smaller sample sizes.
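For the beta-diversity check, here's a minimal sketch on simulated counts. The Aitchison distance is just Euclidean distance between CLR-transformed samples; the significance test below is a simple hand-rolled permutation test (a stand-in for PERMANOVA, which is what you'd normally run), and the table, pseudocount, and group sizes are all invented for illustration:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(42)

# Hypothetical count table: 12 samples x 20 features; the second group of
# six samples has elevated counts in its first five features (a global shift).
counts = rng.poisson(20, size=(12, 20)).astype(float)
counts[6:, :5] *= 3

# Aitchison distance = Euclidean distance between CLR-transformed samples.
logp = np.log(counts + 0.5)                      # 0.5 pseudocount for zeros
clr = logp - logp.mean(axis=1, keepdims=True)
dist = squareform(pdist(clr, metric="euclidean"))

groups = np.array([0] * 6 + [1] * 6)

def between_minus_within(d, g):
    """Mean between-group distance minus mean within-group distance."""
    same = g[:, None] == g[None, :]
    tri = np.triu(np.ones_like(d, dtype=bool), k=1)
    return d[tri & ~same].mean() - d[tri & same].mean()

observed = between_minus_within(dist, groups)

# Permutation test: shuffle the group labels and recompute the statistic.
perms = np.array([between_minus_within(dist, rng.permutation(groups))
                  for _ in range(999)])
p = (1 + (perms >= observed).sum()) / 1000
print(f"observed separation = {observed:.2f}, p = {p:.3f}")
```

If a test like this on the full community comes back non-significant, it's a warning that per-feature testing on the same samples is probably underpowered too.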