That result is very strange! But I think I have an explanation!

While all four alpha diversity metrics are different, the stat test you are running on them is the same: the Kruskal-Wallis test by ranks, a one-way ANOVA on ranks.

Also, the ranks you observe between groups are the same for every metric: every sample in the YF group is higher than every sample in the AF group.

Because the Kruskal-Wallis test only cares about ranks, and the ranks are identical across all four metrics, your test statistics (and therefore your p-values) are always identical.

If the groups overlapped for some of these metrics (for example, if an AF sample ranked above a YF sample), the rank-based test would give a different result for that metric.
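A minimal pure-Python sketch of the Kruskal-Wallis H statistic shows this. The metric names and values below are hypothetical (not your data), and for simplicity the ranking helper assumes there are no tied values, so no tie correction is applied. Two metrics on very different scales but with the same ordering of samples give exactly the same H, while a metric where the groups overlap does not.

```python
from itertools import chain


def ranks(values):
    """Return 1-based ranks. Assumes no tied values (no midrank averaging)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for position, idx in enumerate(order, start=1):
        r[idx] = position
    return r


def kruskal_h(*groups):
    """Kruskal-Wallis H statistic from the pooled ranks (no tie correction)."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    r = ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(r[start:start + len(g)])  # ranks belonging to this group
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


# Hypothetical values for 3 AF and 3 YF samples, complete separation.
shannon_af, shannon_yf = [2.1, 2.4, 2.2], [3.0, 3.5, 3.1]
observed_af, observed_yf = [120, 150, 135], [300, 410, 280]
# Hypothetical metric where one AF sample outranks two YF samples.
overlap_af, overlap_yf = [2.1, 3.2, 2.2], [3.0, 3.5, 3.1]

h_shannon = kruskal_h(shannon_af, shannon_yf)
h_observed = kruskal_h(observed_af, observed_yf)
h_overlap = kruskal_h(overlap_af, overlap_yf)
print(h_shannon, h_observed, h_overlap)
```

The raw values never enter the formula, only the per-group rank sums do, which is why rescaling or swapping metrics changes nothing as long as the ordering is preserved.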

Just a side note here: think of a t-test as a simple form of one-way ANOVA with only 2 groups. A t-test is framed as comparing two means and ANOVA as comparing the variance between groups to the variance within groups, but both are testing for a difference in means. With exactly 2 groups, the ANOVA F statistic is the square of the Student's (equal-variance) t statistic, so the two give identical p-values. Both rest on the same assumptions (approximately normal data, equal variances, etc.). In short, using an ANOVA on 2 groups is totally fine: it is fairly robust to violations of normality (but not to unequal variances), and the same holds for the Student's t-test it is equivalent to.
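To illustrate the equivalence, here is a small sketch using only the Python standard library and hypothetical data. The helper functions `student_t` and `one_way_anova_f` are written here for illustration, not taken from any particular package; it checks that the two-group ANOVA F statistic equals the square of the pooled-variance t statistic.

```python
from statistics import mean, variance


def student_t(a, b):
    """Two-sample Student's t statistic using the pooled (equal) variance."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    se = (pooled_var * (1 / na + 1 / nb)) ** 0.5
    return (mean(a) - mean(b)) / se


def one_way_anova_f(*groups):
    """One-way ANOVA F: between-group mean square / within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical measurements for two groups (sizes may differ).
group_a = [2.1, 2.4, 2.2, 2.9]
group_b = [3.0, 3.5, 3.1, 2.6, 3.3]

t = student_t(group_a, group_b)
f = one_way_anova_f(group_a, group_b)
print(t, t ** 2, f)  # t squared and F agree up to floating-point rounding
```

The equality holds for any two groups when the t-test pools the variances; Welch's unequal-variance t-test is a different statistic and does not match F in general.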

That being said, I would argue that none of these tests should be relied on with n=3, even if running them is technically doable: the results are essentially meaningless. You can't estimate a distribution from such a small n. That's not to say the data isn't useful! Rather, the test isn't giving you any useful information. I think in this case simply displaying the data in a jitter box plot (a box plot with the individual samples overlaid as jittered points) will be sufficient.
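To put a number on how little information n=3 carries, this sketch enumerates every possible way to split ranks 1 to 6 into two groups, assuming 3 samples per group (which may not match your exact design). Only C(6, 3) = 20 rank assignments exist, so even complete separation, the most extreme outcome possible and the pattern you describe, has an exact permutation p-value of 2/20 = 0.1. The chi-squared approximation that Kruskal-Wallis implementations commonly report (for example `scipy.stats.kruskal`) gives a p-value just under 0.05 for the same ranks, a large-sample approximation applied far below the sample sizes it is meant for.

```python
import math
from itertools import combinations

N_PER_GROUP = 3  # assumption: 3 samples in each of 2 groups
N = 2 * N_PER_GROUP


def h_from_rank_sums(rank_sums, n_per_group, n_total):
    """Kruskal-Wallis H for equal-sized groups, given each group's rank sum."""
    scale = 12 / (n_total * (n_total + 1))
    return scale * sum(r ** 2 / n_per_group for r in rank_sums) - 3 * (n_total + 1)


all_ranks = range(1, N + 1)
rank_total = sum(all_ranks)  # 21 for N = 6

# H for every possible assignment of ranks to the first group.
h_values = []
for group_a in combinations(all_ranks, N_PER_GROUP):
    ra = sum(group_a)
    h_values.append(h_from_rank_sums([ra, rank_total - ra], N_PER_GROUP, N))

h_max = max(h_values)  # reached only by complete separation
exact_p = sum(h >= h_max - 1e-12 for h in h_values) / len(h_values)
# Chi-squared survival function with 1 degree of freedom.
asymptotic_p = math.erfc(math.sqrt(h_max / 2))
print(len(h_values), round(h_max, 3), exact_p, round(asymptotic_p, 4))
```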