Hi, I've read through several of the forum posts and tutorials on producing a rarefaction curve and using it to help determine the minimum sampling depth. I understand this on a practical level: you need to normalise the data, ideally finding a balance between under-sampling diversity and losing too many samples. However, when the alpha rarefaction curves are split by the groups you want to test (e.g. the attached image), why is it acceptable to inspect them before calculating diversity metrics?

Since the curve shows estimates of diversity at each depth, we can see approximately how the groups we're testing behave at each point, so why is this not risky in terms of p-hacking/data-peeking? For instance, in the attached image, choosing a sampling depth of ~10,000 rather than ~15,000 substantially changes the difference between the groups.
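
To make the concern concrete, here is a minimal sketch of what I understand the grouped curve to be computing at each depth. It is plain Python with NumPy, not the output of any particular pipeline. The function names (`rarefy`, `observed_features`, `mean_richness_by_group`) and the toy counts are my own illustrative assumptions, and I'm using observed features as the alpha diversity metric purely for simplicity.

```python
import numpy as np

def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement from one sample's counts.

    Returns None when the sample has fewer than `depth` reads, i.e. the
    sample would be dropped at this sampling depth.
    """
    counts = np.asarray(counts, dtype=np.int64)
    if counts.sum() < depth:
        return None
    return rng.multivariate_hypergeometric(counts, depth)

def observed_features(counts):
    """Number of features with at least one read."""
    return int(np.count_nonzero(counts))

def mean_richness_by_group(table, groups, depth, seed=0):
    """Mean observed features per group after rarefying every sample to `depth`."""
    rng = np.random.default_rng(seed)
    per_group = {}
    for counts, group in zip(table, groups):
        sub = rarefy(counts, depth, rng)
        if sub is not None:  # samples below the depth are excluded
            per_group.setdefault(group, []).append(observed_features(sub))
    return {g: float(np.mean(v)) for g, v in per_group.items()}

if __name__ == "__main__":
    # Hypothetical toy feature table: rows are samples, columns are features.
    table = [
        [50, 30, 10, 5, 3, 2],
        [40, 40, 15, 3, 1, 1],
        [90, 5, 3, 1, 1, 0],
        [20, 5, 0, 0, 0, 0],   # shallow sample, dropped at higher depths
    ]
    groups = ["A", "A", "B", "B"]
    for depth in (20, 50, 100):
        print(depth, mean_richness_by_group(table, groups, depth))
```

Each candidate depth gives its own per-group summary, and samples start dropping out as the depth increases, so looking at these numbers by group before fixing the depth is exactly the step I'm worried about.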

I've struggled to find any literature on this.

If anyone has any insight I would really appreciate it, and apologies if this is a very basic question!