Why is using rarefaction curves to help determine sampling depth prior to producing core diversity metrics not considered p-hacking/data-peeking?

Hi, I've read through several of the forum posts/tutorials on producing a rarefaction curve, and using this to help determine the minimum sampling depth. While I understand this on a practical level - you need to normalise the data, ideally finding a balance between avoiding under-sampling diversity and not losing too many samples - when comparing alpha rarefaction curves split by the groups you want to test (e.g. the attached image), why is it acceptable to do this before calculating diversity metrics?

As the curve shows estimates of diversity at each depth, and we can see approximately how the groups we're testing behave at each point, why is this not risky in terms of p-hacking/data-peeking? For instance, in the attached image, we can see that choosing a sampling depth of ~10,000 versus ~15,000 substantially changes the difference between the groups.

I've struggled to find any literature on this.

If anyone has any insight I would really appreciate it, and apologies if this is a very basic question!

Hello,
Both methods, alpha rarefaction and core-metrics at a given sampling depth, randomly select reads from the original samples. The "best" sampling depth from the rarefaction curve does not necessarily produce the most significant difference between groups, since in each case a different, randomly selected pool of features is analyzed.
Rerunning core-metrics with exactly the same depth will produce slightly different results. That is why I am always skeptical about p-values such as 0.04 and 0.06.
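
To illustrate the point about random subsampling, here is a minimal sketch in plain Python. It is not the QIIME 2 implementation: the `rarefy` and `observed_features` helpers and the sample counts are hypothetical, made up purely for illustration. It subsamples the same sample to the same depth with ten different random seeds and records the observed richness each time.

```python
import random

def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement from a feature-count dict."""
    # Expand the count table into one entry per read.
    reads = [feature for feature, n in counts.items() for _ in range(n)]
    if depth > len(reads):
        raise ValueError("sampling depth exceeds the sample's total reads")
    rarefied = {}
    for feature in rng.sample(reads, depth):
        rarefied[feature] = rarefied.get(feature, 0) + 1
    return rarefied

def observed_features(counts):
    """Richness: number of features with at least one read."""
    return sum(1 for n in counts.values() if n > 0)

# Hypothetical sample: 5 abundant features and 200 rare ones (2,900 reads total).
sample = {f"abundant_{i}": 500 for i in range(5)}
sample.update({f"rare_{i}": 2 for i in range(200)})

# Same sample, same depth, different random seeds.
richness_by_seed = [
    observed_features(rarefy(sample, 500, random.Random(seed)))
    for seed in range(10)
]
print(richness_by_seed)
```

The printed values differ from seed to seed even though the input and the depth are identical, which is the kind of run-to-run variation that can move a p-value across a threshold such as 0.05.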


Hi @DrL, I realize this is quite an old post at this point, but I just came across it again and thought I'd weigh in.

I agree - alpha rarefaction plots do open the door to potential p-hacking. As always, it's up to the user to use the statistics and visualizations appropriately. The purpose of alpha rarefaction plots like this one isn't to see where the differences are greatest and select that as the even sampling depth, but to get a feel for whether a conclusion (e.g., that "dark blue" is higher than "light blue" in this plot) is stable across a range of sampling depths. One way to avoid the perception of p-hacking is to include a plot like this as supplementary material with a paper. Providing it for transparency should give your readers confidence in the observation.
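
As a rough illustration of that stability check, the sketch below uses plain Python rather than QIIME 2. The two groups, their simulated samples, the depths, and the helper names are all hypothetical. For each depth it averages rarefied richness per group over repeated subsampling and records whether the direction of the difference stays the same.

```python
import random
from statistics import mean

def rarefied_richness(counts, depth, rng):
    """Observed features after subsampling `depth` reads without replacement."""
    reads = [feature for feature, n in counts.items() for _ in range(n)]
    return len(set(rng.sample(reads, depth)))

def make_sample(n_rare, rng):
    """Hypothetical sample: 5 abundant features plus `n_rare` rare ones."""
    counts = {f"abundant_{i}": rng.randint(300, 500) for i in range(5)}
    counts.update({f"rare_{i}": rng.randint(1, 5) for i in range(n_rare)})
    return counts

def group_mean(samples, depth, rng, iterations=10):
    """Mean richness over all samples and repeated rarefactions."""
    return mean(
        rarefied_richness(s, depth, rng)
        for s in samples
        for _ in range(iterations)
    )

rng = random.Random(0)
groups = {
    "dark_blue": [make_sample(300, rng) for _ in range(5)],
    "light_blue": [make_sample(150, rng) for _ in range(5)],
}

# Every simulated sample has at least 1,650 reads, so all depths are valid.
depths = [250, 500, 1000, 1500]

direction = {}
for depth in depths:
    means = {name: group_mean(samples, depth, rng) for name, samples in groups.items()}
    direction[depth] = means["dark_blue"] > means["light_blue"]
    print(depth, means)

# The conclusion is "stable" if the sign of the difference never flips.
stable = len(set(direction.values())) == 1
print("direction consistent across depths:", stable)
```

The depth is not chosen to maximize the gap here; the check only asks whether the same group comes out higher at every depth in the range.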

I hope this helps!


Hi @gregcaporaso, thanks very much for your input, it's very helpful!
