Rarefaction vs Rarefying and when to use them in your analysis pipeline

Dear qiime2 community,

I entered the microbiome analysis world this year and just came to the understanding that Rarefaction and Rarefying are NOT the same thing, so I have been rarefying my data when I thought I was performing rarefaction (insert scream of horror :scream:). With this realization in hand, I have attempted to understand what the differences are and when to apply each... I would greatly appreciate input if you have a moment!

To summarize my understanding so far:

Rarefaction = repeated subsampling (without replacement) to a fixed sequencing depth; the metric of interest is computed on each subsample and the results are averaged into a single value for each sample.
Rarefying = a single subsampling to a fixed sequencing depth, producing one subsampled count for each ASV in each sample.
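
To put this in QIIME 2 terms (assuming I have this right), Rarefying would be a single call to the feature-table rarefy action, something like this sketch (the file names and depth are just placeholders from my own pipeline):

```
# Rarefying: a single random subsample of each sample to a fixed depth
qiime feature-table rarefy \
  --i-table table.qza \
  --p-sampling-depth 10000 \
  --o-rarefied-table rarefied-table.qza
```

whereas Rarefaction would repeat this draw many times, compute the metric of interest on each draw, and average the results.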

My understanding is that there has been a lot of confusion in the field surrounding both of these terms (conflating the two... #relatable :smiling_face_with_tear:), but the current consensus seems to be that Rarefaction is a valid way (and perhaps even the best way) to correct for differences in sampling depth when calculating diversity metrics, as the initial paper that argued against Rarefaction actually used Rarefying instead (though there were other issues too...).

My questions are as follows:

  1. Although Rarefaction is a validated way of correcting for sampling depth, Rarefying is not. Is this fair to say?

  2. Rarefaction can be used for the calculation of beta-diversity and alpha-diversity metrics, but it isn't possible for calculations of differential abundance. Isn't this a problem, since we are then not correcting for sampling depth in differential abundance analyses? Would this not support performing Rarefying instead, since the single rarefied table would correct for sampling depth and could be used for both diversity metrics and differential abundance calculations?

  3. Because Rarefaction controls for sampling depth, and CLR transformation controls for the compositionality of data, would it not be fair to apply Rarefaction followed by CLR transformation?

Thank you so much in advance for helping me understand these concepts!

kindest regards,
Zoë

A few papers that I have really appreciated on the topic:
https://journals.asm.org/doi/10.1128/msphere.00355-23
https://journals.asm.org/doi/10.1128/msphere.00354-23

This is a great post!

My first thought is that the words are too dang similar!

Instead, we could call these 'single subsampling' and 'multiple subsampling', which makes the commonality (computational subsampling) and difference (one time, many times) clear.

'Multiple subsampling' is also called 'bootstrapping' (strictly speaking, bootstrapping resamples with replacement), and there is a new QIIME 2 plugin that does this!

Hi @zippyzo, Thanks for your post!

Although Rarefaction is a validated way of correcting for sampling depth, Rarefying is not. Is this fair to say?

Rarefaction is a more reliable approach to performing alpha and beta diversity analysis. Rarefying has been widely used, and tends to produce very similar results to rarefaction*, but rarefaction is preferable.

@colinbrislawn mentioned q2-boots (see the paper here), which is what I would recommend using. This is illustrated in the gut-to-soil (g2s) tutorial here - in this case using the kmer-diversity action, though core-metrics is also a good one to use (docs on core-metrics, with examples, are here).
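
For example, something along these lines - the depth and n here are placeholders, and the full set of options (including the averaging methods) is described in `qiime boots core-metrics --help`:

```
# Rarefaction-style core metrics: n subsampled tables are generated,
# each metric is computed on every table, and the results are averaged.
# --p-no-replacement samples without replacement (classic rarefaction);
# drop --i-phylogeny if you only want non-phylogenetic metrics.
qiime boots core-metrics \
  --i-table table.qza \
  --i-phylogeny rooted-tree.qza \
  --m-metadata-file sample-metadata.tsv \
  --p-sampling-depth 10000 \
  --p-n 100 \
  --p-no-replacement \
  --output-dir boots-core-metrics
```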

it isn't possible for calculations of differential abundance

The methods for normalization tend to be specific to the differential abundance testing method. Those statistics are complex (I'm not an expert), so I go with what the statisticians who develop them recommend.

Because Rarefaction controls for sampling depth, and CLR transformation controls for the compositionality of data, would it not be fair to apply Rarefaction followed by CLR transformation?

This seems reasonable to me, but I'm interested in what others think as well (@jwdebelius - any input on this?).

Right now, it's possible to create n rarefied feature tables with the resample action in boots. We don't have a corresponding action to average these together to create a single table, but if there is a need for that it would be easy enough to add.
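
For example (the names and values here are placeholders; since the output is a Collection, the CLI writes a directory of tables):

```
# Generate n=100 tables, each subsampled without replacement to an
# even depth of 10,000 sequences per sample
qiime boots resample \
  --i-table table.qza \
  --p-sampling-depth 10000 \
  --p-n 100 \
  --p-no-replacement \
  --o-resampled-tables resampled-tables
```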

I hope this helps!

Notes:
*One analysis that hasn't been done yet, but which I think would be interesting, would be to assess how often a given rarefied diversity metric represents an outlier relative to the distribution of all the diversity metrics computed during rarefaction. I think that would help us assess how problematic rarefying tends to be - but regardless, I still believe that rarefaction is preferable. This analysis would be straightforward to pull off with q2-boots - happy to advise if anyone wants to try it!

Hi @zippyzo and @gregcaporaso

I wouldn't recommend rarefaction + CLR. The issue I end up having with rarefaction here is that it introduces artificial zeros. (We have true 0s from absences, 0s from things that are below the limit of detection, systematic zeros, and now rarefaction zeros.) Since a lot of bias in CLR comes from those zeros, introducing more zeros would not be my preferred approach, even if it's done through repeated subsampling.

I tend to prefer to apply a filter to my taxa to select things which meet certain abundance and prevalence criteria, so that I can be confident they're detected consistently when I test them. My personal criteria are typically 1/rarefaction depth in at least 10% of the samples. This is based on the idea that, in theory, I want to be able to detect at least 1 read (if present) in any sample, so if my shallowest sample is slightly deeper than that depth, I should - theoretically - be able to detect it. I think you could also argue that you need to go to half the rarefaction depth (at least 2 reads detected in the shallowest sample), although for the life of me, I can't recall the name of this principle. I use a 10% threshold because I will sometimes use the same criteria for other transforms, like a presence/absence model, and in those cases, my RR models don't work well for super rare taxa.

The filtering function is implemented in q2-feature-table's filter-features-conditionally action.
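
With a rarefaction depth of 10,000, for example, the criteria I described would look something like this (file names are placeholders):

```
# Keep features with relative abundance >= 1/10,000 = 0.0001
# in at least 10% of samples
qiime feature-table filter-features-conditionally \
  --i-table table.qza \
  --p-abundance 0.0001 \
  --p-prevalence 0.1 \
  --o-filtered-table filtered-table.qza
```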

You can do this with q2-feature-table merge using average as your overlap method. I'm not sure how float values are handled there, but as a first-order solution, you can already do it?
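
Something like this, I think (the input names are placeholders for the tables that resample writes out):

```
# Merge the resampled tables; 'average' resolves the overlapping
# sample/feature IDs by taking the mean count, giving one averaged table
qiime feature-table merge \
  --i-tables resampled-0.qza resampled-1.qza resampled-2.qza \
  --p-overlap-method average \
  --o-merged-table averaged-table.qza
```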

Best,
Justine

Thank you so much @jwdebelius, @gregcaporaso, @colinbrislawn for your insight and discussion! I greatly appreciate you taking the time to reply to my queries, it truly means a lot.

@jwdebelius, I find your filtering strategy very interesting and had a few follow up questions if you have a moment!

  1. So, to make sure I'm understanding, your strategy combines both abundance and prevalence filtering within the same metric? E.g., if you had 10,000 as your rarefaction depth, you would use an abundance filter of 1/10,000 = 0.0001 -- requiring a relative abundance of at least 0.0001 in at least 10% (prevalence filtering) of your samples? Put another way, you are removing features that have a relative abundance below 0.0001 in over 90% of samples.

  2. At what stage do you apply this filtering? Before or after Rarefaction?

I really like this idea @colinbrislawn !

@gregcaporaso, I think I'm confused, as I thought that for Rarefaction to work it needs to provide an average value to allow for alpha and beta analyses on the data? Is there a different strategy that the bootstrapping plugin uses? Or is it that the alpha and beta analyses can handle the n rarefied feature tables, i.e., for each metric a value is computed on each Rarefaction table individually, and the output then averages them all to provide a final value for statistical analyses? :thinking:

Thank you so much!
Zoë

Yes, that's exactly it. We compute n subsampled tables, compute the diversity metrics on each table, and then average the results. Here's an illustration of the workflow for beta diversity calculations (source):

[figure: q2-boots beta diversity workflow]

Hi @zippyzo,

I'm going to reiterate: I do not recommend rarefaction or any kind of random feature subsampling with compositional data transforms. There's fairly good literature about why this isn't recommended, but most of it comes down to the fact that subsampling is going to create zeros. And zeros are bad for log-ratio transforms.

The filtering approach serves a parallel function. Instead of trying to normalize away differences in sequencing depth and sampling stochasticity, we define a threshold at which we are confident we observe a feature and can test it. It's a statement about statistical power, but also about confidence in the data.

So, I would not apply a conditional filter in combination with any subsampling in 95% of circumstances. There are a few outliers, which are specific applications, but in general, I do either-or.

If I am going to filter my data before rarefaction - for example, if I want to exclude mitochondria, or the spike-in my colleague forgot to tell me she stuck in - then I'd do that first and subsample the remainder. That way, I still get an even depth (my goal with rarefaction).
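
As a sketch of that order of operations (the mitochondria exclusion and file names here are placeholders):

```
# 1. Drop unwanted features first (e.g., mitochondrial reads)
qiime taxa filter-table \
  --i-table table.qza \
  --i-taxonomy taxonomy.qza \
  --p-exclude mitochondria \
  --o-filtered-table no-mito-table.qza

# 2. Then subsample what remains to an even depth
qiime feature-table rarefy \
  --i-table no-mito-table.qza \
  --p-sampling-depth 10000 \
  --o-rarefied-table rarefied-table.qza
```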

Best,
Justine

Thank you so much for clarifying @gregcaporaso ! I'm very excited to give q2-boots a go and look into it more thoroughly.

Ohhhh @jwdebelius -- apologies for the confusion, and thank you so much for reiterating that for me! I understand clearly now that you were talking about rarefaction OR filtering + log-ratio transforms. I really appreciate you clarifying that for me.

kindest regards,
Zoë

@zippyzo - one thing I forgot to mention: q2-boots isn't installed by default in the amplicon distribution at the moment. You should follow the installation instructions on the top-right of this page.

@gregcaporaso - good to know, thanks for the heads up!

kindest regards,
Zoë
