Is there a difference between a rarefaction cut-off and rarefying in microbiome analysis?

Hello!

I am confused about the concepts and processes of a rarefaction cut-off and rarefying.

My understanding is that a rarefaction cut-off typically refers to excluding samples with low read counts from an analysis. In other words, samples are simply excluded, with no further randomization before the downstream analysis.

However, I have also found explanations in other contexts where the term "rarefaction" refers to randomly subsampling a fixed number of reads from each sample, to standardize the number of reads across samples. As I understand it, that is very similar to the concept of normalization.

Do these terms generally have different meanings, or does a rarefaction cut-off include further randomization steps after excluding the samples whose read counts are below the threshold? If so, is the main difference between normalization and rarefaction the first step, in which some samples are excluded?

Please help me by comparing normalization, rarefying and the rarefaction cut-off.

Thank you angels in advance.


Hi @SingeunOh,
I'll describe how I think about it. First of all, before performing a diversity analysis you generally need to normalise your dataset, because samples usually differ in their number of sequences. There are many normalisation methods; normalisation by rarefaction is one of the oldest and is still very widely used.

The process of "rarefaction" is exactly the one you describe above. To apply it, you choose a rarefaction threshold (or, as you call it, a rarefaction cut-off) and subsample that fixed number of sequences from each of your samples. In the end, every sample with a sequence count equal to or higher than the threshold is included in the final output with exactly the threshold number of sequences, and every sample with fewer sequences than that is discarded. The sequences in each sample are chosen randomly, so if you repeat the rarefaction step many times you can get slightly different results. Also, depending on how the subsampling is implemented, a drawn sequence may be put back in the pool (sampling with replacement) or excluded from the following draws (sampling without replacement).

So, "rarefying" is the normalisation method that you perform by applying a "rarefaction cut-off" to all of your samples.
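
To make the steps concrete, here is a minimal Python sketch of rarefying a small count table with NumPy. It is only an illustration of the idea, not the code any particular pipeline runs: the function name `rarefy`, the toy table and the `replace` switch are all made up for this example.

```python
import numpy as np


def rarefy(table, depth, replace=False, seed=None):
    """Rarefy a {sample_id: per-feature read counts} table to `depth` reads.

    Hypothetical helper for illustration only. Samples with fewer than
    `depth` reads are discarded; all others are randomly subsampled.
    """
    rng = np.random.default_rng(seed)
    rarefied = {}
    for sample_id, counts in table.items():
        counts = np.asarray(counts, dtype=np.int64)
        if counts.sum() < depth:
            continue  # below the rarefaction cut-off: sample is dropped
        # Expand the counts into one entry per read, labelled by feature index.
        reads = np.repeat(np.arange(counts.size), counts)
        # replace=False: a drawn read is excluded from the following draws.
        # replace=True: a drawn read is put back in the pool.
        picked = rng.choice(reads, size=depth, replace=replace)
        rarefied[sample_id] = np.bincount(picked, minlength=counts.size)
    return rarefied


# Toy data (invented): three samples, three features.
toy_table = {
    "sample_A": [50, 30, 20],  # 100 reads, subsampled down to 12
    "sample_B": [5, 3, 2],     # 10 reads, below the cut-off, discarded
    "sample_C": [8, 0, 4],     # exactly 12 reads, kept
}
rarefied = rarefy(toy_table, depth=12, seed=42)
for sample_id, counts in rarefied.items():
    print(sample_id, counts, counts.sum())
```

Every sample that survives ends up with exactly 12 reads, and changing `seed` changes which reads of `sample_A` are kept, which is why repeated runs can differ slightly.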

Hope it helps.
Luca


Thank you, dear Luca. Now I understand that rarefaction is one kind of normalization method and that it includes a random subsampling process. This helps me understand normalization methods much better.

May I ask one more question? If the subsampling is done on some samples, how many samples are usually chosen for one iteration? Is there a usual number of samples per iteration? Also, how many times is the subsampling repeated for the final calculation? Could you explain the detailed process of rarefaction?

  • If anybody knows this well, please help me. Thanks.

Hi @SingeunOh,

The normalisation works because, in the end, all the normalised samples have the same number of sequences, so they are meant to be comparable.
The subsampling is performed on every sample whose total read count is equal to or higher than the selected threshold.
As for how many times the sequences are subsampled to obtain the final values, that depends on the implementation. In the QIIME 2 diversity pipeline there is an option to specify how many times you would like the subsampling to be performed.
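
As a rough illustration of what repeating the subsampling means, the Python sketch below rarefies one sample several times and averages a simple metric (the number of observed features) over the iterations. This is not QIIME 2's implementation; the helper names, the toy counts and the choice of metric are assumptions made up for the example.

```python
import numpy as np


def subsample(counts, depth, rng):
    """Draw `depth` reads without replacement from one sample's counts."""
    reads = np.repeat(np.arange(counts.size), counts)
    picked = rng.choice(reads, size=depth, replace=False)
    return np.bincount(picked, minlength=counts.size)


def mean_observed_features(counts, depth, iterations, seed=0):
    """Average the number of non-zero features over repeated rarefactions.

    Hypothetical helper: each iteration is an independent random subsample
    of the same sample at the same depth.
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=np.int64)
    if counts.sum() < depth:
        raise ValueError("sample has fewer reads than the rarefaction depth")
    richness = [
        np.count_nonzero(subsample(counts, depth, rng))
        for _ in range(iterations)
    ]
    return float(np.mean(richness))


# Toy sample (invented): two abundant features and two rare ones.
toy_counts = [100, 50, 5, 1]
print(mean_observed_features(toy_counts, depth=20, iterations=10))
```

The rare features are picked up in some iterations and missed in others, so averaging over several iterations gives a more stable value than a single random draw.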
Hope it helps
Luca


Thank you. That was a great help, and I think I have almost got the concept now. I appreciate all of your advice.


Hi @SingeunOh,
As a final suggestion, if you would like more information on normalisation methods and their limitations, I suggest looking at:
Lin H, Peddada SD (2020). Analysis of microbial compositions: a review of normalization and differential abundance analysis. npj Biofilms and Microbiomes 6 (1): 60.
Hope it helps
Luca


Thank you for your sincere help. I will definitely read the review article. ^^