Difference between sub-sampling and rarefying

Hi,
I want to know the differences between the sub-sampling and rarefying functions.
I had assumed they do the same job and are synonyms, but they have different commands, so maybe they are different after all. That is what prompted this question. I would appreciate a short explanation.

https://docs.qiime2.org/2020.11/plugins/available/feature-table/subsample/
https://docs.qiime2.org/2020.11/plugins/available/feature-table/rarefy/

Qiimer

Hi @TurboQiimer,

Subsampling picks whole features (rows) or whole samples (columns) at random and keeps all of their counts. You can subsample up to the number of features or samples in your table. It's useful if you need a random test set.

| feature-id | :whale: | :baby_chick: | :t_rex: |
|---|---|---|---|
| :computer: | 0 | 5 | 40 |
| :email: | 25 | 25 | 30 |
| :pager: | 25 | 0 | 20 |
| :fax: | 25 | 40 | 10 |
| :iphone: | 25 | 30 | 0 |

In my table, I have 3 samples (:whale:, :baby_chick:, :t_rex:) each with 100 counts.

So, if I subsample my example feature table to 3 features, I can get

| feature-id | :whale: | :baby_chick: | :t_rex: |
|---|---|---|---|
| :computer: | 0 | 5 | 40 |
| :email: | 25 | 25 | 30 |
| :fax: | 25 | 40 | 10 |

Now :whale: has 50 counts, :baby_chick: has 70 counts, and :t_rex: has 80 counts.
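To make the "whole rows are kept or dropped" idea concrete, here is a minimal pure-Python sketch of subsampling along the feature axis. It assumes the table is stored as a plain dict of feature → {sample: count}; the names `subsample_features` and `sample_totals` are hypothetical and this is not how QIIME 2 implements the `subsample` action (which works on BIOM tables). It only shows that the kept features keep their counts unchanged, so per-sample totals end up unequal.

```python
import random

# Example table from the post: feature -> {sample: count}
TABLE = {
    "computer": {"whale": 0, "baby_chick": 5, "t_rex": 40},
    "email": {"whale": 25, "baby_chick": 25, "t_rex": 30},
    "pager": {"whale": 25, "baby_chick": 0, "t_rex": 20},
    "fax": {"whale": 25, "baby_chick": 40, "t_rex": 10},
    "iphone": {"whale": 25, "baby_chick": 30, "t_rex": 0},
}


def subsample_features(table, n, seed=None):
    """Keep n whole features (rows), chosen at random without replacement.

    Counts inside the kept rows are copied unchanged (hypothetical helper)."""
    if n > len(table):
        raise ValueError("cannot keep more features than the table has")
    rng = random.Random(seed)
    kept = rng.sample(sorted(table), n)
    return {feature: dict(table[feature]) for feature in kept}


def sample_totals(table):
    """Sum counts per sample (column) across all features in the table."""
    totals = {}
    for counts in table.values():
        for sample, count in counts.items():
            totals[sample] = totals.get(sample, 0) + count
    return totals


# Keeping the same three features as in the post reproduces its totals.
picked = {f: TABLE[f] for f in ("computer", "email", "fax")}
print(sample_totals(picked))  # {'whale': 50, 'baby_chick': 70, 't_rex': 80}
```

A random call such as `subsample_features(TABLE, 3, seed=1)` may pick a different trio of features, but whichever rows it keeps are copied whole.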

In contrast, I could rarefy the data to 10 counts. This subsamples each sample down to the sampling depth: counts are drawn at random from each sample independently of the other samples, so any mix of features can remain. So, for example…

| feature-id | :whale: | :baby_chick: | :t_rex: |
|---|---|---|---|
| :computer: | 0 | 0 | 4 |
| :email: | 5 | 2 | 3 |
| :pager: | 3 | 0 | 3 |
| :fax: | 2 | 4 | 0 |
| :iphone: | 0 | 4 | 0 |

At the end of rarefaction, I still have all 5 of my original features, but each sample now has only 10 counts.
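For comparison, here is a minimal pure-Python sketch of rarefying the same example table, again assuming a dict of feature → {sample: count}. The `rarefy` function is hypothetical, not QIIME 2's implementation: for each sample it draws `depth` reads at random without replacement, independently of the other samples. As an assumption of this sketch, samples with fewer reads than the depth are simply left out.

```python
import random
from collections import Counter

# Example table from the post: feature -> {sample: count}
TABLE = {
    "computer": {"whale": 0, "baby_chick": 5, "t_rex": 40},
    "email": {"whale": 25, "baby_chick": 25, "t_rex": 30},
    "pager": {"whale": 25, "baby_chick": 0, "t_rex": 20},
    "fax": {"whale": 25, "baby_chick": 40, "t_rex": 10},
    "iphone": {"whale": 25, "baby_chick": 30, "t_rex": 0},
}


def rarefy(table, depth, seed=None):
    """Draw `depth` reads per sample without replacement, one sample at a time.

    Samples with fewer than `depth` reads are dropped (assumption of this sketch)."""
    rng = random.Random(seed)
    samples = sorted({s for counts in table.values() for s in counts})
    rarefied = {feature: {} for feature in table}
    for sample in samples:
        # One list entry per observed read, labelled with its feature.
        pool = [feature for feature in table
                for _ in range(table[feature].get(sample, 0))]
        if len(pool) < depth:
            continue  # not enough reads to reach the sampling depth
        drawn = Counter(rng.sample(pool, depth))
        for feature in table:
            rarefied[feature][sample] = drawn.get(feature, 0)
    return rarefied


rarefied = rarefy(TABLE, depth=10, seed=42)
for feature, counts in rarefied.items():
    print(feature, counts)
```

Every sample ends up with exactly 10 counts, and no feature can exceed its original count in a sample, which is why the whale's zero for the computer stays zero.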

Best,
Justine


I really liked your figurative explanation. Thanks a lot, indeed.
Can we say that both sub-sampling and rarefying are a sort of normalization?
Additionally, as you illustrated above, sub-sampling removes some samples (or features) at random, whereas rarefying brings each sample's counts down to a sampling depth. When is subsampling typically used? Does it have an adverse effect on the analysis results?
Much appreciated,
Qiimer

Hi @TurboQiimer,

Rarefaction is normalization; subsampling is just subsampling your data. They're not the same thing. I might use subsampling in a simulation, for teaching, or if I need a representative subset. I don't typically look for representative subsets until I get close to 1000+ samples, and even then I typically subsample only samples, not features. I would not recommend it for most analyses without first considering what the goal is.

Best,
Justine

