Difference between sub-sampling and rarefying.

TurboQiimer · January 25, 2021, 3:12pm

Hi,
I want to know the differences between the sub-sampling and rarefying functionalities.
I had assumed they do the same job and are synonyms of each other, but I just noticed they have different commands, so maybe they are different from each other. That is why I am asking this question. I would appreciate a short explanation in this regard.

https://docs.qiime2.org/2020.11/plugins/available/feature-table/subsample/
https://docs.qiime2.org/2020.11/plugins/available/feature-table/rarefy/

Qiimer

jwdebelius · January 25, 2021, 5:29pm

Hi @TurboQiimer,

Subsampling selects whole features or samples, keeping all of the counts in each one it picks. You can subsample up to the number of features/samples in your table. It's useful if you need a random test set.


0	5	40
25	25	30
25	0	20
25	40	10
25	30	0

In my table, I have 3 samples (the three columns), each with 100 counts, and 5 features (the rows).

So, if I subsample my example feature table to 3 features, I can get


0	5	40
25	25	30
25	40	10

Where now the first sample has 50 counts, the second has 70 counts, and the third has 80 counts.
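
To make the feature-axis case concrete, here is a minimal NumPy sketch of the idea. It is not how the QIIME 2 plugin is implemented; the function name `subsample_features` and the seed are made up for illustration, and the table is the example one above with features as rows and samples as columns.

```python
import numpy as np

# Example table from this post: rows are features, columns are samples.
table = np.array([
    [0, 5, 40],
    [25, 25, 30],
    [25, 0, 20],
    [25, 40, 10],
    [25, 30, 0],
])

def subsample_features(table, n_features, rng):
    """Keep n_features whole rows, chosen at random without replacement.

    Hypothetical helper for illustration only.
    """
    keep = np.sort(rng.choice(table.shape[0], size=n_features, replace=False))
    return table[keep]

rng = np.random.default_rng(0)  # arbitrary seed, just for repeatability
subsampled = subsample_features(table, 3, rng)
print(subsampled)              # three complete rows of the original table
print(subsampled.sum(axis=0))  # per-sample totals after dropping rows

# The specific outcome shown above keeps the 1st, 2nd and 4th features.
post_example = table[[0, 1, 3]]
print(post_example.sum(axis=0))  # [50 70 80]
```

Every row that survives keeps its counts unchanged, which is why the per-sample totals end up unequal.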

In contrast, I could rarefy the data to 10 counts. This subsamples each sample down to the sampling depth. The subsampling is done independently for each sample, and the counts that are kept can come from any mix of features. So, for example...


0	0	4
5	2	3
3	0	3
2	4	0
0	4	0

At the end of rarefaction, I have all 5 of my original features, but I only have 10 counts for each sample.
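
Rarefying the same example table can be sketched in NumPy as below. This is only an illustration of drawing counts without replacement within each sample, not the plugin's actual code; the helper name `rarefy` and the seed are assumptions, and raising an error for a sample below the sampling depth is a simplification I chose for the sketch.

```python
import numpy as np

# Example table from this post: rows are features, columns are samples.
table = np.array([
    [0, 5, 40],
    [25, 25, 30],
    [25, 0, 20],
    [25, 40, 10],
    [25, 30, 0],
])

def rarefy(table, depth, rng):
    """Draw `depth` counts without replacement from each sample (column).

    Hypothetical helper for illustration only.
    """
    out = np.zeros_like(table)
    n_features = table.shape[0]
    for j in range(table.shape[1]):
        counts = table[:, j]
        if counts.sum() < depth:
            # Simplification for this sketch: refuse samples that are too shallow.
            raise ValueError(f"sample {j} has fewer than {depth} counts")
        # One entry per observed count, labelled with its feature index.
        pool = np.repeat(np.arange(n_features), counts)
        drawn = rng.choice(pool, size=depth, replace=False)
        out[:, j] = np.bincount(drawn, minlength=n_features)
    return out

rng = np.random.default_rng(0)  # arbitrary seed, just for repeatability
rarefied = rarefy(table, 10, rng)
print(rarefied)
print(rarefied.sum(axis=0))  # every sample now totals the sampling depth
```

The table keeps its 5 rows (some may be all zeros after the draw), and each column is drawn separately, so the result differs from run to run unless the seed is fixed.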

Best,
Justine

TurboQiimer · January 26, 2021, 2:44pm

I really liked your figurative explanation. Thanks a lot, indeed.
Can we say both sub-sampling and rarefying are a sort of normalization?
Additionally, as you illustrated above, sub-sampling randomly removes some samples or features, whereas rarefying pulls each sample's counts down to a sampling depth. When is subsampling typically used? Does it have an adverse effect on the analysis results?
Much appreciated,
Qiimer

jwdebelius · January 26, 2021, 4:02pm

Hi @TurboQiimer,

Rarefaction is normalization; subsampling is just subsampling your data. They're not the same thing. I might use subsampling in a simulation, for teaching, or if I need a representative subset. I don't typically look for representative subsets until I get closer to 1000+ samples, and even then typically only for samples. I would not recommend it for most analyses without consideration of what the goal is.

Best,
Justine
