Hi,
I'd like to know the difference between sub-sampling and rarefying.
I had assumed they do the same job and are synonyms, but I noticed they are separate commands, which made me wonder whether they actually differ. I would appreciate a short explanation.
Subsampling randomly selects whole features (rows) or samples (columns) and keeps all of their counts. You can subsample up to the number of features/samples in your table. It's useful if you need a random test set.
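| feature-id | sample 1 | sample 2 | sample 3 |
|---|---|---|---|
| feature 1 | 0 | 5 | 40 |
| feature 2 | 25 | 25 | 30 |
| feature 3 | 25 | 0 | 20 |
| feature 4 | 25 | 40 | 10 |
| feature 5 | 25 | 30 | 0 |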
In my table, I have 3 samples, each with 100 counts.
So, if I subsample my example feature table to 3 features, I might get:
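| feature-id | sample 1 | sample 2 | sample 3 |
|---|---|---|---|
| feature 1 | 0 | 5 | 40 |
| feature 2 | 25 | 25 | 30 |
| feature 4 | 25 | 40 | 10 |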
Now the first sample has 50 counts, the second has 70 counts, and the third has 80 counts.
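As a minimal sketch of feature-axis subsampling, assuming the table is held as a plain Python dict of per-sample counts (not how QIIME 2 actually stores tables), the example above can be reproduced like this. `subsample_features` and `sample_totals` are hypothetical helper names, not a QIIME 2 API; the point is that whole rows are kept or dropped, so per-sample totals change.

```python
import random

# Example table from above: rows are features, columns are three samples.
# Feature labels are positional placeholders (hypothetical).
TABLE = {
    "feature 1": [0, 5, 40],
    "feature 2": [25, 25, 30],
    "feature 3": [25, 0, 20],
    "feature 4": [25, 40, 10],
    "feature 5": [25, 30, 0],
}


def subsample_features(table, n_features, seed=None):
    """Randomly keep n_features whole rows; every count in a kept row survives."""
    if n_features > len(table):
        raise ValueError("cannot subsample more features than the table has")
    rng = random.Random(seed)
    kept = rng.sample(sorted(table), n_features)
    return {fid: list(table[fid]) for fid in sorted(kept)}


def sample_totals(table):
    """Sum the counts in each sample (column)."""
    return [sum(col) for col in zip(*table.values())]


if __name__ == "__main__":
    print(sample_totals(TABLE))  # [100, 100, 100]
    # The particular draw shown above kept features 1, 2 and 4:
    chosen = {fid: TABLE[fid] for fid in ("feature 1", "feature 2", "feature 4")}
    print(sample_totals(chosen))  # [50, 70, 80]
    # A fresh random draw of 3 features:
    print(subsample_features(TABLE, 3, seed=0))
```

A different seed keeps a different set of rows, and the per-sample totals shift accordingly; nothing forces the samples to end up at equal depth.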
In contrast, I could rarefy the data to 10 counts per sample. This randomly subsamples the counts within each sample down to the sampling depth. Each sample is subsampled independently of the others, so each one keeps its own mix of features. So, for example...
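| feature-id | sample 1 | sample 2 | sample 3 |
|---|---|---|---|
| feature 1 | 0 | 0 | 4 |
| feature 2 | 5 | 2 | 3 |
| feature 3 | 3 | 0 | 3 |
| feature 4 | 2 | 4 | 0 |
| feature 5 | 0 | 4 | 0 |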
At the end of rarefaction, I still have all 5 of my original features, but each sample now has only 10 counts.
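Rarefaction can be sketched in the same style, again assuming a plain dict of counts and a hypothetical `rarefy` helper (not QIIME 2's implementation). Each sample's counts are expanded into individual reads and `depth` of them are drawn without replacement, independently per sample, which is why every column ends up with the same total while the retained features can differ between samples.

```python
import random

# Example table from above: rows are features, columns are three samples.
# Feature labels are positional placeholders (hypothetical).
TABLE = {
    "feature 1": [0, 5, 40],
    "feature 2": [25, 25, 30],
    "feature 3": [25, 0, 20],
    "feature 4": [25, 40, 10],
    "feature 5": [25, 30, 0],
}


def rarefy(table, depth, seed=None):
    """Rarefy every sample (column) to `depth` counts, one sample at a time.

    Counts are drawn without replacement, independently for each sample.
    """
    rng = random.Random(seed)
    feature_ids = list(table)
    n_samples = len(next(iter(table.values())))
    rarefied = {fid: [0] * n_samples for fid in feature_ids}
    for j in range(n_samples):
        # Expand column j into one label per count, e.g. 5 counts -> 5 labels.
        pool = [fid for fid in feature_ids for _ in range(table[fid][j])]
        if len(pool) < depth:
            # Assumption: fail loudly; real tools typically drop such samples instead.
            raise ValueError(f"sample {j + 1} has fewer than {depth} counts")
        # Draw `depth` counts from this sample only.
        for fid in rng.sample(pool, depth):
            rarefied[fid][j] += 1
    return rarefied


if __name__ == "__main__":
    result = rarefy(TABLE, 10, seed=1)
    for fid, row in result.items():
        print(fid, row)
    print([sum(col) for col in zip(*result.values())])  # [10, 10, 10]
```

Rerunning with another seed gives a different table, but every column still sums to 10, and a feature that was absent from a sample (a zero) stays absent.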
I really liked your illustrated explanation. Thanks a lot.
Can we say that both sub-sampling and rarefying are a sort of normalization?
Additionally, as you illustrated above, sub-sampling randomly removes some features or samples, whereas rarefying reduces the counts in each sample to a sampling depth. When is subsampling typically used? Does it have an adverse effect on analysis results?
Much appreciated,
Qiimer
Rarefaction is normalization; subsampling is just subsampling your data. They're not the same thing. I might use subsampling in a simulation, for teaching, or if I need a representative subset. I don't typically look for representative subsets until I get to 1000+ samples, and even then I typically subsample only samples, not features. I would not recommend it for most analyses without careful consideration of what the goal is.
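Subsampling along the sample axis, the case mentioned above for building a representative subset, can be sketched the same way. It keeps whole samples and leaves their counts unchanged, so any differences in sequencing depth between the kept samples remain; that is why it does not normalize anything. The function name and the positional sample labels are hypothetical, and the dict-of-lists table is an assumption for illustration only.

```python
import random

# Example table from above: rows are features, columns are samples.
# Feature and sample labels are positional placeholders (hypothetical).
TABLE = {
    "feature 1": [0, 5, 40],
    "feature 2": [25, 25, 30],
    "feature 3": [25, 0, 20],
    "feature 4": [25, 40, 10],
    "feature 5": [25, 30, 0],
}
SAMPLE_IDS = ["sample 1", "sample 2", "sample 3"]


def subsample_samples(table, sample_ids, n_samples, seed=None):
    """Randomly keep n_samples whole samples (columns); their counts are untouched."""
    if n_samples > len(sample_ids):
        raise ValueError("cannot subsample more samples than the table has")
    rng = random.Random(seed)
    # Keep the chosen column indices in their original order.
    keep = sorted(rng.sample(range(len(sample_ids)), n_samples))
    kept_ids = [sample_ids[j] for j in keep]
    kept_table = {fid: [row[j] for j in keep] for fid, row in table.items()}
    return kept_ids, kept_table


if __name__ == "__main__":
    ids, sub = subsample_samples(TABLE, SAMPLE_IDS, 2, seed=0)
    print(ids)
    # Each kept sample keeps its full depth (100 here), so nothing is normalized.
    print([sum(col) for col in zip(*sub.values())])  # [100, 100]
```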