Feature count range is too big: how should I set the sampling depth?

I'm about to analyze samples from the oral cavity, and the feature counts range from 221 to 243,823.
The best sampling depth I can find is 64,390, which gives: "Retained 1,352,190 (41.50%) features in 21 (36.21%) samples at the specified sampling depth."
But it gets rid of more than 50 percent of my samples (37 of 58).
Any suggestions for this situation? I'm thinking I can go ahead with my analysis, since I still have enough samples to do it. But I wonder whether anyone has faced the same situation before, and how it should be handled.
Thanks

Sample ID Feature Count
951865247 243823
651865232 240482
951865250 211114
801865238 206666
976081482 197369
121865169 187831
345081520 182893
501865133 141709
949081526 108487
536318948 103170
278081523 100067
663081502 96115
286081516 82016
321865130 78348
498143963 75130
461865196 74388
151143951 71735
407342731 69808
654143932 65903
673081489 64489
841582737 64390
112143989 59330
698342760 55753
557081495 47263
172081528 40534
191081552 35605
180143931 34704
927342707 32558
628342653 27417
832320885 22211
246143941 21504
178081536 21354
713081572 20765
866342689 17282
211865205 16853
599143915 15868
801081514 13910
299342644 12724
727342746 12092
745342742 11569
791342787 9686
673081518 9048
151865220 6826
793322436 6523
301081553 5708
149143944 5462
123081492 5291
703342723 5271
411081509 4920
985081531 2830
590342660 2419
121081549 2075
935342812 1786
277143936 1686
560143958 1467
617081565 967
980342714 700
704342767 221
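A minimal Python sketch of the arithmetic behind that summary, assuming (as the numbers suggest, since 21 × 64,390 = 1,352,190) that a sample is kept only if its feature count is at least the sampling depth, and that each kept sample then contributes exactly that many features. The counts are copied from the table above; `retained_at_depth` is a hypothetical helper, not a QIIME 2 function.

```python
# Per-sample feature counts copied from the table above (same order).
COUNTS = [
    243823, 240482, 211114, 206666, 197369, 187831, 182893, 141709,
    108487, 103170, 100067, 96115, 82016, 78348, 75130, 74388,
    71735, 69808, 65903, 64489, 64390, 59330, 55753, 47263,
    40534, 35605, 34704, 32558, 27417, 22211, 21504, 21354,
    20765, 17282, 16853, 15868, 13910, 12724, 12092, 11569,
    9686, 9048, 6826, 6523, 5708, 5462, 5291, 5271,
    4920, 2830, 2419, 2075, 1786, 1686, 1467, 967,
    700, 221,
]


def retained_at_depth(counts, depth):
    """Return (samples kept, features kept, % samples kept, % features kept).

    Assumption: a sample survives only if its total count is >= depth, and
    every surviving sample is subsampled to exactly `depth` features.
    """
    kept = [c for c in counts if c >= depth]
    n_samples = len(kept)
    n_features = n_samples * depth
    pct_samples = 100 * n_samples / len(counts)
    pct_features = 100 * n_features / sum(counts)
    return n_samples, n_features, pct_samples, pct_features


# Compare a few candidate depths side by side.
for depth in (10_000, 20_000, 30_000, 64_390):
    n, f, ps, pf = retained_at_depth(COUNTS, depth)
    print(f"depth {depth:>6}: {n} samples ({ps:.2f}%), {f:,} features ({pf:.2f}%)")
```

At 64,390 this reproduces the 21 samples (36.21%) and 1,352,190 features (41.50%) reported in the summary; the other depths in the loop are only example cutoffs for comparing the trade-off.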

Hello @Dawud922,

Welcome back to the forums. :wave:

You have discovered a fundamental trade-off between keeping more samples and keeping more depth. Chris wrote up a great discussion of this trade-off, if you want to take a look:

So, "what does your study need?" :thinking: :nerd_face:



I checked the rarefaction curve and didn't see many parallel lines. Do you think I can still go ahead with my analysis?

I can see some parallel lines in your image!

Those lines level off (their slope is near zero) once increasing the sampling depth no longer reveals more species. For example, I see some samples plateau at around 150 species and others at around 70.
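One way to make "slope near zero" concrete, as a rough sketch: compare the gain in observed features over the last step of a curve with a tolerance. The function name, the `slope_tol` threshold, and the example curves are all hypothetical; the values are made up, not read from the plot.

```python
def has_plateaued(depths, observed, slope_tol=0.001):
    """True if the final segment of a rarefaction curve is nearly flat.

    slope_tol is a hypothetical cutoff in newly observed features per
    additional read; pick a value that suits your data.
    """
    if len(depths) < 2:
        return False  # a single point has no slope
    gain = observed[-1] - observed[-2]
    step = depths[-1] - depths[-2]
    return gain / step <= slope_tol


# Hypothetical curves (made-up numbers, not read from the plot).
deep_sample = ([1000, 5000, 10000, 20000], [60, 120, 148, 150])
shallow_sample = ([500, 1000, 2000], [20, 35, 55])

print(has_plateaued(*deep_sample))     # True: 2 new features over 10,000 reads
print(has_plateaued(*shallow_sample))  # False: 20 new features over 1,000 reads
```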

However, some samples never have a chance to plateau because their sequencing depth is so low. These are the samples that would be removed if you choose a rarefaction level they never reach.
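A simplified sketch of how a single sample's rarefaction curve is built, and why it simply stops at the sample's own depth. It expands the sample into individual reads, shuffles them once, and counts distinct features in nested prefixes (subsampling without replacement). This is an illustration under those assumptions, not QIIME 2's implementation; the sample counts and depths are hypothetical.

```python
import random


def rarefaction_curve(feature_counts, depths, seed=0):
    """Return (depth, observed features) points for one sample.

    Each read is labelled by the index of its feature. One shuffle plus
    nested prefixes gives a subsample without replacement at every depth.
    Depths above the sample's total read count are skipped, so shallow
    samples produce short curves that never get a chance to plateau.
    """
    reads = [i for i, c in enumerate(feature_counts) for _ in range(c)]
    rng = random.Random(seed)
    rng.shuffle(reads)
    curve = []
    for depth in sorted(depths):
        if depth > len(reads):
            break  # the sample cannot be rarefied this deep
        curve.append((depth, len(set(reads[:depth]))))
    return curve


# Hypothetical sample: 5 features with uneven abundances (1,000 reads total).
sample = [600, 300, 80, 15, 5]
for depth, observed in rarefaction_curve(sample, [10, 100, 500, 1000, 5000]):
    print(depth, observed)
```

Because the prefixes are nested, the observed-feature count can only stay flat or rise as depth increases, and any depth above the sample's total (here 5,000 > 1,000) yields no point at all.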

How many samples do you retain at 10k, 20k, and 30k reads? Are there cohorts or subtypes of oral samples you are interested in comparing? Do some of these groups sequence better than others?
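To check whether some groups sequence better than others, a sketch like the one below reports, per group, the fraction of samples that reach each candidate depth. The sample IDs, read counts, and group labels ("saliva", "plaque") are hypothetical placeholders; in practice they would come from the feature table summary and the sample metadata.

```python
from collections import defaultdict


def retention_by_group(depths_by_sample, group_by_sample, depth):
    """Fraction of samples in each group whose read count is >= depth."""
    totals = defaultdict(int)
    kept = defaultdict(int)
    for sample_id, n_reads in depths_by_sample.items():
        group = group_by_sample[sample_id]
        totals[group] += 1
        if n_reads >= depth:
            kept[group] += 1
    return {g: kept[g] / totals[g] for g in sorted(totals)}


# Hypothetical read counts and metadata groups.
READS = {"S1": 52000, "S2": 31000, "S3": 8000,
         "S4": 4000, "S5": 25000, "S6": 2000}
GROUPS = {"S1": "saliva", "S2": "saliva", "S3": "saliva",
          "S4": "plaque", "S5": "plaque", "S6": "plaque"}

for depth in (10_000, 20_000, 30_000):
    print(depth, retention_by_group(READS, GROUPS, depth))
```

If one group loses far more samples than another at a given depth, rarefying there would also unbalance the comparison between groups, not just shrink the dataset.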

Hi Colin
Thanks for your quick response. I will dig into the samples retained at different sampling depths to see if anything special comes up.
Thanks
