Feature count range is too big, how should I set sampling depth?

Dawud922 · November 21, 2021, 4:02pm

I'm about to analyze samples from the oral cavity, and the feature counts range from 221 to 243823.
The best sampling depth I can get is 64390 --> Retained 1,352,190 (41.50%) features in 21 (36.21%) samples at the specified sampling depth.
But it gets rid of more than 50 percent of my samples.
Any suggestions for this situation? I'm thinking I can go ahead with my analysis since I still have enough samples to do it, but I wonder if anyone has had the same situation before and how we should handle it.
Thanks

Sample ID	Feature Count
951865247	243823
651865232	240482
951865250	211114
801865238	206666
976081482	197369
121865169	187831
345081520	182893
501865133	141709
949081526	108487
536318948	103170
278081523	100067
663081502	96115
286081516	82016
321865130	78348
498143963	75130
461865196	74388
151143951	71735
407342731	69808
654143932	65903
673081489	64489
841582737	64390
112143989	59330
698342760	55753
557081495	47263
172081528	40534
191081552	35605
180143931	34704
927342707	32558
628342653	27417
832320885	22211
246143941	21504
178081536	21354
713081572	20765
866342689	17282
211865205	16853
599143915	15868
801081514	13910
299342644	12724
727342746	12092
745342742	11569
791342787	9686
673081518	9048
151865220	6826
793322436	6523
301081553	5708
149143944	5462
123081492	5291
703342723	5271
411081509	4920
985081531	2830
590342660	2419
121081549	2075
935342812	1786
277143936	1686
560143958	1467
617081565	967
980342714	700
704342767	221

colinbrislawn · November 21, 2021, 4:30pm

Hello @Dawud922,

Welcome back to the forums.

You have discovered a fundamental trade-off between keeping more samples and keeping more depth. Chris wrote up a great discussion of this trade-off, if you want to take a look:

So, "what does your study need?"

Dawud922 · November 22, 2021, 1:57am

I checked the rarefaction curve and didn't see many parallel lines. Do you think I can still go ahead with my analysis?

colinbrislawn · November 22, 2021, 7:14pm

I can see some parallel lines in your image!

Those lines level off (their slope approaches zero) once increasing the sampling depth no longer reveals more species. For example, I see some samples plateau around 150 species and others around 70.

However, some samples never have a chance to plateau because their sequencing depth is so low. These are the samples that would be removed if you choose a rarefaction level they never reach.
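To make the plateau idea concrete, here is a minimal sketch of how a rarefaction curve can be computed for one sample: subsample reads without replacement at several depths and count how many distinct taxa show up. The taxon names and counts are made up for illustration, and `rarefaction_curve` is a hypothetical helper, not the function any particular pipeline uses.

```python
import random

def rarefaction_curve(taxon_counts, depths, seed=0):
    """Count observed taxa after subsampling without replacement at each depth.

    Depths larger than the sample's total read count map to None, because the
    sample cannot be rarefied that deep.
    """
    rng = random.Random(seed)
    # Expand the count table into one label per read.
    reads = [taxon for taxon, n in taxon_counts.items() for _ in range(n)]
    curve = {}
    for depth in depths:
        if depth > len(reads):
            curve[depth] = None  # the sample never reaches this depth
        else:
            curve[depth] = len(set(rng.sample(reads, depth)))
    return curve

# Hypothetical sample: two dominant taxa and two rare ones (806 reads total).
toy_sample = {"taxon_a": 500, "taxon_b": 300, "taxon_c": 5, "taxon_d": 1}
curve = rarefaction_curve(toy_sample, depths=(10, 100, 806, 5000))
for depth, observed in curve.items():
    print(depth, observed)
```

Shallow depths tend to miss the rare taxa, the curve flattens once every taxon has been seen, and the depth of 5000 returns None because this toy sample only has 806 reads.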

How many samples do you retain at 10k, 20k, and 30k reads? Are there cohorts or subtypes of oral samples you are interested in comparing? Do some of these groups sequence better than others?
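One quick way to check retention at several candidate depths is to count the samples whose feature count meets each depth. The sketch below uses the feature counts from the table in the first post; `retention_at_depth` is a hypothetical helper name, and it assumes every retained sample is subsampled to exactly the chosen depth.

```python
# Feature counts per sample, copied from the table in the first post.
feature_counts = [
    243823, 240482, 211114, 206666, 197369, 187831, 182893, 141709,
    108487, 103170, 100067, 96115, 82016, 78348, 75130, 74388,
    71735, 69808, 65903, 64489, 64390, 59330, 55753, 47263,
    40534, 35605, 34704, 32558, 27417, 22211, 21504, 21354,
    20765, 17282, 16853, 15868, 13910, 12724, 12092, 11569,
    9686, 9048, 6826, 6523, 5708, 5462, 5291, 5271,
    4920, 2830, 2419, 2075, 1786, 1686, 1467, 967,
    700, 221,
]

def retention_at_depth(counts, depth):
    """Return (samples kept, reads kept) when rarefying to `depth`."""
    kept = sum(1 for c in counts if c >= depth)
    # Each kept sample contributes exactly `depth` reads after subsampling.
    return kept, kept * depth

for depth in (10_000, 20_000, 30_000, 64_390):
    kept, reads = retention_at_depth(feature_counts, depth)
    share = 100 * kept / len(feature_counts)
    print(f"depth {depth}: {kept} samples ({share:.2f}%), {reads} reads kept")
```

For this table, that works out to 40, 33, and 28 of the 58 samples at 10k, 20k, and 30k reads, compared with 21 samples at 64390.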

Dawud922 · November 23, 2021, 3:31am

Hi Colin,
Thanks for your quick response. I will dive deeper into the samples retained at different sampling depths to see if anything special comes up.
Thanks
