I'm about to analyze samples from the oral cavity, and the per-sample feature counts range from 221 to 243,823.
The best sampling depth I can get is 64,390 --> Retained 1,352,190 (41.50%) features in 21 (36.21%) samples at the specified sampling depth.
But it gets rid of more than 50 percent of my samples.
Any suggestions for this situation? I'm thinking I can go ahead with my analysis, since I still have enough samples to do it. But I wonder if anyone has had the same situation before, and how we should handle it?
Thanks
You have discovered a fundamental trade-off between keeping more samples and keeping more depth. Chris wrote up a great discussion of this trade-off, if you want to take a look:
Those lines level off (have a slope near zero) once increasing the sampling depth does not reveal more Species. For example, I see some samples plateau around 150 and around 70 Species.
However, some samples never have a chance to plateau because their sequencing depth is so low. These are the samples that would be removed if you choose a rarefaction level they never reach.
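If it helps to see why a shallow sample cannot reach a deeper rarefaction level, here is a minimal Python sketch of one point on a rarefaction curve. It is only an illustration, not how any particular pipeline implements it: the function name `observed_features` and the toy counts are made up, and it subsamples reads without replacement using the standard library.

```python
import random

def observed_features(feature_counts, depth, seed=0):
    """Count distinct features seen after subsampling `depth` reads.

    `feature_counts` maps a feature ID to its read count in one sample
    (hypothetical input format). Returns None when the sample has fewer
    reads than `depth`, i.e. it would be dropped at that level.
    """
    # Expand the counts into one entry per read.
    reads = [feat for feat, n in feature_counts.items() for _ in range(n)]
    if depth > len(reads):
        return None
    rng = random.Random(seed)
    # Draw `depth` reads without replacement and count distinct features.
    return len(set(rng.sample(reads, depth)))

# Toy samples (made-up numbers): one deep, one shallow.
deep = {"featA": 500, "featB": 300, "featC": 5}
shallow = {"featA": 150, "featB": 71}

for depth in (100, 200, 400, 800):
    print(depth, observed_features(deep, depth), observed_features(shallow, depth))
```

The shallow sample returns `None` for every depth above its 221 reads, which is the "never has a chance to plateau" case.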
How many samples do you retain at 10k, 20k, and 30k reads? Are there cohorts or subtypes of oral samples you are interested in comparing? Do some of these groups sequence better than others?
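One quick way to compare candidate depths is to tabulate how many samples and reads survive at each one. Below is a rough Python sketch, assuming you have exported the per-sample totals into a dictionary; the function name `retention_at_depth` and the example counts are hypothetical. The arithmetic matches the summary line quoted above: 21 samples times 64,390 reads is 1,352,190 retained features.

```python
def retention_at_depth(sample_totals, depth):
    """Summarize what is retained when rarefying to `depth`.

    `sample_totals` maps a sample ID to its total feature count
    (hypothetical input format). Samples below `depth` are dropped;
    each kept sample contributes exactly `depth` reads.
    """
    n_kept = sum(1 for total in sample_totals.values() if total >= depth)
    reads_kept = n_kept * depth
    return {
        "depth": depth,
        "samples_kept": n_kept,
        "fraction_samples": n_kept / len(sample_totals),
        "reads_kept": reads_kept,
        "fraction_reads": reads_kept / sum(sample_totals.values()),
    }

# Made-up per-sample totals spanning the range mentioned in this thread.
totals = {"s1": 221, "s2": 8500, "s3": 15000,
          "s4": 32000, "s5": 64390, "s6": 243823}

for depth in (10_000, 20_000, 30_000, 64_390):
    print(retention_at_depth(totals, depth))
```

Running the same tally separately for each cohort or oral site would show whether a given depth removes one group disproportionately.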
Hi Colin
Thanks for your quick response. I will dig into the samples retained at different sampling depths to see if anything special comes up.
Thanks