How do I choose a --p-sampling-depth value?

I am having trouble deciding on a --p-sampling-depth value. My table info is as follows:

Sample ID Sequence Count
Neg2 16,589
Tre1 15,633
Tre2 15,606
Neg1 15,177
Neg3 14,844
Tre3 14,605
CK2 14,528
CK1 14,501
CK3 13,306

So, in this case, which sampling depth should I use? Looking forward to your help!

An overview of the table is provided below:
Table summary
Metric Sample
Number of samples 9
Number of features 1,475
Total frequency 134,789

Frequency per sample
Minimum frequency 13,306.0
1st quartile 14,528.0
Median frequency 14,844.0
3rd quartile 15,606.0
Maximum frequency 16,589.0
Mean frequency 14,976.555555555555

Frequency per feature
Minimum frequency 2.0
1st quartile 20.0
Median frequency 42.0
3rd quartile 95.5
Maximum frequency 4,537.0
Mean frequency 91.38237288135593


Wow, you’re lucky to have such high sequence counts! The choice of sampling depth is somewhat arbitrary, but typically you want a value high enough to capture the diversity present in samples with high counts, yet low enough that you don’t get rid of a ton of your samples. Given that the minimum count for any of your samples is 13,306 (which is pretty high, and close to the maximum count in your samples), you could probably just choose 13,306 as your sampling depth.
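To make that trade-off concrete, here is a minimal Python sketch (not part of QIIME 2; the `retention` helper is a hypothetical name made up for this illustration). It uses the per-sample counts posted above and assumes that samples with fewer sequences than the chosen depth are dropped, while every kept sample is subsampled to exactly that depth.

```python
# Per-sample sequence counts copied from the table summary in this thread.
counts = {
    "Neg2": 16589, "Tre1": 15633, "Tre2": 15606,
    "Neg1": 15177, "Neg3": 14844, "Tre3": 14605,
    "CK2": 14528, "CK1": 14501, "CK3": 13306,
}


def retention(counts, depth):
    """Hypothetical helper: summarize what survives rarefying to `depth`.

    Returns (samples kept, sequences kept, fraction of all sequences kept).
    Assumes samples below `depth` are discarded and the rest are
    subsampled down to exactly `depth` sequences each.
    """
    kept = [sample for sample, n in counts.items() if n >= depth]
    seqs_kept = depth * len(kept)
    total = sum(counts.values())
    return len(kept), seqs_kept, seqs_kept / total


# Compare a few candidate depths, including the minimum sample count.
for depth in (13306, 14501, 15000):
    n_samples, seqs, frac = retention(counts, depth)
    print(f"depth={depth}: {n_samples}/{len(counts)} samples, "
          f"{seqs} sequences ({frac:.1%} of total)")
```

With these counts, a depth of 13,306 keeps all 9 samples, whereas raising it to 15,000 would keep only 4, which is why the minimum count is a reasonable choice here.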


Thanks! I will try again!

Hi @hzh0005,
Your sequence counts are indeed very high, so you can probably follow @Emma_Dietrich’s advice and choose 13,306 as your sampling depth (thank you @Emma_Dietrich for your answer!).

To give you a little more insight into how we usually choose a good sampling depth (particularly when sequence counts are lower), you can check out this tutorial. Alpha rarefaction plots show how sampling depth impacts alpha diversity (which is only tangentially related to the impacts on beta diversity and other downstream analyses, but is a good rough benchmark to use here). Ideally, we are looking for a sampling depth at the point where the rarefaction curves begin to level off (indicating that most of the relevant diversity has been captured). This helps inform the tough decisions we need to make when some samples have lower sequence counts and we need to balance the priorities that @Emma_Dietrich summed up perfectly: a value high enough to capture the diversity present in samples with high counts, but low enough that you don’t get rid of a ton of your samples.

So give alpha rarefaction a try here — you will probably not need to use it, but it will be a good way to learn how to use this method to select sampling depths for future data sets.
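If it helps to see what a rarefaction curve actually measures, below is a rough Python sketch of the idea. It is a toy illustration, not QIIME 2's implementation; in practice `qiime diversity alpha-rarefaction` computes these curves for you. The feature names and counts are invented, and both function names are hypothetical. The sketch assumes subsampling without replacement and uses observed features as the alpha diversity metric.

```python
import random


def observed_features(feature_counts, depth, rng):
    """Subsample `depth` reads without replacement; count distinct features seen."""
    # Expand the per-feature counts into one entry per read.
    reads = [feat for feat, n in feature_counts.items() for _ in range(n)]
    return len(set(rng.sample(reads, depth)))


def rarefaction_curve(feature_counts, depths, iterations=10, seed=0):
    """Mean number of observed features at each depth, over repeated subsamples."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    curve = {}
    for depth in depths:
        values = [observed_features(feature_counts, depth, rng)
                  for _ in range(iterations)]
        curve[depth] = sum(values) / len(values)
    return curve


# Invented toy sample: 1,000 reads spread unevenly over 6 features.
sample = {"featA": 500, "featB": 300, "featC": 150,
          "featD": 40, "featE": 8, "featF": 2}

curve = rarefaction_curve(sample, depths=[10, 50, 100, 250, 500, 1000])
for depth, mean_observed in curve.items():
    print(f"depth={depth}: mean observed features = {mean_observed:.1f}")
```

The mean rises quickly at low depths and then flattens as the rare features are finally picked up. The depth at which the curve flattens is the region you would want your sampling depth to reach.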

