Rarefaction depth


I have a quick question about choosing a rarefaction depth. My samples vary a lot at the low end, and I am having a difficult time deciding on a sampling depth. Any advice is welcome.



I have a quick question. I am trying to decide on a sampling depth for my data, but it's hard to choose because some samples have as few as 16 reads, and about 12 samples have only a few hundred, before the counts reach 1000.


I have attached my EcM-table-NS.qzv (632.5 KB) in case someone has some good advice for me.

Hello Fabiola,

This is a tough question. You must choose between throwing away samples so the remaining ones keep more reads and better resolution, or keeping more samples that are subsampled to a lower depth. Because so many of your samples failed, I would consider resequencing them. Was your biomass very low? Was something hurting the PCR amplification?
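For readers less familiar with what rarefying does: each retained sample is randomly subsampled, without replacement, down to the chosen depth, and samples with fewer reads than that depth are dropped. The sketch below illustrates the idea in plain Python. It is not QIIME 2's implementation (in QIIME 2 this step is `qiime feature-table rarefy`), and the helper name `rarefy_sample` and the toy counts are hypothetical.

```python
import random
from collections import Counter


def rarefy_sample(counts, depth, seed=None):
    """Subsample a {feature_id: read_count} dict to exactly `depth` reads.

    Hypothetical helper for illustration. Returns None when the sample has
    fewer than `depth` reads, i.e. the sample would be dropped at this depth.
    """
    total = sum(counts.values())
    if total < depth:
        return None  # too few reads: sample is discarded
    # Expand to one entry per read, then draw `depth` reads without replacement.
    reads = [feature for feature, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    drawn = rng.sample(reads, depth)
    # Recount the drawn reads per feature.
    return dict(Counter(drawn))


sample = {"ASV_1": 600, "ASV_2": 300, "ASV_3": 100}  # toy counts
print(rarefy_sample(sample, 500, seed=42))  # 500 reads spread across the ASVs
print(rarefy_sample({"ASV_1": 16}, 500))    # None: only 16 reads
```

The trade-off above shows up directly here: raising `depth` keeps more reads per sample but turns more low-read samples into `None`.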

I'm glad you attached your qzv file. Using https://view.qiime2.org/, I opened this file, went to the 'Interactive Sample Detail' tab, and tried different sampling depths. This lets you see which samples you would throw away at a given sampling depth.

For example, the plot shows that at a rarefaction depth of 2528, I would lose most of my 'burn' samples but keep most of my 'unburn' samples. Very interesting! If I care about burn vs. unburn, maybe I should choose a rarefaction depth below 2528.
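The same per-group check the viewer does interactively can be sketched in a few lines, which can help when comparing many candidate depths. This is a minimal illustration assuming you already have each sample's total read count and its treatment label; the function `retained_by_group` and the read totals below are hypothetical, not taken from the attached .qzv.

```python
from collections import defaultdict


def retained_by_group(sample_depths, groups, depth):
    """Count how many samples per metadata group survive a sampling depth.

    sample_depths: {sample_id: total reads}
    groups: {sample_id: group label, e.g. 'burn' / 'unburn'}
    Returns {group: (kept, total)}. A sample is kept if reads >= depth.
    """
    kept = defaultdict(int)
    total = defaultdict(int)
    for sample_id, reads in sample_depths.items():
        g = groups[sample_id]
        total[g] += 1
        if reads >= depth:
            kept[g] += 1
    return {g: (kept[g], total[g]) for g in total}


# Hypothetical read totals and treatment labels.
depths = {"S1": 16, "S2": 240, "S3": 900, "S4": 3100, "S5": 5200, "S6": 8000}
treatment = {"S1": "burn", "S2": "burn", "S3": "burn",
             "S4": "unburn", "S5": "unburn", "S6": "unburn"}

for d in (100, 1000, 2528):
    print(d, retained_by_group(depths, treatment, d))
```

If one treatment group collapses to zero or one kept sample at a depth, that depth is probably too high for a burn vs. unburn comparison.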

Try this for yourself. What do you think you should do?



Oh, I had not noticed I could choose a variable from my metadata. Perfect. The read counts for the burnt units are extremely low, but since I am looking at high-severity burnt areas, that is expected. The high variability makes it extremely hard for me to choose a proper sampling depth.

I will mess with it and hopefully come up with a “fair” value.

Quick question: do the percentage of sequences retained and the percentage of samples retained matter? In other words, should I keep them above a certain value, like 90%, or does it depend entirely on my data?

Hello Fabiola,

I think you are on the right track!

It's all based on your data, and on keeping enough samples that you still have statistical power when you run your statistical tests. This can be hard when you have low-biomass samples with very few reads…
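To see how the two percentages trade off across depths, here is a small sketch. It assumes, as the summary viewer appears to, that the retained sequences equal the depth times the number of samples kept (each kept sample contributes exactly `depth` reads after rarefying). The helper `retention_at_depth` and the read totals are hypothetical.

```python
def retention_at_depth(sample_depths, depth):
    """Return (% samples retained, % sequences retained) at `depth`.

    Assumes retained sequences = depth * number of kept samples, since
    every kept sample is rarefied to exactly `depth` reads.
    """
    totals = list(sample_depths.values())
    kept = [r for r in totals if r >= depth]
    pct_samples = 100 * len(kept) / len(totals)
    pct_sequences = 100 * depth * len(kept) / sum(totals)
    return pct_samples, pct_sequences


# Hypothetical per-sample read totals.
depths = {"S1": 16, "S2": 240, "S3": 900, "S4": 3100, "S5": 5200, "S6": 8000}

# Only the observed totals are tried as candidate depths.
for d in sorted(set(depths.values())):
    s, q = retention_at_depth(depths, d)
    print(f"depth {d:>5}: {s:5.1f}% samples, {q:5.1f}% sequences")
```

Only the observed sample totals are tried as candidates: between two consecutive totals the set of kept samples does not change, so the sequence percentage is highest at one of those totals. There is no universal cutoff like 90%; the table just makes the trade-off visible so you can weigh it against the number of samples your statistical tests need.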

Let me know if this answers your question,

