Correlation between sequencing depth and richness, despite normal-looking rarefaction curves

Hi all,

I noticed a while ago that in some of my own research, and also in a colleague's dataset that is completely independent of mine (different organisms, different sequencing service, etc.), there is a strong correlation between sequencing depth and ASV richness. Looking at the rarefaction curves, they all plateau nicely and all samples have more than enough read coverage. Still, samples with higher read counts invariably have higher species richness, and samples with lower read counts have lower richness. The correlation is definitely strong and significant.

Any ideas what could cause this?



But isn’t that how it is supposed to be?
On the rarefaction plot, you have two axes: x for sequencing depth and y for the number of observed features. You can also choose a metadata column in the visualization to group samples. When you do so, it shows averages for each group. That can be misleading, since each sample within a group may differ in its number of reads and in its richness. You can see this by choosing another column in your metadata file with more resolution, or the individual sample ID.
So, even if all samples in a certain group reach a plateau at more or less the same depth, each sample does so with its own number of observed features, whereas the plot averages these numbers across the samples grouped according to the metadata file.
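To make the averaging point concrete, here is a minimal sketch using the analytic rarefaction formula (expected richness when reads are drawn without replacement). The two samples and their read counts are invented for illustration, and `expected_richness` is a hypothetical helper, not a QIIME 2 function.

```python
from math import comb

def expected_richness(counts, depth):
    """Expected ASV count when `depth` reads are drawn without replacement."""
    total = sum(counts)
    if depth > total:
        raise ValueError("depth exceeds the number of reads in the sample")
    denom = comb(total, depth)
    # An ASV with c reads is missed with probability C(total - c, depth) / C(total, depth).
    return sum(1 - comb(total - c, depth) / denom for c in counts if c > 0)

# Invented read counts per ASV for two samples from the same metadata group.
sample_a = [400, 300, 200, 100]                      # 1,000 reads, 4 ASVs
sample_b = [500, 400, 300, 250, 200, 150, 100, 100]  # 2,000 reads, 8 ASVs

depths = [10, 100, 500, 1000]
curve_a = [expected_richness(sample_a, d) for d in depths]
curve_b = [expected_richness(sample_b, d) for d in depths]
group_mean = [(a + b) / 2 for a, b in zip(curve_a, curve_b)]

for d, a, b, m in zip(depths, curve_a, curve_b, group_mean):
    print(f"depth={d:>5}  sample_a={a:.2f}  sample_b={b:.2f}  group mean={m:.2f}")
```

Both toy samples level off well before 1,000 reads, but at different richness values, and the group mean sits between them and hides that difference.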

Hope I correctly understood your question.


Hi Timur,

Thanks for your answer. I don’t think it should be like this. The number of reads sequenced should depend on the input DNA, not on the actual diversity of the library. So I find it suspicious that, let’s say, a sample with 100,000 reads has twice the richness of a sample with 50,000 reads. Sure, looking at just two samples this could be random chance, but we plotted number of reads vs. richness and there is a very strong positive correlation between the two, even though all the samples show a plateau.
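For anyone who wants to run the same check, a rough sketch of the comparison might look like the following: rarefy every sample to the same depth, count the ASVs that remain, and compute a Spearman rank correlation against the total read count. The table is invented toy data, and the helper names (`rarefied_richness`, `average_ranks`, `spearman_rho`) are hypothetical; only NumPy is assumed.

```python
import numpy as np

def rarefied_richness(counts, depth, rng):
    """Subsample `depth` reads without replacement and count the ASVs that remain."""
    counts = np.asarray(counts)
    if counts.sum() < depth:
        raise ValueError("sample has fewer reads than the rarefaction depth")
    subsample = rng.multivariate_hypergeometric(counts, depth)
    return int((subsample > 0).sum())

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    values = np.asarray(values, dtype=float)
    _, inverse, tie_sizes = np.unique(values, return_inverse=True, return_counts=True)
    highest_rank = np.cumsum(tie_sizes)
    return (highest_rank - (tie_sizes - 1) / 2.0)[inverse]

def spearman_rho(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])

# Toy ASV table (rows = samples, columns = ASVs); the numbers are invented.
table = np.array([
    [500, 300, 200, 0, 0, 0],
    [900, 600, 300, 150, 50, 0],
    [1500, 900, 600, 400, 350, 250],
])
total_reads = table.sum(axis=1)
even_depth = int(total_reads.min())  # rarefy to the shallowest sample

rng = np.random.default_rng(0)
richness = [rarefied_richness(row, even_depth, rng) for row in table]
rho = spearman_rho(total_reads, richness)
print("total reads:", total_reads.tolist())
print("rarefied richness:", richness)
print("Spearman rho:", round(rho, 3))
```

Because richness is measured after rarefying to an even depth, a strong correlation here cannot be explained simply by deeper samples having had more chances to pick up rare ASVs.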


Hi @Roger_Huerlimann ,

It might help the discussion if we are all looking at the same plots. Would you mind sharing yours? (Rarefaction curves, and, if you have plotted it, alpha diversity per sample rarefied at even depth vs. total sequencing depth for each sample.)


Sorry for the delayed answer. I had some further discussions with my colleague and we might have an explanation, but it’s still nice to see what other people think.

Here are the rarefaction curves.

And here is read depth vs. rarefied richness.

Hi @Roger_Huerlimann,

I'm going to jump in, if that's okay

I’ve observed this relatively frequently with a fair number of pure richness metrics in complex environments. In those situations, it can be hard to determine whether we’re dealing with sequencing error that escaped denoising or just technical variation. To some degree, it makes sense that if you have a more diverse pool pre-rarefaction, you can have a more diverse result post-rarefaction.

I sometimes penalize this in an OLS for richness, which is usually asymptotically normal anyway. (There’s one in q2-longitudinal.) I find that log richness may be a good model for this. At the same time, it looks like much of your data plateaus at about 10-20K sequences/sample, so perhaps it’s worth just considering that.
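As a rough illustration of what accounting for depth in an OLS on log richness could look like, here is a minimal NumPy sketch. It is not the q2-longitudinal implementation. It assumes one reading of "penalize", namely adding log10 sequencing depth as a covariate next to the variable of interest, and the data are noise-free synthetic values generated from known coefficients so that the fit can be checked. The name `fit_log_richness` is hypothetical.

```python
import numpy as np

def fit_log_richness(richness, depth, group):
    """OLS fit of log(richness) ~ intercept + log10(depth) + group (coded 0/1)."""
    y = np.log(np.asarray(richness, dtype=float))
    X = np.column_stack([
        np.ones(len(y)),                           # intercept
        np.log10(np.asarray(depth, dtype=float)),  # sequencing depth covariate
        np.asarray(group, dtype=float),            # variable of interest
    ])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(["intercept", "log10_depth", "group"], coef.tolist()))

# Synthetic example: richness is generated from known coefficients, without noise.
depth = np.array([1e4, 2e4, 5e4, 1e5, 2e4, 4e4, 8e4, 1.6e5])
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])
richness = np.exp(3.0 + 0.5 * np.log10(depth) + 0.3 * group)

coef = fit_log_richness(richness, depth, group)
print(coef)
```

With real data, the `group` coefficient is then the estimated effect on log richness after adjusting for depth. A dedicated statistics package would be needed for standard errors and p-values, which this sketch does not compute.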