Correlation between sequencing depth and richness, despite normal-looking rarefaction curves

Roger_Huerlimann · June 16, 2021, 4:46am

Hi all,

I noticed a while ago that in some of my own research, and also in a colleague's dataset that is completely independent of mine (different organisms, different sequencing service, etc.), there is a strong correlation between sequencing depth and ASV richness. Looking at the rarefaction curves, they all plateau nicely and all samples have more than enough read coverage. Still, samples with higher read counts without fail have higher species richness, and samples with lower coverage have lower richness. The correlation is definitely strong and significant.

Any ideas what could cause this?

Thanks!
Roger

timanix · June 16, 2021, 7:11am

Hello!

But isn't that how it is supposed to be?
On the rarefaction plot, you have two axes: x for sequencing depth and y for the number of observed features. You can also choose a metadata column in the visualization to group samples. When you do so, it shows averages for each group. This can be misleading, since each sample within a group may differ in its number of reads, and in its richness as well. You can see this by choosing another metadata column with more resolution, or the individual sample ID.
So, even if all samples in a certain group reach a plateau at more or less the same depth, each sample does so with its own number of observed features, while we are averaging these numbers by grouping samples according to the metadata file.
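
To make the averaging point concrete, here is a minimal sketch in plain Python. It is not how the QIIME 2 visualizer is implemented; the function `observed_features` and the counts for `sample_a` and `sample_b` are made up for illustration. Both samples plateau, but at different richness, and the group mean hides that.

```python
import random

def observed_features(counts, depth, seed=0):
    """Subsample `depth` reads without replacement and count distinct features."""
    # Expand per-feature counts into one entry per read (fine for small examples).
    reads = [feature for feature, n in enumerate(counts) for _ in range(n)]
    if depth > len(reads):
        raise ValueError("depth exceeds the sample's total read count")
    rng = random.Random(seed)
    return len(set(rng.sample(reads, depth)))

# Hypothetical samples from the same metadata group.
sample_a = [200] * 10   # 10 features, 2,000 reads in total
sample_b = [100] * 40   # 40 features, 4,000 reads in total

depths = [100, 500, 1000, 2000]
curve_a = [observed_features(sample_a, d) for d in depths]
curve_b = [observed_features(sample_b, d) for d in depths]

# What a grouped rarefaction plot would show: the mean of the two curves.
group_mean = [(a + b) / 2 for a, b in zip(curve_a, curve_b)]

for d, a, b, m in zip(depths, curve_a, curve_b, group_mean):
    print(f"depth={d:>5}  sample_a={a:>3}  sample_b={b:>3}  group mean={m}")
```

Plotting by sample ID instead of by group corresponds to looking at `curve_a` and `curve_b` separately rather than at `group_mean`.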

Hope I correctly understood your question.

Best,
Timur

Roger_Huerlimann · June 16, 2021, 7:47am

Hi Timur,

Thanks for your answer. I don't think it should be like this. The number of reads sequenced should depend on the input DNA, but not on the actual diversity of the library. So I find it suspicious that, let's say, a sample with 100,000 reads has twice the richness of a sample with 50,000 reads. Sure, looking at just two samples this could be random chance, but we plotted the number of reads vs. richness and there is a very strong positive correlation between the two, even though all the samples show a plateau.
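
For anyone who wants to run the same check on their own data, below is a rough sketch of a Spearman rank correlation between total reads per sample and richness after rarefying to an even depth. The numbers in `total_reads` and `rarefied_richness` are placeholders, not our data, and the helper functions are written out by hand only to keep the example dependency-free.

```python
from math import sqrt

def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Placeholder values: one entry per sample.
total_reads = [52000, 61000, 48000, 100000, 75000, 90000]
rarefied_richness = [210, 260, 190, 420, 300, 380]

rho = spearman(total_reads, rarefied_richness)
print(f"Spearman rho = {rho:.3f}")
```

If rarefaction fully removed the depth effect, one would expect `rho` to sit near zero for real data, so a value close to 1 is what looks suspicious here.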

Regards,
Roger

Nicholas_Bokulich · June 16, 2021, 8:40am

Hi @Roger_Huerlimann,

It might help to discuss if we are all looking at the same plots. Would you mind sharing your plots? (rarefaction curves, also if you have plotted alpha diversity per sample rarefied at even depth vs. total sequencing depth for each sample)

Roger_Huerlimann · June 18, 2021, 4:02am

Sorry for the delayed answer. I had some further discussions with my colleague and we might have an explanation, but it's still nice to see what other people think.

Here are the rarefaction curves.
rarefaction_curves

And here is read depth vs. rarefied richness.
readdepthxrarifiedrichness

jwdebelius · June 24, 2021, 5:53pm

Hi @Roger_Huerlimann,

I'm going to jump in, if that's okay.

I’ve observed this relatively frequently with a fair number of pure richness metrics in complex environments. In those situations, it can be hard to determine whether we’re dealing with sequencing error that escaped denoising or just technical variation. To some degree, it makes sense that if you have a more diverse pool pre-rarefaction, you can have a more diverse result post-rarefaction.

I sometimes penalize this in an OLS model for richness, which is usually asymptotically normal anyway. (There’s one in q2-longitudinal.) I find that log richness may be a good model for this. At the same time, it looks like much of your data plateaus at about 10-20K sequences/sample, so perhaps it's worth just considering that.
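
As a rough illustration of the idea, and not the q2-longitudinal implementation, here is a hand-rolled simple OLS fit of log richness on sequencing depth. The data, the helper `ols_fit`, and the choice to express depth in thousands of reads are all assumptions made for the example. The residuals are what is left of log richness once the linear depth trend is removed.

```python
from math import log

def ols_fit(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope

# Placeholder data: sequencing depth (thousands of reads) and rarefied richness.
depth_k = [48.0, 52.0, 61.0, 75.0, 90.0, 100.0]
richness = [190, 230, 240, 310, 360, 430]

log_richness = [log(r) for r in richness]
intercept, slope = ols_fit(depth_k, log_richness)

# Depth-adjusted log richness: observed minus the value predicted from depth alone.
residuals = [y - (intercept + slope * x) for x, y in zip(depth_k, log_richness)]

print(f"slope per 1,000 reads = {slope:.4f}")
for d, res in zip(depth_k, residuals):
    print(f"depth={d:>6.1f}k  residual={res:+.3f}")
```

In a real analysis, depth would more likely enter as one covariate next to the variable of interest in a multi-term model, rather than being regressed out on its own like this.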

Best,
Justine

Ernest_Osburn · May 11, 2023, 5:48pm

Hi!

Sorry to revive this topic, but I'm having a similar issue and I'm wondering if there is an explanation or if others have seen this too. In my past couple of sequencing runs I've seen a very strong correlation between ASV richness and total sequencing depth. This hasn't always been the case; it has only popped up in my recent runs. I think it must be some kind of technical issue: for some reason, the higher-diversity samples are performing better on the sequencer. Any thoughts? Here is a plot from my latest run: