Can anyone suggest the most appropriate diversity metrics for very low library sizes (<1000 reads)?


I’m working with low-biomass samples which, following quality and contaminant filtering, contain considerably fewer than 1000 reads, some as few as 100. The lowest-depth samples are made up of fewer than 10 representative taxa. It’s therefore likely that I will be rarefying my samples to ~200 reads.

Previously I’ve used the Shannon and Simpson alpha diversity metrics; however, I understand these aren’t robust below 1000 reads. Would anyone be able to suggest appropriate alternatives based on the sample characteristics?

Any help would be appreciated!

Best wishes,

Note that at that level any kind of diversity analysis is going to be somewhat limited.

Just out of interest, do you have any references for that?

I do not have a good answer for this, but at such a low rarefaction depth I expect that most alpha diversity metrics are going to suffer from the same sorts of limitations. I'd recommend dropping some samples for the sake of increasing sequencing depth if you can afford it...
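
In case it helps, one rough way to see how much Shannon and Simpson wobble at a given depth is to subsample the same sample repeatedly and look at the spread. This is only a plain-Python sketch, not QIIME 2 code: the taxon names and counts are made up, and the helper names (`rarefy`, `shannon`, `simpson`) are just for illustration. Shannon is computed with the natural log here; other tools may use a different base.

```python
import math
import random
import statistics
from collections import Counter


def rarefy(counts, depth, rng):
    """Draw `depth` reads without replacement from a {taxon: read count} dict."""
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("rarefaction depth exceeds library size")
    return Counter(rng.sample(pool, depth))


def shannon(counts):
    """Shannon entropy (natural log) of a {taxon: count} dict."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values() if n > 0)


def simpson(counts):
    """Simpson's index of diversity, 1 - sum(p_i^2)."""
    total = sum(counts.values())
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# Hypothetical low-biomass sample: 8 taxa, 640 reads in total (invented numbers).
sample = {
    "taxon_A": 300, "taxon_B": 150, "taxon_C": 90, "taxon_D": 50,
    "taxon_E": 30, "taxon_F": 12, "taxon_G": 6, "taxon_H": 2,
}

rng = random.Random(42)  # fixed seed so the run is repeatable
subsamples = [rarefy(sample, 200, rng) for _ in range(100)]
shannon_values = [shannon(s) for s in subsamples]
simpson_values = [simpson(s) for s in subsamples]

print("Shannon, full sample:", round(shannon(sample), 3))
print("Shannon at 200 reads: mean", round(statistics.mean(shannon_values), 3),
      "sd", round(statistics.stdev(shannon_values), 3))
print("Simpson at 200 reads: mean", round(statistics.mean(simpson_values), 3),
      "sd", round(statistics.stdev(simpson_values), 3))
```

If the spread across subsamples is comparable to the differences you hope to detect between groups, that is a sign the depth is too low for that metric.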

@Mehrbod_Estaki do you have any thought on this?

I agree with @Nicholas_Bokulich completely. Unfortunately, it’s hard to rely on any metric computed from a community of ~100 sequences, especially if those samples are being compared to similar samples with much greater sequencing depth; I fear artificial distances would be introduced between them.
Are these samples known communities by any chance, as in a mock community with the ~10 expected taxa you described? If they are, then maybe you could rely on some presence/absence metrics, but even then… only if your back is against the wall. I’ll echo @Nicholas_Bokulich: drop those samples to increase depth and, if you must, discuss the discarded samples in a more exploratory manner.
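
For what it's worth, a presence/absence comparison of the kind mentioned above could look something like this: Jaccard distance computed only from which taxa are detected, ignoring abundances. Again a plain-Python sketch rather than anything from QIIME 2; the function name, sample names and counts are all hypothetical.

```python
def jaccard_distance(counts_a, counts_b):
    """Jaccard distance between two {taxon: count} dicts, using presence/absence only."""
    present_a = {taxon for taxon, n in counts_a.items() if n > 0}
    present_b = {taxon for taxon, n in counts_b.items() if n > 0}
    union = present_a | present_b
    if not union:
        return 0.0  # convention chosen here: two empty samples are treated as identical
    shared = present_a & present_b
    return 1.0 - len(shared) / len(union)


# Invented example: a shallow sample versus a deeper one from the same mock community.
shallow = {"taxon_A": 60, "taxon_B": 30, "taxon_C": 10}
deep = {"taxon_A": 3000, "taxon_B": 1500, "taxon_C": 400, "taxon_D": 80, "taxon_E": 20}

print("Jaccard distance:", round(jaccard_distance(shallow, deep), 3))
```

Note that the shallow sample looks distant from the deep one simply because its rare taxa were never sampled, which is the artificial distance concern raised above.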
