Should all samples from exactly the same protocol have a similar number of reads (raw data)?

I have an experiment with around 15 samples.
For some reason, a few of the samples have many more reads than the rest.
For example, samples 1, 2 and 3 each have more than 400,000 reads, while the rest have only 100,000-200,000 reads per sample.

I expected sample 1 to have the lowest alpha diversity value (based on what we learned from the literature), but it has one of the highest values, and we’re puzzled by this.

Could it be because of bias from the uneven numbers of reads? Ideally, should all samples from the same lab protocol produce the same number of reads?

Do you think it’s a good idea to randomly sub-sample down those samples with higher than average numbers of reads (so that they have lower numbers - to make them more similar to the rest)?

Many thanks for your help and valuable insights :slight_smile:

Depending on how you’re analyzing your data, @fgara, you may already be doing this. For example, if you run qiime diversity core-metrics, you pass an integer to --p-sampling-depth. Each sample is randomly subsampled to that depth, and samples with fewer reads than that are dropped. Other plugins do things differently.
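To make the idea concrete, here is a minimal sketch of random subsampling to a fixed depth in plain Python. This is only an illustration of the concept, not how QIIME 2 implements it; the function name `rarefy`, the toy table, and the depth of 100 are all made up for the example.

```python
import random

def rarefy(table, depth, seed=0):
    """Subsample every sample to `depth` reads without replacement.

    `table` maps sample ID -> {feature ID: read count}.
    Samples with fewer than `depth` reads are dropped.
    """
    rng = random.Random(seed)
    rarefied = {}
    for sample_id, counts in table.items():
        # Expand counts into one entry per read, e.g. {"a": 2} -> ["a", "a"].
        reads = [feature for feature, n in counts.items() for _ in range(n)]
        if len(reads) < depth:
            continue  # insufficient depth: this sample is dropped
        new_counts = {}
        for feature in rng.sample(reads, depth):
            new_counts[feature] = new_counts.get(feature, 0) + 1
        rarefied[sample_id] = new_counts
    return rarefied

# Hypothetical toy table (real samples have far more reads and features).
table = {
    "sample1": {"featA": 300, "featB": 80, "featC": 20},  # 400 reads
    "sample2": {"featA": 90, "featB": 60},                # 150 reads
    "sample3": {"featA": 40, "featC": 10},                # 50 reads
}
rarefied = rarefy(table, depth=100)
for sample_id, counts in rarefied.items():
    print(sample_id, sum(counts.values()), counts)
```

After this runs, sample1 and sample2 each have exactly 100 reads, and sample3 is gone because it only had 50 to begin with. That trade-off (a higher depth keeps more reads per sample but drops more samples) is the main thing to think about when picking a sampling depth.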

Sample normalization has been widely discussed both on this forum and in the larger bioinformatics community. This topic will give you a brief intro to how QIIME 2 generally handles normalization (i.e. plugins/methods usually implement their own normalization procedures), and many resources to explore if you want to learn more about the topic.

Happy hunting!
Chris :egg:


I’m no wet lab expert, but sequencing depths are expected to vary greatly across samples, at least in the Illumina data I’ve worked with.


Hi Chris,

Thank you so much for your very helpful replies and for the reference link :slight_smile:

Yes, I understand that sequencing depths are expected to vary across samples, but if the samples come from same/similar environments (in this case, skin), wouldn’t you expect them to have similar sequencing depths?

My demultiplexed sequence counts summary looks like the following - do you think it’s okay?

Many thanks once again!

@fgara, here’s a sequencing depth distribution from ~1500 samples across five runs in the same study. All fecal, same equipment, same protocols, kits, subjects in controlled environment. Samples with <1000 features have been dropped (these were mostly controls). Depth ranges from ~1000 to 200,000+ reads per sample. Though it represents more samples, the curve isn’t too different from the distribution you have: a very-roughly-bell-shaped curve followed by some outliers.

I would like to stress that my work is on the software side, and I’m not much of a bioinformatician. There’s probably extensive literature on this, but I have no idea. I also don’t have the lab background to tell you why this might be normal, but for our workflow, these results are not unexpected. I suspect that, as with any complex procedure (sample collection, sample prep, storage, extraction, sequencing, etc) there are many opportunities for small things to impact the sequencing depth of a given sample. Or the diversity. Or even the sequences present.

As you suggested, normalization can help us reduce the impact of some of these issues and make our data more useful. Collecting comprehensive metadata is also critical in trying to identify and correct for biases. We can quantify the degree to which, for instance, one extraction has different characteristics than the other extractions. Or recognize when samples stored in one freezer failed to sequence in the way we expected. Comprehensive metadata allows us to answer not only our experimental questions, but also questions about the validity of our data, where bias might have crept into our process, and whether, for example, all of our samples are actually useful/valid.
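As a rough sketch of what that kind of check can look like, the snippet below groups per-sample read counts by a metadata column and compares the median depth per group. Everything here is hypothetical: the sample IDs, the read counts, and the `extraction_batch` grouping are invented for illustration, and a real analysis would read them from your own metadata file.

```python
from statistics import median

# Hypothetical per-sample read counts (not real data).
depths = {
    "s1": 410_000, "s2": 450_000, "s3": 420_000,
    "s4": 150_000, "s5": 120_000, "s6": 180_000,
}
# Hypothetical metadata column: which extraction batch each sample came from.
extraction_batch = {
    "s1": "batch1", "s2": "batch1", "s3": "batch1",
    "s4": "batch2", "s5": "batch2", "s6": "batch2",
}

def median_depth_by_group(depths, groups):
    """Return {group: median read count} for samples sharing a metadata value."""
    grouped = {}
    for sample_id, depth in depths.items():
        grouped.setdefault(groups[sample_id], []).append(depth)
    return {group: median(values) for group, values in grouped.items()}

medians = median_depth_by_group(depths, extraction_batch)
for group, value in sorted(medians.items()):
    print(group, value)
```

If the high-depth samples all fall into one group, as in this toy example, that metadata column is a good candidate for a closer look.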

Chris :mouse:


Hi @fgara and @ChrisKeefe,

My experience is that even within the same sample type in the same study, there’s still a lot of variation, especially in marker gene sequencing. It may be related to the amount of biomass collected, if you’re using a sampling device. It may be that some communities did better with extraction. It may be due to the magic of PCR (some DNA molecules just got more enzymatic collisions than others), or to how well the DNA did or didn’t bind to the flow cell… there are a lot of variables. I’m used to seeing orders of magnitude variation in the same sample type within a project (and I’ve worked in this field for several years), so I wouldn’t be too concerned. I do agree with Chris: this is a place where normalization and the effects of normalization matter, so just keep that in mind!
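For a feel of why depth matters for alpha diversity in particular, here is a small simulation sketch. It draws reads from one made-up community (500 features with a skewed abundance distribution, which is purely an assumption for the example) and counts how many distinct features show up at a shallow depth versus a deep one.

```python
import random

rng = random.Random(42)

# Hypothetical community: 500 features, abundance falling off as 1/rank^2.
features = [f"feat{rank}" for rank in range(1, 501)]
weights = [1 / rank**2 for rank in range(1, 501)]

# Draw 50,000 reads with replacement according to the abundances.
reads = rng.choices(features, weights=weights, k=50_000)

def observed_features(reads, depth):
    """Count distinct features among the first `depth` reads."""
    return len(set(reads[:depth]))

# Same community, same draws, only the depth differs.
shallow = observed_features(reads, 500)
deep = observed_features(reads, 50_000)
print("observed features at depth 500:", shallow)
print("observed features at depth 50,000:", deep)
```

The deeper sample picks up more of the rare features even though the underlying community is identical, which is why comparing richness across samples with very different read counts can be misleading without some form of normalization.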



Hi @ChrisKeefe and @jwdebelius,

Oh wow, thank you so much both for such detailed answers!

Thank you Chris for sharing the sequencing depth distribution of samples from your study.
Yes, I can see similarity between the two curves.
What a relief! Phew! :partying_face:
Thank you also for your advice on comprehensive metadata - it’s great!
I’m trying that now :slight_smile:

And thank you Justine for letting me know that you see orders of magnitude variation in the same sample type within a project. This kind of information, from both of you who have worked in this field for many years, truly helps!

Thank you both so much!!! :partying_face: