The effect of duplicate and triplicate samples in 16S downstream analysis


I would like to get input on handling duplicate/triplicate samples in microbiome data analysis. I had a total of 56 samples, and I decided to sequence some of them (their DNA) in duplicate or triplicate (19 samples in duplicate and 4 in triplicate).

The idea was just to check the reproducibility of the sequencing. However, I am in a bit of a dilemma over whether to include the duplicate and triplicate samples in my final analysis, or whether I should just choose one representative from each duplicate/triplicate set. The latter is what I think I should do, but then the question becomes: what is the best way to choose which of the duplicate/triplicate samples to include in the final analysis, and how do I justify this choice to my reviewers?

I was thinking of comparing their profiles (just from taxonomy bar plots) and, if the samples are similar, choosing the one with the highest number of reads. But I am not sure this is actually the best way to go about it, especially since there were quite large differences in read counts for some duplicate samples (e.g. 13,000 versus 9,000 reads for one duplicate pair).

And if I choose to include all the samples, what would be the effect on my results (since microbiome analysis is sensitive to sample size)?

Can anyone kindly advise me accordingly? I would also appreciate links to any articles addressing this issue.

Thank you.


Hi @Flutomia,

It sounds like you were trying to address two questions with this: whether the sequencing was reproducible, and some biological question about your replicates.

Because your extraction replicates should be more similar to each other than they are to any other sample, including them biases your overall community view toward the replicated samples. Your microbiome analysis is sample-size dependent (particularly in terms of feature-based analyses), but you're not actually gaining sample size by including the technical replicates.

As to comparing and selecting replicates, I'd suggest looking at your beta diversity. I would rarefy your data and run a PCoA to make sure your replicates cluster more strongly with each other in PCoA space than with anything else. (You can also visualize this if you turn your distance matrix into a dendrogram.) I would use taxonomic metrics like Bray-Curtis and Jaccard distance over phylogenetic metrics in this case, because you really want to know whether the features themselves are the same, rather than whether the features are closely related. Bray-Curtis is a weighted metric: it will give you more information about abundant organisms. Jaccard is unweighted, and therefore considers abundant and rare organisms equally. The tutorial on paired and longitudinal sampling might help with statistics.
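To make the weighted-vs-unweighted distinction concrete, here is a minimal sketch comparing two replicate profiles with both metrics using SciPy. The count vectors are invented for illustration; in practice they would come from your rarefied feature table.

```python
# Compare two technical replicates with Bray-Curtis (abundance-weighted)
# and Jaccard (presence/absence) distances. Counts are made up.
import numpy as np
from scipy.spatial.distance import braycurtis, jaccard

rep1 = np.array([120, 30, 0, 5, 45])   # feature counts, replicate 1
rep2 = np.array([110, 25, 2, 0, 50])   # feature counts, replicate 2

# Bray-Curtis is dominated by the abundant features.
bc = braycurtis(rep1, rep2)

# Jaccard works on presence/absence, so rare features count as much as
# abundant ones; note the two low-count features that disagree here.
jc = jaccard(rep1 > 0, rep2 > 0)

print(f"Bray-Curtis: {bc:.3f}, Jaccard: {jc:.3f}")
```

Because two rare features flip between present and absent, Jaccard reports a much larger distance than Bray-Curtis for the same pair, which is exactly the "same features vs. similar abundances" distinction described above.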

Then, if your communities are similar, you could choose the replicate with the most reads. You could also choose at random among replicates above your rarefaction depth. (I would not choose at random if your replicates do not meet your rarefaction threshold.) Both should be relatively easy to implement, although if you're concerned about bias, random selection may be better.
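Both selection rules can be sketched in a few lines. The group structure, sample IDs, and rarefaction depth below are all hypothetical, just to show the mechanics:

```python
# Sketch of the two replicate-selection rules: (1) keep the deepest
# replicate, or (2) pick at random among replicates that meet the
# rarefaction depth. All names and numbers are illustrative.
import random

groups = {
    "subjectA": {"A_rep1": 13000, "A_rep2": 9000},
    "subjectB": {"B_rep1": 21000, "B_rep2": 20500, "B_rep3": 4000},
}
RAREFACTION_DEPTH = 10000  # hypothetical even-sampling depth

# Rule 1: keep the replicate with the most reads.
deepest = {g: max(reps, key=reps.get) for g, reps in groups.items()}

# Rule 2: choose at random among replicates above the rarefaction depth.
rng = random.Random(42)  # fixed seed so the choice is reproducible
random_pick = {
    g: rng.choice(sorted(s for s, n in reps.items() if n >= RAREFACTION_DEPTH))
    for g, reps in groups.items()
}

print(deepest)
print(random_pick)
```

Note that under rule 2, A_rep2 (9,000 reads) is excluded from the draw because it falls below the depth, matching the caveat above.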

To alleviate concerns about bias in your sample selection, perhaps run a sensitivity analysis where you check whether swapping out the duplicate or triplicate samples changes your results. If I were reviewing your paper, seeing that there weren't major differences would alleviate any potential concern about bias in sample selection.
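A minimal sketch of that sensitivity analysis: recompute a summary statistic (here, mean observed feature richness) with each alternative replicate swapped in, and check how much the result moves. The profiles are invented for illustration.

```python
# Sensitivity analysis sketch: swap each candidate replicate into the
# dataset and compare a summary statistic across the choices.
import numpy as np

# One replicate group with two candidate profiles (feature count vectors).
candidates = {
    "rep1": np.array([120, 30, 0, 5, 45]),
    "rep2": np.array([110, 25, 2, 0, 50]),
}
# The remaining (non-replicated) samples in the study.
others = [np.array([80, 0, 10, 3, 60]), np.array([5, 90, 0, 0, 40])]

def mean_richness(samples):
    """Mean number of observed (nonzero) features across samples."""
    return float(np.mean([np.count_nonzero(s) for s in samples]))

results = {name: mean_richness(others + [vec]) for name, vec in candidates.items()}
print(results)
```

If the values for the two choices are close (here they are identical), swapping replicates does not change the result, which is the reassurance a reviewer would want; in a real study you would do this for your actual diversity metrics and tests.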



Hello @jwdebelius

Thank you very much for your response. This is definitely an eye opener and it makes a lot of sense.

Really appreciate this.

Thank you!

This is of course true when clustering using OTUs, but I was under the impression that this was not the case for ASV/ESV-based techniques such as Deblur, DADA2, and potentially VSEARCH if you choose not to cluster?

(Don’t worry @Flutomia, this doesn’t affect the advice given, which is sound.)

If your downstream analysis uses relative abundance, you could combine the replicates, once you have shown they are statistically similar.
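Combining replicates for a relative-abundance analysis amounts to pooling the counts and renormalizing; a small pandas sketch, with an invented table:

```python
# Merge technical replicates by summing counts, then convert the pooled
# counts to relative abundances. Table values are made up.
import pandas as pd

counts = pd.DataFrame(
    {"featA": [120, 110], "featB": [30, 25], "featC": [5, 0]},
    index=["subjectA_rep1", "subjectA_rep2"],
)

merged = counts.sum(axis=0)        # pooled counts for the subject
rel_abund = merged / merged.sum()  # relative abundance after pooling

print(rel_abund.round(3))
```

One design note: summing before normalizing implicitly weights the deeper replicate more heavily, which is usually what you want if the replicates are statistically similar.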


The expected similarity of technical replicates isn’t a microbiome-specific phenomenon, and should be true regardless of what you’re measuring. So, if I’m looking at a prediabetic population and measure a subset of people’s HbA1c multiple times off the same blood sample, (1) I expect those A1c values to be similar to each other within my margin of variation, and (2) if I include those replicates multiple times in my data, they will create a bias.

The choice of denoising/clustering technique shouldn’t have any bearing on the reproducibility of the measurement, assuming the algorithms are working as expected. If true sequence A is present in the baseline community, we’d expect to see true sequence A in Rep 1 and Rep 2, regardless of the method. Denoising will actually give you better sequence-level specificity and allow you to check whether the sets of sequences are identical after error correction.

However, the between-sample relationship might be less true for biological replicates. In biological replicates (depending on your definition of biological replicate), you may or may not expect to have true sequence A in Rep 1 and Rep 2 at the same abundance. In that case, if you’ve seen an evolution from an organism with true sequence A to an organism with true sequence B, and the difference is 1 bp, then OTUs will group these, but denoising won’t.


I think this is a great discussion and wanted to throw in a couple of comments and requests. All the advice given by @jwdebelius is how I would approach this problem as well; however, I’ve never actually come across any benchmarking of this issue and would love it if someone has a reference in mind. The most relevant paper I’ve found with regard to this issue is this paper, but unfortunately they do their analysis and make their recommendations based on OTUs. Nevertheless, I think they make some excellent points, especially regarding the presence of singletons and the choice of beta diversity metrics. It would be nice if their analysis were repeated using ASVs.
The issue in these situations is not when the replicates are very similar to each other, but rather what to do when they are not. For example, if you have technical duplicates with enough within-sample variability, how do you decide whether to (a) choose one over the other, (b) drop both samples, or (c) merge them? In QIIME 2 there are different methods of merging samples that could be applied to technical replicates via feature-table group, but again I’m not sure how to systematically choose one of these methods. Things get even murkier if your replicates suffer from a batch effect; then you further need to ask whether one replicate is more reliable than the other, and if and how you should normalize these samples before merging them.
I have no answers, and my own search of the literature hasn’t been very fruitful. Though I imagine this isn’t just an amplicon sequencing issue; genomics folks have probably been dealing with this for a while as well. How is it dealt with there? I would love to hear others’ thoughts on the matter.

