I would like input on how to handle duplicate and triplicate samples in microbiome data analysis. I had a total of 56 samples, and I sequenced the DNA of some of them in replicate: 19 samples in duplicate and 4 samples in triplicate.
The idea was just to check the reproducibility of the sequencing. However, I am unsure whether to run my final analysis with all the duplicate and triplicate samples included, or to choose a single representative from each duplicate/triplicate set. I am leaning towards choosing a representative, but then the question becomes how best to choose which replicate to include in the final analysis, and how to justify that choice to my reviewers.
I was thinking of comparing their profiles (just from taxonomy bar plots) and, if the replicates look similar, choosing the one with the highest number of reads. But I am not sure this is actually the best way to go about it, especially since the number of reads differed quite a lot between some replicates (e.g. 13000 and 9000 reads for one duplicate pair).
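To make the rule I have in mind concrete, here is a minimal sketch in plain Python. It is only an illustration under some assumptions of mine: instead of judging bar plots by eye, it measures similarity as Bray-Curtis dissimilarity on relative abundances, the 0.2 cutoff is an arbitrary placeholder rather than an established threshold, and the sample names, taxa, and counts are made up (only the 13000 and 9000 totals mirror my data).

```python
from itertools import combinations


def relative_abundance(counts):
    """Convert raw read counts per taxon into proportions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("replicate has no reads")
    return {taxon: n / total for taxon, n in counts.items()}


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two profiles (0 = identical, 1 = no overlap)."""
    taxa = set(a) | set(b)
    diff = sum(abs(a.get(t, 0.0) - b.get(t, 0.0)) for t in taxa)
    total = sum(a.get(t, 0.0) + b.get(t, 0.0) for t in taxa)
    return diff / total


def pick_representative(replicates, max_dissimilarity=0.2):
    """Pick the replicate with the most reads and report how consistent the set is.

    replicates: dict mapping replicate ID -> dict of taxon -> read count.
    max_dissimilarity: assumed, arbitrary cutoff for calling replicates "similar".
    Returns (chosen ID, worst pairwise dissimilarity, whether all pairs pass the cutoff).
    """
    profiles = {rid: relative_abundance(c) for rid, c in replicates.items()}
    worst = max(
        (bray_curtis(profiles[x], profiles[y]) for x, y in combinations(profiles, 2)),
        default=0.0,
    )
    deepest = max(replicates, key=lambda rid: sum(replicates[rid].values()))
    return deepest, worst, worst <= max_dissimilarity


# Made-up duplicate pair with 13000 and 9000 total reads.
example = {
    "S1_rep1": {"Bacteroides": 6500, "Prevotella": 3900, "Other": 2600},
    "S1_rep2": {"Bacteroides": 4680, "Prevotella": 2520, "Other": 1800},
}
chosen, worst, consistent = pick_representative(example)
print(chosen, round(worst, 3), consistent)
```

Because the comparison uses proportions, the difference in sequencing depth alone does not make two replicates look dissimilar; depth only decides which replicate is kept. Whether keeping the deepest replicate is defensible is exactly what I am asking about.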
And if I choose to include all the samples, what would be the effect on my results, given that microbiome analysis is sensitive to sample size?
Could anyone kindly advise me on this? I would also appreciate links to any articles addressing this issue.