Removing batch effects

We performed a longitudinal microbiome study in mice that required three separate MiSeq runs to cover all of the samples. We mixed the samples so that biological replicates would be present on each plate. The first two runs produced high-quality reads, while the third was somewhat lower in quality. I went through the standard demultiplexing, DADA2, and then merging.

When I get to the downstream PCoA analysis, all of the samples on the third plate end up grouping together. We believe this is due to a batch artifact. The biological replicates from the first two plates cluster close together, as we would expect.

I have tried both total frequency filtering and contingency-based filtering, but neither has helped.

Any suggestions?

Hi @Franky,
Thanks for posting! The issue of run effects is all too common, unfortunately — it was very good of you to have replicates across each run to help identify this issue in your data.

Do you by any chance have mock communities, positive or negative controls on these runs?

@benjjneb recently published a preprint describing decontam, an R package for identifying contaminants based on negative controls. This method is not yet in QIIME 2 (we are planning to add it to q2-quality-control in a future release) but if you have negative controls on these runs you could use the R package directly and that might help reduce batch effects.
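To illustrate the general idea of using negative controls, here is a minimal Python sketch that flags features seen more often in negative controls than in true samples, using a one-sided Fisher's exact test on presence/absence. This is not decontam's implementation (the R package has its own scoring), and the function names, feature IDs, counts, and the 0.05 threshold are all made up for illustration.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: negative controls, true samples. Columns: present, absent.
    Returns P(X >= a), the chance of seeing the feature in at least `a`
    controls if presence were unrelated to sample type.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / comb(n, col1)


def flag_candidates(counts, is_control, alpha=0.05):
    """Return feature IDs that are enriched (by presence) in negative controls."""
    flagged = []
    for feature, values in counts.items():
        a = sum(1 for v, ctl in zip(values, is_control) if ctl and v > 0)
        b = sum(1 for v, ctl in zip(values, is_control) if ctl and v == 0)
        c = sum(1 for v, ctl in zip(values, is_control) if not ctl and v > 0)
        d = sum(1 for v, ctl in zip(values, is_control) if not ctl and v == 0)
        if fisher_one_sided(a, b, c, d) < alpha:
            flagged.append(feature)
    return flagged


# Hypothetical data: 4 negative controls followed by 8 true samples.
is_control = [True] * 4 + [False] * 8
counts = {
    "ASV_reagent": [12, 8, 20, 5, 0, 0, 1, 0, 0, 0, 0, 0],
    "ASV_gut": [0, 0, 0, 0, 150, 90, 200, 120, 80, 60, 110, 95],
}

candidates = flag_candidates(counts, is_control)
print(candidates)
```

With real data, running a check like this separately per run would show whether the third run carries contaminants that the first two do not.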

I have not yet read the following, but this preprint sounds like a promising solution.

Other than that, I do not know of any good methods for controlling for batch effects, since this is a bit of a thorny issue. If you find one, please let me know, and we can consider adding it to QIIME 2!
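Whatever approach is tried, the replicates shared across runs give a way to check whether it helped. The sketch below is only an assumption about how one might do that in plain Python: it compares the mean Bray-Curtis distance between replicates sequenced on the same run with the mean distance between replicates on different runs. The profiles and run labels are invented, and `run_effect_summary` is a hypothetical helper, not a QIIME 2 function.

```python
from itertools import combinations
from statistics import mean


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two count vectors of equal length."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(u) + sum(v)
    return num / den if den else 0.0


def run_effect_summary(profiles, runs):
    """Mean within-run and between-run distance over all pairs of replicates."""
    within, between = [], []
    for i, j in combinations(range(len(profiles)), 2):
        dist = bray_curtis(profiles[i], profiles[j])
        (within if runs[i] == runs[j] else between).append(dist)
    return mean(within), mean(between)


# Hypothetical replicates of one biological sample, two per run.
runs = ["run1", "run1", "run2", "run2", "run3", "run3"]
profiles = [
    [50, 30, 20],
    [48, 32, 20],
    [52, 28, 20],
    [50, 30, 20],
    [20, 30, 50],  # run3 replicates look shifted
    [22, 28, 50],
]

within, between = run_effect_summary(profiles, runs)
print(f"within-run: {within:.3f}, between-run: {between:.3f}")
```

If replicates of the same biological sample are much farther apart between runs than within runs, the run is likely driving the separation; repeating the comparison after filtering or decontamination shows whether that gap shrinks.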
