I am about to begin a project that will include samples spread across 10 sequencing runs. Each run contains a mock sample. Are there any suggestions on how to normalize across all of the runs so that the data can be combined for one analysis?
There are not, to my knowledge, any methods for normalizing batch effects between sequence runs for microbiome data (but please let me know if you know of any!). The best you can do is:
Randomize samples across runs so that run is not a confounder for other variables (you are probably already doing this).
Use mock communities on each run to assess run-to-run variation (you are already doing this. Awesome!).
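If it helps, here is a minimal sketch of one way to do the randomization. It shuffles samples within each study group and then deals them out to runs in turn, so every run gets a similar mix of groups. The function name `assign_runs`, the sample IDs, and the "case"/"control" groups are all made up for illustration; swap in your own metadata.

```python
import random
from collections import defaultdict


def assign_runs(samples, n_runs, seed=0):
    """Assign samples to runs, balancing each group across runs.

    samples: list of (sample_id, group) tuples (hypothetical layout).
    Returns a dict mapping sample_id -> run number (1-based).
    """
    rng = random.Random(seed)  # fixed seed so the layout is reproducible
    by_group = defaultdict(list)
    for sample_id, group in samples:
        by_group[group].append(sample_id)

    assignment = {}
    slot = 0
    for group in sorted(by_group):
        ids = by_group[group]
        rng.shuffle(ids)  # random order within the group
        for sample_id in ids:
            # deal samples out to runs one at a time
            assignment[sample_id] = slot % n_runs + 1
            slot += 1
    return assignment


# Toy example: 40 hypothetical samples, alternating between two groups
samples = [(f"S{i:03d}", "case" if i % 2 else "control") for i in range(40)]
runs = assign_runs(samples, n_runs=10)
```

With real data you would probably want to balance on more than one variable (for example subject and time point), but the idea is the same.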
Since you are using mock communities, I would recommend using q2-quality-control to compare expected vs. observed mock community accuracy for each sequencing run. Mock communities never look perfect, so don’t worry that you aren’t getting perfect accuracy — but this might help you pick apart batch effects.
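To give a feel for what an expected vs. observed comparison involves, here is a rough sketch that computes the Bray-Curtis dissimilarity between the expected mock composition and what was observed on each run. This is not a reimplementation of q2-quality-control, just an illustration of one possible summary; the taxon names, counts, and run labels are invented.

```python
def relative_abundance(counts):
    """Convert raw counts (taxon -> count) to relative abundances."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def bray_curtis(expected, observed):
    """Bray-Curtis dissimilarity between two abundance profiles.

    0 means identical composition, 1 means no shared abundance.
    """
    taxa = set(expected) | set(observed)
    num = sum(abs(expected.get(t, 0.0) - observed.get(t, 0.0)) for t in taxa)
    den = sum(expected.get(t, 0.0) + observed.get(t, 0.0) for t in taxa)
    return num / den


# Hypothetical expected composition of the mock community
expected = {"Taxon_A": 0.5, "Taxon_B": 0.3, "Taxon_C": 0.2}

# Hypothetical observed counts for the mock sample on each run
observed_counts_by_run = {
    "run01": {"Taxon_A": 480, "Taxon_B": 320, "Taxon_C": 200},
    "run02": {"Taxon_A": 300, "Taxon_B": 300, "Taxon_C": 300, "Taxon_D": 100},
}

scores = {
    run: bray_curtis(expected, relative_abundance(counts))
    for run, counts in observed_counts_by_run.items()
}
for run, score in sorted(scores.items()):
    print(run, round(score, 3))
```

A run whose mock sample sits much further from the expected composition than the others is the one to look at more closely.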
The other thing to do is merge all runs after denoising and keep batch information in your sample metadata file — you can use this to compare alpha and beta diversity between batches to make sure there are not any strong batch effects.
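As a rough sketch of the kind of check I mean, the snippet below computes Shannon diversity per sample and summarizes it by batch using a batch column from the metadata. The feature counts, sample IDs, and batch labels are toy values I made up; in practice you would use your merged feature table and follow up with a proper test (for example Kruskal-Wallis) rather than just eyeballing medians.

```python
import math
from collections import defaultdict
from statistics import median


def shannon(counts):
    """Shannon diversity (log base 2) from a list of feature counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


# Toy merged feature table: sample -> counts per feature (hypothetical)
feature_table = {
    "S1": [10, 10, 10, 10],
    "S2": [20, 20, 0, 0],
    "S3": [40, 0, 0, 0],
    "S4": [5, 5, 5, 5],
}

# Toy metadata: sample -> sequencing run (the batch column)
batch = {"S1": "run01", "S2": "run01", "S3": "run02", "S4": "run02"}

# Group per-sample Shannon values by batch
shannon_by_batch = defaultdict(list)
for sample_id, counts in feature_table.items():
    shannon_by_batch[batch[sample_id]].append(shannon(counts))

# Summarize each batch by its median Shannon diversity
batch_medians = {run: median(vals) for run, vals in shannon_by_batch.items()}
for run, value in sorted(batch_medians.items()):
    print(run, round(value, 3))
```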
Batch effect metadata values can also be incorporated as random effects in some methods in q2-longitudinal (and maybe also q2-gneiss), allowing you to account for this variation in mixed models. Something to keep in mind if those plugins are appropriate for your data.
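In case the "random effect" idea is unfamiliar, the simplest version is a random-intercept model, where each run gets its own offset drawn from a shared distribution. The notation below is my own generic sketch and not necessarily the exact parameterization used by q2-longitudinal: y_ij is the response (for example an alpha diversity value) for sample i on run j, x_ij is a fixed effect of interest, and b_j is the run-level offset.

```latex
y_{ij} = \beta_0 + \beta_1 x_{ij} + b_j + \varepsilon_{ij},
\qquad b_j \sim \mathcal{N}(0, \sigma_b^2),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

A large estimated run variance (sigma_b squared) relative to the residual variance would suggest that run-to-run differences account for a meaningful share of the variation.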
There have been a few other forum posts discussing this topic that might be helpful: