Hi, I have (unfortunately) yet another question on common practices or SOPs for processing data/samples spread over multiple sequencing runs.
I've found plenty of information on the forum already, thanks to the moderators for answering all those questions! To name some threads: Multiple sequencing runs -- normalizing methods? - #3 by Nicholas_Bokulich
To summarize (based on my understanding of all the questions and answers) - it seems the general recommendation when going for denoising with DADA2 is still to keep runs separate and merge after denoising, since different runs might have different error profiles. If Deblur is used instead, runs can be processed together without issues. Though in certain cases, if the samples were run on exactly the same machine with the same library prep method, keeping the runs separate might not be necessary, and one might even benefit from setting the denoise pooling method to --p-pooling-method pseudo for combined runs.
@colinbrislawn I was wondering whether this is still the case, and which denoising method you use with your recommended strategy, because I intend to follow your suggestion in Best way to merge or group runs/samples - #2 by colinbrislawn:
Let’s zoom out a bit. You have lots of samples, and you want to group them in different ways. The major choice here is to either:
- process them all in one batch, then make meaningful categories using feature-table group or
- process them in many batches (one for each meaningful category), then feature-table merge when you want to see the full study.
I’m a big fan of option one. It’s much easier to group and subset from your whole data set, than to merge and recombine all the fragments of your study. But qiime lets you do it both ways
Should one stick with Deblur for unified batches of multiple sequencing runs or can DADA2 also be used? I also included the sequencing run info in the metadata so that potential batch effects can still be identified after denoising and before merging biological samples with feature-table group. But since the same error profile was used for all of them together, would such differences still be detectable?
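To make the question concrete, here is a toy sketch in plain Python of what merging per-run tables and then grouping repeated samples by a metadata column amounts to. It is not QIIME 2 code: the dictionary layout, the function names `merge_tables` and `group_samples`, and all sample/ASV IDs and counts are made up for illustration, and summing is assumed as the way counts are combined.

```python
from collections import defaultdict


def merge_tables(*tables):
    """Combine per-run tables, summing counts for any overlapping sample/feature pair."""
    merged = defaultdict(lambda: defaultdict(int))
    for table in tables:
        for sample_id, counts in table.items():
            for feature_id, count in counts.items():
                merged[sample_id][feature_id] += count
    return {sample_id: dict(counts) for sample_id, counts in merged.items()}


def group_samples(table, sample_to_group):
    """Collapse samples that share a metadata value by summing their feature counts."""
    grouped = defaultdict(lambda: defaultdict(int))
    for sample_id, counts in table.items():
        group_id = sample_to_group[sample_id]
        for feature_id, count in counts.items():
            grouped[group_id][feature_id] += count
    return {group_id: dict(counts) for group_id, counts in grouped.items()}


# Hypothetical per-run feature tables: sample ID -> {ASV ID -> read count}.
run1 = {"S1_run1": {"ASV_a": 10, "ASV_b": 5}}
run2 = {"S1_run2": {"ASV_a": 7, "ASV_c": 2}, "S2_run2": {"ASV_b": 4}}

# Hypothetical metadata column mapping each sequenced library to its biological sample.
biological_sample = {"S1_run1": "S1", "S1_run2": "S1", "S2_run2": "S2"}

merged = merge_tables(run1, run2)
grouped = group_samples(merged, biological_sample)
print(grouped)
```

The point of keeping the per-run sample IDs until the grouping step is that run-level differences can still be inspected in `merged` before they are summed away in `grouped`.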
The reason why I'm inclined to process my runs together is that I have (negative) DNA extraction and PCR controls spread over several plates/runs. Could including more/all of them during the denoising (as opposed to maybe 2-3 per run) help with chimera detection as these samples will have more chimeric sequences (I suppose)?
In my case, there are 2 full runs (my samples) and 2 partial runs (repeats of samples that failed in my full runs, plus samples from other projects). By eye, based on the quality scores and after QC, all the runs performed comparably. To be 100% sure, I do have mock communities on all 4 runs, which I could process to confirm this more objectively.
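One way such a mock-community check could look, as a rough sketch only: compare the mock profiles from each run pairwise with Bray-Curtis dissimilarity on relative abundances. The run names, ASV IDs and counts below are invented placeholders, not my data, and Bray-Curtis is just one possible choice of metric.

```python
from itertools import combinations


def relative_abundance(counts):
    """Convert raw read counts to proportions that sum to 1."""
    total = sum(counts.values())
    return {feature_id: count / total for feature_id, count in counts.items()}


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum of absolute differences over sum of totals."""
    features = set(a) | set(b)
    numerator = sum(abs(a.get(f, 0) - b.get(f, 0)) for f in features)
    denominator = sum(a.get(f, 0) + b.get(f, 0) for f in features)
    return numerator / denominator


# Hypothetical mock-community counts per run (placeholder values).
mock_by_run = {
    "run1": {"ASV_a": 50, "ASV_b": 30, "ASV_c": 20},
    "run2": {"ASV_a": 48, "ASV_b": 32, "ASV_c": 20},
    "run3": {"ASV_a": 25, "ASV_b": 25, "ASV_d": 50},
}

profiles = {run: relative_abundance(counts) for run, counts in mock_by_run.items()}

# Pairwise dissimilarity between the mock profiles of every pair of runs.
distances = {
    (run_x, run_y): bray_curtis(profiles[run_x], profiles[run_y])
    for run_x, run_y in combinations(sorted(profiles), 2)
}

for pair, distance in distances.items():
    print(pair, round(distance, 3))
```

Values near 0 would suggest the mock looks the same across runs, while a clearly larger value for one run would flag it for a closer look.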
Thank you!