Processing multiple runs and repeated samples

Hi, I have (unfortunately) yet another question on common practices or SOPs for processing data/samples spread over multiple sequencing runs.

I've found plenty of information on the forum already; thanks to the moderators for answering all those questions! To name one thread: Multiple sequencing runs -- normalizing methods? - #3 by Nicholas_Bokulich

To summarize (based on my understanding of all the questions and answers): the general recommendation when denoising with DADA2 still seems to be to keep runs separate and merge after denoising, since different runs might have different error profiles. If Deblur is used instead, runs can be processed together without issues. In certain cases, though, if the samples were run on exactly the same machine with the same library prep method, keeping the runs separate might not be necessary, and one might even benefit from setting --p-pooling-method pseudo for the combined runs.

@colinbrislawn I was wondering whether this is still the case, and which denoising method you use with your recommended strategy, since I was intending to follow your suggestion in Best way to merge or group runs/samples - #2 by colinbrislawn:

Let’s zoom out a bit. You have lots of samples, and you want to group them in different ways. The major choice here is to either:

  1. process them all in one batch, then make meaningful categories using feature-table group or
  2. process them in many batches (one for each meaningful category), then feature-table merge when you want to see the full study.

I’m a big fan of option one. It’s much easier to group and subset from your whole data set than to merge and recombine all the fragments of your study. But QIIME lets you do it both ways.

Should one stick with Deblur for unified batches of multiple sequencing runs or can DADA2 also be used? I also included the sequencing run info in the metadata so that potential batch effects can still be identified after denoising and before merging biological samples with feature-table group. But since the same error profile was used for all of them together, would such differences still be detectable?
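For reference, the grouping step I have in mind would look roughly like the sketch below. The file names and the metadata column biological-sample-id are placeholders I made up for illustration; the column would hold one shared ID for every repeat of the same biological sample. The script only prints the command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: collapse repeated samples into one column per biological sample.
# File names and the metadata column name are hypothetical placeholders.
set -eu

DRY_RUN="${DRY_RUN:-1}"   # default: only print the command

# Print the command, and execute it only when DRY_RUN=0.
run_cmd() {
  echo "$*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

group_repeats() {
  # --p-mode sum adds up the counts of all samples sharing the same ID.
  run_cmd qiime feature-table group \
    --i-table merged-table.qza \
    --p-axis sample \
    --m-metadata-file sample-metadata.tsv \
    --m-metadata-column biological-sample-id \
    --p-mode sum \
    --o-grouped-table grouped-table.qza
}

group_repeats
```

After grouping, the sample IDs in the table are the values of that metadata column, so downstream steps need a metadata file keyed on those new IDs.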

The reason I'm inclined to process my runs together is that I have (negative) DNA extraction and PCR controls spread over several plates/runs. Could including more or all of them during denoising (as opposed to maybe 2-3 per run) help with chimera detection, since these samples will (I suppose) contain more chimeric sequences?

In my case, there are 2 full runs (my samples) and 2 partial runs (repeats of failed samples from my full runs, plus samples from other projects). By eye, based on the quality scores and after QC, all the runs performed comparably. I do have mock communities on all 4 runs, so I could analyze those to confirm this more objectively.

Thank you!

Hello Leon,

Thank you for bringing that old thread to my attention.

I was wrong in 2018, so I've updated the post and included that update here.

2024 update:
Both options are okay, but one works better for DADA2!
Because DADA2 builds an error model for each sequencing run, you should run DADA2 on each sequencing run separately.
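As a rough sketch of what that could look like on the command line: denoise each run with its own call, then combine the outputs with feature-table merge and feature-table merge-seqs. The run labels, file names, and truncation lengths below are placeholders and not values from this thread, and the sketch assumes paired-end reads. Use the same trimming and truncation settings for every run so the resulting ASVs are comparable across runs. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: run DADA2 once per sequencing run, then merge the results.
# Run labels, file names and truncation lengths are hypothetical placeholders.
set -eu

RUNS="run1 run2 run3 run4"
DRY_RUN="${DRY_RUN:-1}"   # default: only print the commands

# Print the command, and execute it only when DRY_RUN=0.
run_cmd() {
  echo "$*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

denoise_and_merge() {
  merge_tables=""
  merge_seqs=""
  for run in $RUNS; do
    # One DADA2 call per run, so each run gets its own error model.
    # Identical truncation settings keep ASVs comparable between runs.
    run_cmd qiime dada2 denoise-paired \
      --i-demultiplexed-seqs "${run}-demux.qza" \
      --p-trunc-len-f 240 \
      --p-trunc-len-r 200 \
      --o-table "${run}-table.qza" \
      --o-representative-sequences "${run}-rep-seqs.qza" \
      --o-denoising-stats "${run}-stats.qza"
    merge_tables="$merge_tables --i-tables ${run}-table.qza"
    merge_seqs="$merge_seqs --i-data ${run}-rep-seqs.qza"
  done
  # Word splitting is intentional here: the file names contain no spaces.
  run_cmd qiime feature-table merge $merge_tables \
    --o-merged-table merged-table.qza
  run_cmd qiime feature-table merge-seqs $merge_seqs \
    --o-merged-data merged-rep-seqs.qza
}

denoise_and_merge
```

Keeping the sequencing run as a metadata column on the merged table still lets you look for batch effects afterwards.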

This is the most important part:

DADA2 should produce stable results either way. I guess you could test this?

Fantastic!

Your inclusion of positive and negative controls all the way through sequencing places you well ahead of most other researchers and gives you an excellent way to check your work.


Thank you for bringing that historical mistake to my attention and giving me a chance to fix it!

Thank you Colin for the clarification. I guess I'll reprocess the runs separately then :smiling_face_with_tear:
