Advice on best practice for handling multiple sequencing runs and subset analyses in QIIME 2 (DADA2)

I would appreciate some advice on the best workflow for my analysis.

I have two sets of samples generated from two different sequencing runs. Both runs are 16S rRNA gene amplicons (V3–V4 region, 2×300 bp). One run contains 96 samples, while the second run contains only 2 samples.

I need to perform two separate analyses:

  1. An analysis including only 64 samples from the first run.

  2. An analysis including all samples from both runs.

My current plan is as follows:

  • Run DADA2 denoising separately for each sequencing run, using identical parameters.

  • Merge the resulting feature tables (and representative sequences).

  • Continue the downstream analysis on the merged data to obtain analysis #2 (all samples).

  • For analysis #1, filter the merged feature table to retain only the 64 selected samples and then perform downstream analyses based on this filtered table.

Does this workflow make sense from a QIIME 2 / DADA2 best-practice perspective?
Are there any potential issues or better approaches I should consider when dealing with multiple sequencing runs and subset analyses?

Thank you in advance for your help.

PM

Sure, this pipeline sounds reasonable to me, though we'll see what reviewer 3 thinks!

Here's why this whole thing works:

Two sequencing runs will have different error profiles. Running DADA2 on each run separately addresses that, while the identical parameters mean ASVs should merge well in the next step. This also needs the PCR primers to match, and you have addressed that too!
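To make the merge-then-filter logic concrete, here is a minimal Python sketch using toy data. It is not QIIME 2 code: the tables, sample IDs, and helper functions (`merge_tables`, `filter_samples`, `observed_features`) are all made up for illustration, and features are keyed directly by sequence as a simplification. In QIIME 2 itself these steps correspond to the `feature-table merge`, `feature-table merge-seqs`, and `feature-table filter-samples` actions.

```python
# Hypothetical toy tables: sample ID -> {ASV sequence: read count}.
# Real feature tables come out of DADA2; these values are invented.
run1 = {
    "S1": {"ACGT": 10, "TTGA": 5},
    "S2": {"ACGT": 3, "GGCC": 7},
}
run2 = {
    "S3": {"ACGT": 8, "CCAA": 2},
}


def merge_tables(*tables):
    """Combine per-run tables; this sketch refuses duplicate sample IDs."""
    merged = {}
    for table in tables:
        for sample, counts in table.items():
            if sample in merged:
                raise ValueError(f"duplicate sample ID: {sample}")
            merged[sample] = dict(counts)
    return merged


def filter_samples(table, keep):
    """Retain only the samples whose IDs are listed in `keep`."""
    return {s: c for s, c in table.items() if s in keep}


def observed_features(table):
    """Return the set of ASVs with at least one read in the table."""
    return {asv for counts in table.values() for asv, n in counts.items() if n > 0}


merged = merge_tables(run1, run2)                    # analysis #2: all samples
subset = filter_samples(merged, keep={"S1", "S2"})   # analysis #1: selected samples

print(sorted(observed_features(merged)))
print(sorted(observed_features(subset)))
```

Note that "ACGT" shows up as one shared feature across both runs only because the sequences are identical, which is why the same trimming and truncation settings matter. After filtering to a subset, some features may have no reads left in any retained sample, so it can be worth dropping those before downstream steps.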

Let us know how this goes and if you have more questions!


Yeah, the run with fewer samples will probably have more reads in each sample. More data is good, but the uneven sampling effort causes other problems...

This is the same normalization challenge as for any amplicon dataset, but reviewer three is going to notice this and ask, so be ready! :thinking: :shield:
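If you want to see what uneven sampling effort looks like in practice, here is a small Python sketch that reports per-sample read depth and then subsamples every sample to the same depth (rarefaction, which is one common option and not the only one). The table, the depth of 100, and the helpers `depths` and `rarefy` are hypothetical and only for illustration.

```python
import random

# Hypothetical toy table: sample ID -> {ASV: read count}.
table = {
    "A": {"asv1": 60, "asv2": 40},
    "B": {"asv1": 500, "asv3": 500},
    "C": {"asv2": 30},
}


def depths(table):
    """Total read count per sample."""
    return {s: sum(c.values()) for s, c in table.items()}


def rarefy(table, depth, seed=0):
    """Subsample each sample to `depth` reads without replacement.

    Samples with fewer than `depth` reads are dropped.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    out = {}
    for sample, counts in table.items():
        # Expand counts into one entry per read, then draw `depth` of them.
        reads = [asv for asv, n in counts.items() for _ in range(n)]
        if len(reads) < depth:
            continue
        sub = {}
        for asv in rng.sample(reads, depth):
            sub[asv] = sub.get(asv, 0) + 1
        out[sample] = sub
    return out


print(depths(table))   # raw depths differ by more than an order of magnitude
even = rarefy(table, depth=100)
print(depths(even))    # retained samples all have the same depth
```

The trade-off is visible even in the toy data: the deeply sequenced sample loses most of its reads and the shallow sample is dropped entirely, so the chosen depth is a judgment call.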


Thank you very much, I’m glad that this all makes sense.

Luckily, those two samples were sequenced on a full plate. They are the only ones left from my project, so the number of reads shouldn’t be higher.
