Sequencing platform fully confounded with study period: MiSeq vs NextSeq 2000 (16S rRNA V3-V4)

Hi everyone,

We're working on a nasopharyngeal 16S rRNA study (V3-V4) comparing a pre-pandemic cohort with a post-pandemic cohort, each including healthy individuals and individuals with invasive pneumococcal disease. We have a study-design issue we'd like a second opinion on.

All pre-pandemic samples were sequenced on MiSeq, and all post-pandemic samples on NextSeq 2000. Sequencing platform is therefore completely confounded with study period. We're aware this means no batch-correction method can statistically separate the platform effect from the pandemic effect, since they're perfectly collinear.
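To make the collinearity concrete, here is a minimal Python sketch (standard library only) that builds a design matrix with an intercept, a period column, and a platform column for our 92 + 107 samples and checks its rank. The `matrix_rank` helper and the 10 re-sequenced bridge rows are illustrative assumptions on our part; 10 is an arbitrary placeholder, not a proposed sample size.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix via Gaussian elimination on exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current rank with a nonzero entry here.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every row below the pivot row.
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            if factor != 0:
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Columns: intercept, period (1 = post-pandemic), platform (1 = NextSeq 2000)
pre_pandemic = [[1, 0, 0]] * 92     # all sequenced on MiSeq
post_pandemic = [[1, 1, 1]] * 107   # all sequenced on NextSeq 2000
design = pre_pandemic + post_pandemic

# Three columns but only two independent ones: period and platform
# are the same column, so their effects cannot be told apart.
rank_confounded = matrix_rank(design)

# Hypothetical: 10 pre-pandemic samples re-sequenced on the NextSeq 2000.
# The count is a placeholder chosen only for illustration.
bridge = [[1, 0, 1]] * 10
rank_with_bridge = matrix_rank(design + bridge)

print("rank without bridge samples:", rank_confounded)
print("rank with bridge samples:   ", rank_with_bridge)
```

As we understand it, the rank goes from 2 to 3 only once some pre-pandemic samples also appear on the NextSeq 2000, which is the motivation for the bridge samples described below. This only shows that a platform term becomes estimable in principle; it says nothing about how precisely it would be estimated.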

Here are some study details:

  • 16S V3-V4, 2x300 bp on both platforms (so amplicon coverage should be comparable).
  • 92 pre-pandemic samples across 4 MiSeq runs and 107 post-pandemic samples in 1 NextSeq 2000 run.
  • Groups (healthy/disease) are distributed across the 4 pre-pandemic runs.

Our current plan is to run DADA2 with per-run error learning (each of the 4 MiSeq runs and the NextSeq 2000 run separately), using the same truncation length across all runs so that the resulting ASVs can be merged. We're also considering re-sequencing a subset of pre-pandemic samples on the NextSeq 2000 as bridge samples, to measure the platform effect directly.
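For the bridge samples, one simple summary we have in mind is the within-pair dissimilarity between a sample's MiSeq profile and its NextSeq 2000 profile. The sketch below is only an illustration under our own assumptions: the sample IDs, ASV labels, and counts are made up, and Bray-Curtis on relative abundances is one possible choice of metric, not a settled part of our pipeline.

```python
def relative_abundance(counts):
    """Convert raw ASV counts to proportions that sum to 1."""
    total = sum(counts.values())
    return {asv: n / total for asv, n in counts.items()}


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance profiles."""
    asvs = set(a) | set(b)
    numerator = sum(abs(a.get(x, 0.0) - b.get(x, 0.0)) for x in asvs)
    denominator = sum(a.get(x, 0.0) + b.get(x, 0.0) for x in asvs)
    return numerator / denominator


# Toy data (hypothetical): sample ID -> (MiSeq counts, NextSeq 2000 counts)
bridge_pairs = {
    "S01": (
        {"ASV_a": 500, "ASV_b": 300, "ASV_c": 200},
        {"ASV_a": 4500, "ASV_b": 3500, "ASV_c": 2000},
    ),
    "S02": (
        {"ASV_a": 100, "ASV_d": 900},
        {"ASV_a": 2000, "ASV_d": 7000, "ASV_e": 1000},
    ),
}

# Dissimilarity of each sample to itself across platforms.
within_pair = {
    sample: bray_curtis(relative_abundance(miseq), relative_abundance(nextseq))
    for sample, (miseq, nextseq) in bridge_pairs.items()
}

for sample, distance in within_pair.items():
    print(sample, round(distance, 3))
```

The idea would be to compare these within-pair distances against the between-sample distances within a single platform, to get a rough sense of how large the platform effect is relative to biological variation.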

Our questions are:

  • Given the full confounding, is a pre/post comparison interpretable at all? How have others framed this limitation?
  • For the bridge samples, how many pairs would you consider a defensible minimum to characterize the platform effect?
  • Are there additional harmonization steps you'd recommend when merging MiSeq and NextSeq 2000 amplicon data?
  • What's the current recommended way to handle NextSeq binned quality scores within a QIIME2 workflow, especially when mixing binned (NextSeq) and non-binned (MiSeq) runs in the same study?
  • Any published examples of MiSeq vs NextSeq 2000 comparisons for 16S data (or any Illumina cross-platform 16S comparison) that we could reference?

We haven't found much on this, so this could be a really interesting one to discuss. Looking forward to your thoughts. Thanks so much in advance!