Dada2: merging plates and chimera removal

Hi, I am running an analysis that combines data from multiple MiSeq runs from different studies. My approach has been to generate ASVs and remove chimeras using the dada2 denoise-paired plugin individually for each plate, and then merge the resulting ASVs from all plates into one big dataset with feature-table merge-seqs and feature-table merge. However, I just noticed that in the big data tutorial on the DADA2 site (A DADA2 workflow for Big Data (1.4 or later)), chimera removal is done after all the plates are merged.

My question is: does it matter whether the chimera removal step is done before or after merging all the ASVs together? As far as I can tell, both the consensus and pooled approaches for --p-chimera-method take the larger pool of sequences in the whole plate into account, which leads me to think that doing chimera removal before or after merging the ASVs from all plates could produce different results. On the other hand, since the different plates usually hold samples from different studies/locations (although often similar habitats), it doesn't seem like removing chimeras before merging all ASVs into one dataset would give a radically different result.

I am curious what others think about this, and whether or not it matters. I have noticed some workflows posted online (e.g., Merging DADA2 Results in QIIME2 - John Quensen) that don't mention chimera removal after merging multiple ASV datasets together, which leads me to think it isn't that important to wait until after merging.

If I did want to run the chimera removal for dada2 after merging all the ASVs together, is there a way to do that with dada2 denoise-paired? I see that there is an option to turn off the chimera removal, which I could do when running each individual plate. However, after merging all the ASVs into one big dataset, is there a way to run the chimera removal function in dada2 denoise-paired without it also doing the actual denoising step? Or, would I need to use one of the other chimera removal tools available in Qiime2?

Hi, Peat! Welcome to the forums :wave: :qiime2:

I think both ways (chimera removal per-run, or across all runs after merging) are common and accepted methods. This is because, in practice, the results are similar, just like you said.

I'm not sure there's a way to do this using the q2-dada2 plugin... You could do this directly with DADA2 in R (removeBimeraDenovo works on a merged sequence table), or using the QIIME 2 vsearch plugin; see vsearch uchime-denovo and uchime-ref.


While we are on this topic, it's worth mentioning why it's preferable in theory to detect and remove chimeras per-run, and why it's comparable in practice to do it later on.

How are chimeras formed?

'Chimeras' are thought to be a technical artefact of the PCR reaction. From PMC3044863, Figure 1, summarized on Wikipedia:

It occurs when the extension of an amplicon is aborted, and the aborted product functions as a primer in the next PCR cycle. The aborted product anneals to the wrong template and continues to extend, thereby synthesizing a single sequence sourced from two different templates.

How can you detect and remove these artificial hybrids?

Given that:

  1. each chimera is composed from real amplicons in a sample, and
  2. more common amplicons should cause more chimeras, and
  3. these fake 'children' chimeras should be less abundant than their real 'parent' amplicons

Then:

  • You could look for less common amplicons that could be explained as a combination of more common amplicons, and label them as chimeric!
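
The parent/child logic above can be sketched in a few lines of Python. This is a toy illustration only, not how DADA2 or vsearch actually implement it: it assumes exact matches with no alignment or mismatch tolerance, and the names `is_bimera`, `find_chimeras`, and the `min_fold` threshold are made up for this example.

```python
from typing import Dict, List


def is_bimera(seq: str, parents: List[str]) -> bool:
    """Return True if seq is exactly a prefix of one parent joined to
    the suffix of a *different* parent (toy check, exact matching)."""
    n = len(seq)
    for left in parents:
        for right in parents:
            if left == right:
                continue
            # Try every possible breakpoint inside the sequence.
            for bp in range(1, n):
                if seq[:bp] == left[:bp] and seq[bp:] == right[-(n - bp):]:
                    return True
    return False


def find_chimeras(abundance: Dict[str, int], min_fold: float = 2.0) -> List[str]:
    """Flag sequences that can be explained by two more abundant parents.

    min_fold is a hypothetical threshold: a candidate parent must be at
    least this many times more abundant than the suspected child.
    """
    flagged = []
    for seq, count in abundance.items():
        parents = [p for p, c in abundance.items() if c >= min_fold * count]
        if is_bimera(seq, parents):
            flagged.append(seq)
    return flagged


# Made-up example data: two abundant parents, one hybrid, one rare real sequence.
parent_a = "ACGTACGTAC"
parent_b = "TTGGCCAATT"
chimera = parent_a[:5] + parent_b[5:]
rare_real = "GGGGGCCCCC"

example = {parent_a: 1000, parent_b: 800, chimera: 10, rare_real: 5}
print(find_chimeras(example))
```

Only the hybrid is flagged here: the rare real sequence is just as uncommon, but no pair of more abundant sequences can explain it.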

When is the best time to find (and remove) chimeric reads?

  1. After dereplicating each sample separately: PCR is performed separately on each sample, and causes chimera formation separately on each sample, so you could find and remove chimeras from each sample! (I don't think any pipelines do this, because...)
  2. After denoising each sample separately: same logic as above, but now we have removed noisy reads for a smaller data set and faster chimera finding! :fast_forward:
  3. After denoising all samples on a single run: because the same features are often the most abundant across samples, and we are just looking for less abundant 'children' from the most abundant 'parents', we might as well do this just once for each run. 🤷
  4. After denoising and merging all feature tables: same logic as above, but now we only have to do this parent-child search 1 time in the whole pipeline. :sunglasses: #YOCCO
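
To make the difference between these options concrete, here is a rough Python sketch that runs the same toy parent/child check once per sample and once on the pooled table. Everything in it is an assumption for illustration: the sample names, counts, and the `flag_bimeras` helper are invented, and real tools use alignment-based, mismatch-tolerant searches rather than exact string matching.

```python
from collections import Counter
from typing import Dict, Set

Table = Dict[str, Dict[str, int]]  # sample ID -> {sequence: read count}


def flag_bimeras(counts: Dict[str, int], min_fold: float = 2.0) -> Set[str]:
    """Toy check: flag a sequence if it equals the start of one more
    abundant sequence joined to the end of another (equal lengths assumed)."""
    flagged = set()
    for seq, n in counts.items():
        parents = [p for p, c in counts.items() if c >= min_fold * n]
        if any(
            seq[:bp] == a[:bp] and seq[bp:] == b[bp:]
            for a in parents
            for b in parents
            if a != b
            for bp in range(1, len(seq))
        ):
            flagged.add(seq)
    return flagged


def per_sample(table: Table) -> Dict[str, Set[str]]:
    """Run the check inside each sample, where the PCR actually happened."""
    return {sample: flag_bimeras(counts) for sample, counts in table.items()}


def pooled(table: Table) -> Set[str]:
    """Sum counts over all samples first, then run the check once."""
    total: Counter = Counter()
    for counts in table.values():
        total.update(counts)
    return flag_bimeras(dict(total))


# Made-up data: hybrid is rare next to both parents in one sample,
# but abundant in another sample where neither parent was observed.
a, b = "ACGTACGTAC", "TTGGCCAATT"
hybrid = a[:5] + b[5:]
other = "GGGGGCCCCC"
table = {
    "run1_s1": {a: 1000, b: 800, hybrid: 10},
    "run2_s1": {hybrid: 200, other: 300},
}

print(per_sample(table))
print(pooled(table))
```

On this contrived table the per-sample check flags the hybrid only in the sample that contains its parents, while the pooled check flags it everywhere. With samples from similar habitats such disagreements should be rare, which fits the observation that the options give similar results in practice.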

I hope that helps, but if that raises more questions than answers, let's keep this discussion going!

Colin

P.S. After listing those options, I'm starting to think that we should be doing the chimera search earlier in our pipelines. :thinking: Has someone tried this using modern ASV / denoising methods and shown that it's identical? I can't find a citation that uses ASVs... :scream_cat:


Hi Colin, thanks so much for your detailed response. It was very helpful. I feel good moving ahead with my analyses now.

Best
Peat