I sent my samples for 16S rRNA amplicon sequencing (targeting the V4–V5 region) in two separate batches. The sequencing company provided the raw demultiplexed data, with primers and barcodes already removed, so I proceeded with DADA2 for denoising. For truncation parameters, I chose:
```
--p-trunc-len-f 240
--p-trunc-len-r 220
```
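In full, the denoising call looked roughly like this (file names are placeholders, not my actual artifacts):

```bash
# Paired-end denoising with q2-dada2 (hypothetical file names).
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux.qza \
  --p-trunc-len-f 240 \
  --p-trunc-len-r 220 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza
```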
However, after running DADA2, I noticed that the percentage of non-chimeric reads is quite low, around 20%–40% for many samples.
Does anyone know why this might happen and how I can fix it?
Do you know how many base pairs your V4-V5 PCR product should be? That would tell us the expected amplicon length, which may give us clues about how DADA2 will process this data.
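As a hypothetical illustration of why the length matters: DADA2 needs roughly 12 nt of overlap to merge read pairs, and your truncated reads span 240 + 220 = 460 bases. If the amplicon (after primer removal) were about 450 bp, that would leave only 460 - 450 = 10 nt of overlap, so reads at that length would fail to merge and be dropped before the chimera step, even though they still count against the final non-chimeric percentage.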
How many PCR cycles were used? I ask because more PCR cycles lead to more chimeras, so it's possible that DADA2 is correctly detecting and removing a lot of chimeras from your data, which is good!
Thank you Colin! The amplicon is 450 bp, but the company did not provide information about the number of PCR cycles used. Their sequencing data processing approach also differed from mine. According to their methods, they first merged the paired-end reads and performed pre-processing to generate what they refer to as "Clean Tags." Chimeric sequences were identified and removed at this stage, resulting in "Effective Tags," which were then used as input for DADA2. In their case, chimeras were largely removed prior to running DADA2, and the subsequent chimera removal step within DADA2 did not lead to a substantial loss of reads.
In contrast, I observe a much higher proportion of chimeras being removed during the DADA2 step, which results in fewer reads retained in my final dataset. I’m wondering whether I might be doing something incorrectly, or if there are any specific steps I should consider before running DADA2.
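In case it's useful, here is roughly how I've been inspecting where reads drop out, using DADA2's per-step stats (placeholder file names):

```bash
# Tabulate DADA2's read tracking (input, filtered, denoised, merged,
# non-chimeric) to see at which step reads are lost.
# "denoising-stats.qza" stands in for the --o-denoising-stats output.
qiime metadata tabulate \
  --m-input-file denoising-stats.qza \
  --o-visualization denoising-stats.qzv
```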
I suspect this is simply because popular OTU picking methods were first designed for single-end reads. When Illumina introduced paired-end reads, merging the reads first and then continuing on to OTU clustering was the practical approach.
DADA2 is the only method I know of that denoises first, then merges overlaps, so it's a bit of an odd duck.
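For contrast, here is a rough sketch of that older merge-first route as it would look with today's q2-vsearch plugin; file names are placeholders, and action names have shifted across releases (join-pairs is called merge-pairs in newer versions):

```bash
# Legacy-style "merge first, then cluster" OTU route (hypothetical
# file names; quality filtering between steps omitted for brevity).
qiime vsearch join-pairs \
  --i-demultiplexed-seqs demux.qza \
  --o-joined-sequences joined.qza

qiime vsearch dereplicate-sequences \
  --i-sequences joined.qza \
  --o-dereplicated-table derep-table.qza \
  --o-dereplicated-sequences derep-seqs.qza

qiime vsearch cluster-features-de-novo \
  --i-table derep-table.qza \
  --i-sequences derep-seqs.qza \
  --p-perc-identity 0.97 \
  --o-clustered-table otu-table.qza \
  --o-clustered-sequences otu-seqs.qza
```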
Denoising before merging is the recommended order, so I would assume your results are better!
Did you sequence any positive controls that could be used to benchmark these methods?
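If you have a mock community, one option (a sketch, not a full recipe) is q2-quality-control's evaluate-composition, which compares an observed feature table against the known composition; both inputs are FeatureTable[RelativeFrequency] artifacts, the file names below are hypothetical, and exact parameters vary by release:

```bash
# Compare observed vs. expected mock-community composition
# (placeholder file names; collapse both tables to the same
# taxonomic level and convert to relative frequency first).
qiime quality-control evaluate-composition \
  --i-expected-features expected-mock.qza \
  --i-observed-features observed-mock.qza \
  --o-visualization mock-eval.qzv
```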