Effect of primer removal on chimera detection in DADA2

Shreya · August 19, 2025, 10:46am

Hello QIIME2 community,

I am working on gut microbiome analysis using QIIME2 and facing challenges with high levels of chimeras in my dataset. After going through the literature, I found several reports suggesting that removing primers prior to denoising can reduce the proportion of chimeric reads.

I wanted to test this, so I removed primers using cutadapt and then ran qiime dada2 denoise-paired. Interestingly, the same samples that previously yielded only 10–15% non-chimeric reads (after denoising with primers still present) increased to about 40–45% non-chimeric reads once primers were removed.

This makes me wonder:

1. Why does primer removal have such a strong effect on chimera detection in DADA2?
2. Since DADA2 also allows trimming and filtering during the denoising step, is it recommended to remove primers externally with cutadapt first, or to rely on the trimming parameters in the denoising step itself?

I would really appreciate clarification on which approach is more reliable in terms of maximizing high-quality, non-chimeric reads.

Thank you!

ebolyen · August 19, 2025, 4:50pm

Hi @Shreya,

1. Because primers essentially are chimeras: they are artificial sequences that, via PCR, become the template for all of your reads. So DADA2 sees something that would be better explained as ASV1 (primer) + ASV2 (amplicon).
2. It depends on your protocol. Presuming your primers aren’t variable length, it should be fine to use trim_left etc., since you can predict where your amplicon starts, and that is the general purpose of those params. If your data is messier than that, you might reach for cutadapt, which can search for your primers more flexibly.
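
To illustrate the difference between the two approaches, here is a minimal sketch in plain Python. It is not what DADA2 or cutadapt actually run: the primer, spacer, and amplicon strings are made up for illustration, the helper names are hypothetical, and the search is an exact match, whereas cutadapt tolerates mismatches.

```python
# Hypothetical sequences, for illustration only (not from any real protocol).
PRIMER = "ACGTAC"
AMPLICON = "TTGGCCAATT"


def trim_fixed(read: str, n: int) -> str:
    """Drop the first n bases, the way a fixed trim-left value would."""
    return read[n:]


def trim_by_search(read: str, primer: str) -> str:
    """Locate the primer and drop everything up to and including it.

    Exact matching only; returns the read unchanged if the primer is absent.
    """
    idx = read.find(primer)
    return read if idx == -1 else read[idx + len(primer):]


clean_read = PRIMER + AMPLICON          # primer starts at position 0
spacer_read = "GG" + PRIMER + AMPLICON  # 2-base spacer shifts the primer

fixed_clean = trim_fixed(clean_read, len(PRIMER))
fixed_spacer = trim_fixed(spacer_read, len(PRIMER))
searched_spacer = trim_by_search(spacer_read, PRIMER)

print(fixed_clean)      # amplicon only
print(fixed_spacer)     # two primer bases left behind
print(searched_spacer)  # amplicon only
```

With a fixed offset, the read carrying a spacer keeps two primer bases at its start, while the search-based trim recovers the amplicon in both cases. That is the situation in which a fixed trim is not enough.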

Shreya · August 20, 2025, 6:13am

Thanks for the reply!

After removing the primer sequences with Cutadapt, I set the truncation parameters to --p-trunc-len-f 245 and --p-trunc-len-r 235, based on the expected amplicon size of the 16S V3–V4 region (~460 bp after primer removal). However, this resulted in only ~0.03–0.05% of reads being retained as non-chimeric. Interestingly, when I used the default DADA2 denoising parameters instead, the percentage of non-chimeric reads increased. This has left me confused about which approach is more appropriate to follow.
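
As a rough sanity check on these truncation values, the sketch below computes how much overlap the truncated forward and reverse reads would have for merging. It takes the ~460 bp amplicon length from the post at face value and assumes a minimum merge overlap of 12 bases, which is believed to be the DADA2 default but should be confirmed for your installed version. The function name is hypothetical.

```python
def expected_overlap(trunc_len_f: int, trunc_len_r: int, amplicon_len: int) -> int:
    """Bases shared by the truncated forward and reverse reads.

    A negative value means the reads would not reach each other at all.
    """
    return trunc_len_f + trunc_len_r - amplicon_len


# ASSUMPTION: minimum overlap required for merging; check your DADA2 version.
MIN_OVERLAP = 12

overlap = expected_overlap(245, 235, 460)
margin = overlap - MIN_OVERLAP

print(f"overlap: {overlap} bases, margin over minimum: {margin} bases")
```

Under these assumptions the margin is small, so amplicons only slightly longer than 460 bp would fail to merge. This arithmetic alone does not explain losses at the filtering step, so the per-step denoising stats are still needed to see where the reads go.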

ebolyen · August 20, 2025, 5:29pm

~~You’ll need trim-left-f and trim-left-r so that it’s cutting from the 5’ end where the primer sits. trunc-len is for the overlap region within your amplicon.~~

Completely misread, sorry.

It sounds like maybe the cutadapt command has not done what you need. Would you be able to share it?

Shreya · August 21, 2025, 4:51am

This was my cutadapt command: qiime cutadapt trim-paired --i-demultiplexed-sequences Demuxtrial_aug2025.qza --p-front-f CCTACGGGNGGCWGCAG --p-front-r GACTACHVGGGTATCTAATCC --p-match-read-wildcards --o-trimmed-sequences demux_trimmed_trial_aug2025.qza --verbose
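
One way to check whether reads really start with these primers is to expand the degenerate IUPAC codes (N, W, H, V) into a regular expression and test a few reads by hand. The sketch below does that in plain Python as an illustration only: it is not how cutadapt matches (cutadapt also allows mismatches), the helper names are hypothetical, and the example read is invented.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Primer sequences copied from the cutadapt command in this post.
FWD = "CCTACGGGNGGCWGCAG"
REV = "GACTACHVGGGTATCTAATCC"


def primer_regex(primer: str) -> re.Pattern:
    """Compile a primer containing degenerate bases into a regex."""
    return re.compile("".join(IUPAC[base] for base in primer))


def starts_with_primer(read: str, primer: str) -> bool:
    """True if the read begins with an exact match to the primer."""
    return primer_regex(primer).match(read) is not None


# Hypothetical read: one concrete version of the forward primer plus made-up bases.
example_read = "CCTACGGGAGGCAGCAG" + "TGGGGAATATTG"

print(len(FWD), len(REV))
print(starts_with_primer(example_read, FWD))
```

The two primer lengths are also the values a fixed trim-left would need if the primers always sit at the very start of the reads.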

colinbrislawn · August 21, 2025, 9:18pm

Would you be willing to post the DADA2 denoising stats file? This would let us check where in the DADA2 pipeline reads are being kept (or removed!), which I find helpful.

Shreya · August 22, 2025, 12:57pm

denoisestats_aftertrimming.tsv (801 Bytes): this is the file obtained after cutadapt, with filtering at the denoising step.

denoisestats_notrim.tsv (999 Bytes): this is the file obtained after cutadapt, with no filtering at the denoising step.

colinbrislawn · August 22, 2025, 3:55pm

Thanks!

I've never had a data set like this. Is cutadapt making the reads too short to pass the DADA2 filter?

Let's see what Evan suggests!
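
The question above can be checked with simple arithmetic, since DADA2 discards reads shorter than the truncation length. The sketch below is only an illustration: the raw read lengths (250 and 300) are assumptions, because the thread does not state the sequencing run length, and the function name is hypothetical. The primer lengths (17 and 21) and truncation values (245 and 235) come from earlier posts.

```python
def survives_truncation(raw_len: int, primer_len: int, trunc_len: int) -> bool:
    """True if a read is still at least trunc_len long after primer removal.

    Reads shorter than the truncation length are discarded at filtering.
    """
    return raw_len - primer_len >= trunc_len


FWD_PRIMER_LEN, REV_PRIMER_LEN = 17, 21  # lengths of the primers in the cutadapt command
TRUNC_F, TRUNC_R = 245, 235              # truncation values from the earlier post

# ASSUMPTION: raw read lengths are not given in the thread; both are hypothetical.
for raw_len in (250, 300):
    fwd_ok = survives_truncation(raw_len, FWD_PRIMER_LEN, TRUNC_F)
    rev_ok = survives_truncation(raw_len, REV_PRIMER_LEN, TRUNC_R)
    print(f"raw length {raw_len}: forward kept={fwd_ok}, reverse kept={rev_ok}")
```

If the run were 2x250, both trimmed reads would fall below the truncation lengths and nearly everything would be dropped at filtering. With 2x300 they would pass. The length summaries from before and after cutadapt would settle which case applies.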

gregcaporaso · August 25, 2025, 6:14pm

Hi @Shreya, do you know if primer removal is needed for your data? In other words, does the sequencing strategy you’re applying result in primers being part of the sequence read output? Some protocols, like the EMP protocol, don’t result in sequencing of the primers.

If possible, I’d like to have you generate and share two .qzv files for us to look at. These would be the results of running qiime demux summarize … on the input that you’ve provided to qiime cutadapt trim-paired and on the output that it generated for you. Depending on how these look, I may have a couple of additional requests for you.

Thanks!