I am working on gut microbiome analysis using QIIME2 and facing challenges with high levels of chimeras in my dataset. After going through the literature, I found several reports suggesting that removing primers prior to denoising can reduce the proportion of chimeric reads.
I wanted to test this, so I removed primers using cutadapt and then ran qiime dada2 denoise-paired. Interestingly, the same samples that previously yielded only 10–15% non-chimeric reads (after denoising with primers still present) increased to about 40–45% non-chimeric reads once primers were removed.
This makes me wonder:
Why does primer removal have such a strong effect on chimera detection in DADA2?
Since DADA2 also allows trimming and filtering during the denoising step, is it recommended to remove primers externally with cutadapt first, or rely on trimming parameters in the denoising step itself?
I would really appreciate clarification on which approach is more reliable in terms of maximizing high-quality, non-chimeric reads.
Because primers are essentially chimeras: they are artificial sequences that, via PCR, become the template for all of your reads. DADA2 therefore sees something that would be better explained as ASV1 (primer) + ASV2 (amplicon).
It depends on your protocol. Presuming your primers aren’t variable length, it should be fine to use the trim-left parameters, since you can predict where your amplicon starts, and that is the general purpose of those params. If your data is messier than that, you might reach for cutadapt, which can search for your primers more flexibly.
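As a rough sketch of the fixed-length case: if every read begins with its complete primer, the trim-left values are simply the primer lengths. The two sequences below are the widely used 341F/805R V3–V4 pair and are an assumption for illustration only; nothing in this thread confirms which primers were used, so substitute your own.

```python
# Assumed primers (common 341F/805R V3-V4 pair); replace with the ones
# from your own protocol. These are NOT taken from the thread.
FORWARD_PRIMER = "CCTACGGGNGGCWGCAG"
REVERSE_PRIMER = "GACTACHVGGGTATCTAATCC"


def trim_left_values(forward_primer, reverse_primer):
    """Return (trim_left_f, trim_left_r), assuming each read starts
    with its full, fixed-length primer and nothing precedes it."""
    return len(forward_primer), len(reverse_primer)


trim_f, trim_r = trim_left_values(FORWARD_PRIMER, REVERSE_PRIMER)
print(f"trim-left-f = {trim_f}, trim-left-r = {trim_r}")
# prints "trim-left-f = 17, trim-left-r = 21"
```

This only holds when nothing variable (for example, heterogeneity spacers) sits in front of the primer; otherwise a search-based tool such as cutadapt is the safer choice.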
After removing the primer sequences with Cutadapt, I set the truncation parameters to --p-trunc-len-f 245 and --p-trunc-len-r 235, based on the expected amplicon size of the 16S V3–V4 region (~460 bp after primer removal). However, this resulted in only ~0.03–0.05% of reads being retained as non-chimeric. Interestingly, when I used the default DADA2 denoising parameters instead, the percentage of non-chimeric reads increased. This has left me confused about which approach is more appropriate to follow.
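One way to sanity-check truncation lengths like these is to do the arithmetic explicitly. The sketch below uses the values from the post (245, 235, ~460 bp) together with two facts about DADA2: paired reads need at least 12 nt of overlap to merge by default, and reads shorter than the truncation length are discarded rather than kept. The 250-nt read length and 17-nt primer in the last check are hypothetical, since the thread does not state them.

```python
MIN_OVERLAP = 12  # DADA2's default minimum overlap when merging pairs


def merge_overlap(trunc_len_f, trunc_len_r, amplicon_len):
    """Overlap (nt) remaining between the truncated forward and reverse reads."""
    return trunc_len_f + trunc_len_r - amplicon_len


def survives_truncation(read_len, trunc_len):
    """DADA2 discards reads that are shorter than the truncation length."""
    return read_len >= trunc_len


# Values quoted in the post: 245 + 235 against a ~460 bp amplicon.
overlap = merge_overlap(245, 235, 460)
print(overlap, overlap >= MIN_OVERLAP)  # prints "20 True"

# Hypothetical: 250-nt reads that already lost a 17-nt primer to cutadapt
# are only 233 nt long, so a truncation length of 245 would discard them.
print(survives_truncation(250 - 17, 245))  # prints "False"
```

An overlap of 20 nt clears the default threshold but leaves little room for natural length variation in the amplicon. If the primer-trimmed reads are shorter than the chosen truncation length, nearly everything is dropped at the filtering step, which would be consistent with a very low retention rate.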
You’ll need trim-left-f and trim-left-r so that it’s cutting from the 5’ end where the primer sits. trunc-len is for the overlap region within your amplicon.
Completely misread, sorry.
It sounds like the cutadapt command may not have done what you need. Would you be able to share it?
Would you be willing to post the DADA2 denoising stats file? This would let us check where in the DADA2 pipeline reads are being kept (or removed!), which I find helpful.
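For anyone who wants to do this check themselves, here is a minimal sketch that reads a denoising stats table and reports the step where each sample loses the largest share of its reads. It assumes the stats have been exported to TSV with the column names shown (input, filtered, denoised, merged, non-chimeric); the numbers are made up for illustration and are not from this thread.

```python
import csv
import io

# Made-up counts for illustration only. A real export also carries
# percentage columns, which this sketch simply ignores.
STATS_TSV = """sample-id\tinput\tfiltered\tdenoised\tmerged\tnon-chimeric
#q2:types\tnumeric\tnumeric\tnumeric\tnumeric\tnumeric
sampleA\t100000\t80000\t78000\t60000\t42000
sampleB\t50000\t1000\t900\t100\t20
"""

STEPS = ["input", "filtered", "denoised", "merged", "non-chimeric"]


def step_losses(tsv_text):
    """Per sample, the fraction of reads lost at each step relative to the previous step."""
    losses = {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        if row["sample-id"].startswith("#"):
            continue  # skip the '#q2:types' directive row
        counts = [int(row[step]) for step in STEPS]
        losses[row["sample-id"]] = {
            step: (1 - after / before) if before else 0.0
            for step, before, after in zip(STEPS[1:], counts, counts[1:])
        }
    return losses


def worst_step(sample_losses):
    """Name of the step with the largest fractional loss."""
    return max(sample_losses, key=sample_losses.get)


losses = step_losses(STATS_TSV)
for sample, per_step in losses.items():
    print(sample, worst_step(per_step))
```

A sample that loses most of its reads at "filtered" points to truncation or quality settings, a large drop at "merged" points to insufficient overlap, and a large drop at "non-chimeric" points to chimera removal (for instance, primers left in the reads).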
Hi @Shreya, do you know if primer removal is needed in your data? In other words, does the sequencing strategy you’re applying result in primers being part of the sequence read output? Some protocols, like the EMP protocol, don’t result in sequencing of the primers.
If possible, I’d like to have you generate and share two .qzv files for us to look at. These would be the results of running qiime demux summarize … on the input that you’ve provided to qiime cutadapt trim-paired and on the output that it generated for you. Depending on how these look, I may have a couple of additional requests for you.
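A quick way to check whether the primers are actually present in the reads is to count how many reads start with the primer, treating degenerate IUPAC codes as wildcards. This is only a sketch: the forward primer is the commonly used 341F sequence (an assumption, not confirmed in this thread) and the three reads are invented for the example.

```python
# IUPAC nucleotide codes mapped to the bases they can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def starts_with_primer(read, primer):
    """True if the read begins with the primer (exact match, degenerate codes allowed)."""
    if len(read) < len(primer):
        return False
    return all(base in IUPAC[code] for base, code in zip(read, primer))


def primer_fraction(reads, primer):
    """Fraction of reads that begin with the primer."""
    if not reads:
        return 0.0
    return sum(starts_with_primer(read, primer) for read in reads) / len(reads)


FORWARD_PRIMER = "CCTACGGGNGGCWGCAG"  # assumed 341F; replace with your own

# Invented reads: the first two begin with the primer, the third does not.
reads = [
    "CCTACGGGAGGCAGCAGTGGGGAATATT",
    "CCTACGGGTGGCTGCAGTAGGGAATCTT",
    "TGGGGAATATTGCACAATGGGCGAAAGC",
]

print(f"{primer_fraction(reads, FORWARD_PRIMER):.2f}")  # prints "0.67"
```

Running the same check on a sample of reads before and after trimming should show the fraction dropping from near 1 to near 0 if primer removal worked. The check expects uppercase reads and tolerates no mismatches, so sequencing errors inside the primer will lower the fraction slightly.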