Hello,
I am working with sequences generated on an Illumina MiSeq with v3 chemistry (2x300 bp) using primers targeting the V3/V4 region: 341F (CCTACGGGNGGCWGCAG) and 805R (GACTACHVGGGTATCTAATCC). However, when I denoise with DADA2 in QIIME 2 (version 2022.11.1), only about 30-40% of my sequences are merged. Is this acceptable, or are there ways to improve it?
As far as I can tell there should be plenty of overlap for merging, as DADA2 should require a minimum of ~12 bp and the amplicon size based on these primers should be 464 bp.
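To sanity-check that overlap claim, here is a rough sketch of the arithmetic for the truncation pairs I tried. It assumes the 464 bp figure includes both primers and that every amplicon is exactly that length; real V3/V4 amplicons vary in length, so this is only an estimate. The `overlap` function name is just for illustration.

```bash
# Rough overlap estimate for a pair of DADA2 truncation lengths.
# Assumptions (not measured): 464 bp amplicon including primers, fixed length.
amplicon_len=464
fwd_primer="CCTACGGGNGGCWGCAG"
rev_primer="GACTACHVGGGTATCTAATCC"

# Length of the amplicon once cutadapt has removed both primers.
insert_len=$(( amplicon_len - ${#fwd_primer} - ${#rev_primer} ))

# overlap = truncated forward length + truncated reverse length - insert length
overlap() {
  echo $(( $1 + $2 - insert_len ))
}

for pair in "273 210" "273 216" "274 231" "280 200" "280 250"; do
  set -- $pair   # split "fwd rev" into $1 and $2
  echo "trunc $1/$2: estimated overlap $(overlap "$1" "$2") bp"
done
```

Under these assumptions the primer-trimmed amplicon is 426 bp and the estimated overlaps range from 54 bp (280/200) to 104 bp (280/250), all well above ~12 bp.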
Removing the primer sequences with cutadapt:
qiime cutadapt trim-paired --i-demultiplexed-sequences demux-paired-end-V3V4-2.qza --p-front-f CCTACGGGNGGCWGCAG --p-front-r GACTACHVGGGTATCTAATCC --o-trimmed-sequences trim-demux-paired-end-V3V4-2.qza
Summarizing the information about the sequences after primer removal:
qiime demux summarize --i-data trim-demux-paired-end-V3V4-2.qza --o-visualization post-cutadapt-demux-V3V4-2.qzv
The output from this summary is in the attached post-cutadapt-demux-V3V4-2.qzv.
I then tried several truncation lengths in DADA2. This was the command used (shown here for 273/210):
qiime dada2 denoise-paired --i-demultiplexed-seqs trim-demux-paired-end-V3V4-2.qza --p-trunc-len-f 273 --p-trunc-len-r 210 --o-representative-sequences rep-seqs-dada2-273-210-v3v4-2.qza --o-table table-dada2-273-210-v3v4-2.qza --o-denoising-stats stats-dada2-273-210-v3v4-2.qza --p-min-fold-parent-over-abundance 2 --verbose --p-n-threads 8
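Since the same command was run at several truncation lengths, the runs could be scripted roughly as follows. This is only a sketch: it prints the commands as a dry run rather than executing them, `build_cmd` is a helper name made up for illustration, and the output file names simply follow the naming pattern of the command above.

```bash
# Print (dry run) one denoise-paired command per truncation pair.
# build_cmd is a hypothetical helper; pipe the output to `sh` to actually run it.
input="trim-demux-paired-end-V3V4-2.qza"

build_cmd() {
  f=$1; r=$2
  tag="dada2-${f}-${r}-v3v4-2"
  echo "qiime dada2 denoise-paired" \
    "--i-demultiplexed-seqs ${input}" \
    "--p-trunc-len-f ${f} --p-trunc-len-r ${r}" \
    "--p-min-fold-parent-over-abundance 2" \
    "--p-n-threads 8 --verbose" \
    "--o-representative-sequences rep-seqs-${tag}.qza" \
    "--o-table table-${tag}.qza" \
    "--o-denoising-stats stats-${tag}.qza"
}

for pair in "273 210" "273 216" "274 231" "280 200" "280 250"; do
  set -- $pair   # split "fwd rev" into $1 and $2
  build_cmd "$1" "$2"
done
```

Printing first makes it easy to check the file names before committing to five long denoising runs.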
And here is a summary of what that looks like at varying truncation levels:
| Truncation fwd/rev (bp) | % passed filter (min - max) | % merged (min - max) | % non-chimeric (min - max) | Median % non-chimeric |
|---|---|---|---|---|
| 273/210 | 64.67 - 75.83 | 34.16 - 46.02 | 33.10 - 41.83 | 38.11 |
| 273/216 | 63.47 - 74.74 | 33.86 - 45.65 | 32.82 - 41.54 | 37.59 |
| 274/231 | 58.07 - 71.47 | 31.93 - 43.61 | 31.15 - 40.02 | 35.54 |
| 280/200 | 62.95 - 75.74 | 33.49 - 44.27 | 32.54 - 40.97 | 37.40 |
| 280/250 | 44.51 - 62.87 | 24.85 - 37.22 | 24.46 - 35.06 | 28.88 |
It seems that I lose about half of the sequences that passed the filter at the merging step. I believe there is more than enough overlap for merging, so that shouldn't be the issue. I've also removed the primers, and there are no adapters on my sequences. Any thoughts on what is happening here or how to improve the outcome?
And here are the files I've generated, just in case they are helpful. I've included the output from DADA2 with the trunc values of 273-210:
post-cutadapt-demux-V3V4-2.qzv (322.1 KB)
rep-seqs-dada2-273-210-v3v4-2.qzv (2.0 MB)
stats-dada2-273-210-v3v4-2.qzv (1.2 MB)