I sent my samples for 16S rRNA amplicon sequencing (targeting the V4–V5 region) in two separate batches. The sequencing company provided the raw demultiplexed data, with primers and barcodes already removed, so I proceeded with DADA2 for denoising. For truncation parameters, I chose:
```
--p-trunc-len-f 240
--p-trunc-len-r 220
```
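In full, the denoising call looked roughly like this (file names are placeholders, not my actual artifacts):

```bash
# Paired-end denoising with q2-dada2 (hypothetical file names).
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux.qza \
  --p-trunc-len-f 240 \
  --p-trunc-len-r 220 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza
```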
However, after running DADA2, I noticed that the percentage of non-chimeric reads is quite low, around 20%–40% for many samples.
Does anyone know why this might happen and how I can fix it?
Do you know how many base pairs your V4-V5 PCR product should be? That would tell us the expected amplicon length, which may give us clues about how DADA2 will process this data.
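As a hypothetical illustration of why the length matters: DADA2 needs roughly 12 nt of overlap to merge read pairs, and your truncated reads span 240 + 220 = 460 bases. If the amplicon (after primer removal) were about 450 bp, that would leave only 460 - 450 = 10 nt of overlap, so reads at that length would fail to merge and be dropped before the chimera step, even though they still count against the final non-chimeric percentage.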
How many PCR cycles were used? I ask because more PCR cycles lead to more chimeras, so it's possible that DADA2 is correctly detecting and removing a lot of chimeras from your data, which is good!
Thank you Colin! The amplicon is 450 bp, but the company did not provide information about the number of PCR cycles used. Their sequencing data processing approach also differed from mine. According to their methods, they first merged the paired-end reads and performed pre-processing to generate what they refer to as "Clean Tags." Chimeric sequences were identified and removed at this stage, resulting in "Effective Tags," which were then used as input for DADA2. In their case, chimeras were largely removed prior to running DADA2, and the subsequent chimera removal step within DADA2 did not lead to a substantial loss of reads.
In contrast, I observe a much higher proportion of chimeras being removed during the DADA2 step, which results in fewer reads retained in my final dataset. I’m wondering whether I might be doing something incorrectly, or if there are any specific steps I should consider before running DADA2.
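In case it's useful, here is roughly how I've been inspecting where reads drop out, using DADA2's per-step stats (placeholder file names):

```bash
# Tabulate DADA2's read tracking (input, filtered, denoised, merged,
# non-chimeric) to see at which step reads are lost.
# "denoising-stats.qza" stands in for the --o-denoising-stats output.
qiime metadata tabulate \
  --m-input-file denoising-stats.qza \
  --o-visualization denoising-stats.qzv
```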
I suspect this is simply because popular OTU picking methods were first designed for single-end reads. When Illumina introduced paired-end reads, merging the reads first and then continuing on to OTU clustering was the practical approach.
DADA2 is the only method I know of that denoises first, then merges overlaps, so it's a bit of an odd duck.
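For contrast, here is a rough sketch of that older merge-first route as it would look with today's q2-vsearch plugin; file names are placeholders, and action names have shifted across releases (join-pairs is called merge-pairs in newer versions):

```bash
# Legacy-style "merge first, then cluster" OTU route (hypothetical
# file names; quality filtering between steps omitted for brevity).
qiime vsearch join-pairs \
  --i-demultiplexed-seqs demux.qza \
  --o-joined-sequences joined.qza

qiime vsearch dereplicate-sequences \
  --i-sequences joined.qza \
  --o-dereplicated-table derep-table.qza \
  --o-dereplicated-sequences derep-seqs.qza

qiime vsearch cluster-features-de-novo \
  --i-table derep-table.qza \
  --i-sequences derep-seqs.qza \
  --p-perc-identity 0.97 \
  --o-clustered-table otu-table.qza \
  --o-clustered-sequences otu-seqs.qza
```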
Denoising before merging is the recommended order, so I would assume your results are better!
Did you sequence any positive controls that could be used to benchmark these methods?
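If you have a mock community, one option (a sketch, not a full recipe) is q2-quality-control's evaluate-composition, which compares an observed feature table against the known composition; both inputs are FeatureTable[RelativeFrequency] artifacts, the file names below are hypothetical, and exact parameters vary by release:

```bash
# Compare observed vs. expected mock-community composition
# (placeholder file names; collapse both tables to the same
# taxonomic level and convert to relative frequency first).
qiime quality-control evaluate-composition \
  --i-expected-features expected-mock.qza \
  --i-observed-features observed-mock.qza \
  --o-visualization mock-eval.qzv
```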