Issues with Denoising Using DADA2 and Deblur - Parameter Tuning and Merging Sequences

Hi everyone,

I am sorry for the long post, but I wanted to provide as much detail as possible.

I am new to QIIME 2 (version 2024.2, installed with miniconda3), and I have sequences from a 2 x 300 bp Illumina run (primers 338F/806R). Attached are the quality-score plots of my raw data: 16Ssequences-391521.qzv (315.4 KB)

After importing the raw data into QIIME 2, I first tried to denoise the sequences with DADA2 using the following command:

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs 16Ssequences.qza \
  --p-trim-left-f 26 \
  --p-trim-left-r 26 \
  --p-trunc-len-f 297 \
  --p-trunc-len-r 207 \
  --p-max-ee-f 3 \
  --p-max-ee-r 3 \
  --p-n-threads 6 \
  --o-table 16Sseqs_table.qza \
  --o-representative-sequences 16Sseqs_rep.qza \
  --o-denoising-stats 16Sseqs_denoising_stats.qza
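For reference, `--p-max-ee-f`/`--p-max-ee-r` discard a read when its expected number of errors, computed on the trimmed and truncated read as the sum of 10^(-Q/10) over its quality scores, exceeds the threshold. The snippet below is a minimal sketch that computes this value for a single quality string. It assumes Phred+33 encoding (the standard for current Illumina data) and a shell with `od` and `awk`; `expected_errors` is a hypothetical helper, not a QIIME 2 command.

```shell
# Expected errors (EE) of one read, as used by DADA2's maxEE filter:
# EE = sum over bases of 10^(-Q/10). Sketch only; expected_errors is a
# hypothetical helper, and Phred+33 quality encoding is assumed.
expected_errors() {
  # od prints the byte value of every quality character (e.g. "I" -> 73);
  # awk converts each to Q = byte - 33 and sums the error probabilities.
  printf '%s' "$1" | od -An -v -tu1 | awk '
    { for (i = 1; i <= NF; i++) ee += 10 ^ (-($i - 33) / 10) }
    END { printf "%.2f\n", ee }
  '
}

# 10 bases at Q40 ("I"), 10 at Q20 ("5") and 10 at Q10 ("+"):
# 10 * 0.0001 + 10 * 0.01 + 10 * 0.1 = 1.101
expected_errors 'IIIIIIIIII5555555555++++++++++'   # prints 1.10
```

For example, a 271-base read with every base at Q20 has 271 × 0.01 = 2.71 expected errors. It would be discarded with `--p-max-ee-f 2` but kept with `--p-max-ee-f 3`.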

However, the results were unsatisfactory: only about half of the input reads or fewer were merged. Below is a summary of my attempts with different parameter sets, including one using only the forward reads (last row), which gave better results.

| trim-left-f | trunc-len-f | trim-left-r | trunc-len-r | overlap | max-ee-f | max-ee-r | percentage of input passed filter | percentage of input merged | percentage of input non-chimeric |
|---|---|---|---|---|---|---|---|---|---|
| 26 | 296 | 26 | 246 | 22 | 4 | 4 | 73-75% | 38-41% | 35-39% |
| 26 | 296 | 26 | 236 | 12 | 4 | 4 | 80-81% | 42-46% | 39-43% |
| 26 | 296 | 26 | 230 | 6 | 4 | 4 | 80-82% | 43-48% | 40-45% |
| 26 | 296 | 26 | 220 | -4 | 5 | 5 | 81-83% | 46-51% | 42-47% |
| 26 | 196 | / | / | / | 7 | / | 87-89% | / | 70-75% |

Here is one of the QZV files: 16S_296236_denoising_stats.qzv (1.2 MB)
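To see at which step reads are lost, the denoising stats can be exported with `qiime tools export --input-path <stats>.qza --output-path <dir>` and reduced to one line per sample. The snippet below is a minimal sketch of that reduction. It assumes the exported file is a tab-separated table whose first row names the columns and includes `filtered` and `merged`. Both the column names and the exported file name are assumptions, so check your own export. `merge_loss` is a hypothetical helper.

```shell
# Per-sample summary of where DADA2 loses reads (a sketch; merge_loss is a
# hypothetical helper). Reads a tab-separated stats table on stdin whose first
# row names the columns; the "filtered" and "merged" column names are assumed.
merge_loss() {
  awk -F '\t' '
    NR == 1 {
      # Header row: remember the position of every column by name.
      for (i = 1; i <= NF; i++) col[$i] = i
      if (!("filtered" in col) || !("merged" in col)) {
        print "merge_loss: no \"filtered\" or \"merged\" column in header" > "/dev/stderr"
        exit 1
      }
      next
    }
    /^#/ { next }   # skip type-annotation rows such as "#q2:types"
    {
      filtered = $(col["filtered"]) + 0
      merged = $(col["merged"]) + 0
      pct = (filtered > 0) ? 100 * merged / filtered : 0
      # Share of the reads that passed filtering and then also merged.
      printf "%s\tfiltered=%d\tmerged=%d\tmerged_of_filtered=%.1f%%\n", $1, filtered, merged, pct
    }
  '
}
```

For example, after exporting `16Sseqs_denoising_stats.qza` into a directory, run `merge_loss < <dir>/stats.tsv` (adjust the file name to whatever the export produced). A low `merged_of_filtered` value alongside a high pass-filter rate would suggest that merging, rather than quality filtering, is where most reads are lost.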

Since DADA2 did not provide satisfactory results, I then attempted to denoise the raw data using Deblur.
Before running Deblur, I used the following steps:

qiime cutadapt trim-paired \
  --i-demultiplexed-sequences 16Ssequences_391521.qza \
  --p-cores 4 \
  --p-front-f ACTCCTACGGGAGGCAGCAG \
  --p-front-r GGACTACACGGGTATCTAAT \
  --p-error-rate 0.2 \
  --p-discard-untrimmed \
  --o-trimmed-sequences 16S_trimmed.qza

qiime vsearch merge-pairs \
  --i-demultiplexed-seqs 16S_trimmed.qza \
  --o-merged-sequences 16S-deblur/merged-seqs.qza \
  --o-unmerged-sequences 16S-deblur/unmerged-seqs.qza

qiime demux summarize \
  --i-data 16S-deblur/merged-seqs.qza \
  --o-visualization 16S-deblur/merged-seqs-summary.qzv

qiime quality-filter q-score \
  --i-demux 16S-deblur/merged-seqs.qza \
  --o-filtered-sequences 16S-deblur/16Sseqs-filtered.qza \
  --o-filter-stats 16S-deblur/16Sseqs-filter-stats.qza

qiime deblur denoise-16S \
  --i-demultiplexed-seqs 16S-deblur/16Sseqs-filtered.qza \
  --p-trim-length 0 \
  --o-representative-sequences 16S-deblur/rep-seqs-deblur.qza \
  --o-table 16S-deblur/table-deblur.qza \
  --p-sample-stats \
  --o-stats 16S-deblur/deblur-stats.qza

I observed that only around 50,000 reads were merged, out of more than 250,000 raw reads (about 20% or less). Here is the summary QZV file: merged-seqs-summary.qzv (305.2 KB). I also tried adjusting parameters such as --p-minovlen and --p-maxee, but the results did not improve as much as I had hoped. Here is also the QZV file of the Deblur feature table: table-deblur.qzv (422.9 KB)

I would appreciate your assistance with the following questions:

  1. The percentages of merged and non-chimeric sequences are consistently around or below 50%. Could you advise on how to adjust the DADA2 and Deblur parameters to get better results?
  2. How should I balance the overlap between the forward and reverse reads against maximizing the percentages of merged and non-chimeric sequences? (e.g., the 4th row in the table, which has a negative calculated overlap but the highest merged percentage)
  3. Should I consider using only the forward reads for this analysis? My concern is whether important microbial information would be lost with that approach.

Thanks a lot for your help,
Siqin

Hello @Siqin_Zhang,

As far as DADA2 is concerned, your primary issue appears to be merging. Some rough numbers: your amplicon length is 806 - 338 = 468 bp. Your forward read after trimming and truncation is 297 - 26 = 271 bp, and your reverse read is 207 - 26 = 181 bp. With a 12-nt overlap, your expected merged sequence is 271 + 181 - 12 = 440 bp, which is shorter than your amplicon length (468 bp). This is likely the cause of the merging problem. Your reads do not need a left trim for quality reasons. If you are using trim-left to remove synthetic sequences such as primers, it is recommended that you use cutadapt instead. To get better results with DADA2, try making the merged sequence longer, either by increasing the truncation lengths or by reducing the trim-left values.
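One way to double-check the read geometry is to derive the overlap from the amplicon length instead of taking it as an input. The snippet below is a rough sketch of that calculation. It rests on several assumptions:

- Each read starts at the first base of the amplicon (primers included).
- The `--p-trunc-len-*` positions are counted from the start of the raw read, as in 297 - 26 = 271 above.
- Every amplicon is exactly 468 bp long. Real V3-V4 amplicons vary in length between taxa.

The minimum overlap of 12 nt is assumed to match the DADA2 default. `read_geometry` is a hypothetical helper.

```shell
# Rough paired-end read-geometry check (a sketch, not part of QIIME 2 or DADA2).
# Assumptions (hypothetical; adjust to your data):
#   - each read starts at the first base of the amplicon (primers included),
#   - trunc-len is counted from the start of the raw read, so kept = trunc - trim,
#   - every amplicon has the same length (real V3-V4 lengths vary by taxon).
# Usage: read_geometry TRIM_F TRUNC_F TRIM_R TRUNC_R AMPLICON_LEN MIN_OVERLAP
read_geometry() {
  local trim_f=$1 trunc_f=$2 trim_r=$3 trunc_r=$4 amplicon=$5 min_overlap=$6
  local kept_f=$((trunc_f - trim_f))   # forward bases left after trimming and truncation
  local kept_r=$((trunc_r - trim_r))   # reverse bases left after trimming and truncation

  # In amplicon coordinates the kept forward read covers trim_f+1 .. trunc_f and
  # the kept reverse read covers amplicon-trunc_r+1 .. amplicon-trim_r.
  local start=$((amplicon - trunc_r))
  if [ "$trim_f" -gt "$start" ]; then start=$trim_f; fi
  local end=$trunc_f
  if [ $((amplicon - trim_r)) -lt "$end" ]; then end=$((amplicon - trim_r)); fi
  local overlap=$((end - start))

  if [ "$overlap" -ge "$min_overlap" ]; then
    # Merged length = both kept reads minus the bases they share.
    echo "kept_f=$kept_f kept_r=$kept_r overlap=$overlap merged=$((kept_f + kept_r - overlap))"
  else
    echo "kept_f=$kept_f kept_r=$kept_r overlap=$overlap merged=none"
  fi
}

# DADA2 settings from the question, 468-bp amplicon, 12-nt minimum overlap.
read_geometry 26 297 26 207 468 12   # prints kept_f=271 kept_r=181 overlap=36 merged=416
```

Under these assumptions the overlap works out to trunc-len-f + trunc-len-r - amplicon length. The trim-left values remove bases from the outer ends of the amplicon, so they shorten the merged sequence without changing the overlap. The truncation lengths, in contrast, control the overlap directly.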
