DADA2, odd results from denoising stats (varying input passed filter, low number of reads, low % of input merged and ASVs)

joaomiranda · January 24, 2024, 11:17am

Hello everyone!

I'm coming here to solve some issues involving my data after I executed the denoising by dada2.

Key points:

I'm analyzing amplicon sequencing using Qiime 2023.9
I have 192 samples, of which around 6 samples are negative and positive controls (mock)
My data was generated by a paired-end 2x250 bp Illumina Miseq V3-V4 sequencing

First I've executed the cutadapt, to remove primers and adapters:

qiime cutadapt trim-paired \
 --i-demultiplexed-sequences paired-end-demux.qza \
 --p-front-f CCTACGGGNGGCWGCAG \
 --p-front-r GACTACHVGGGTATCTAATCC \
 --p-error-rate 0 \
 --o-trimmed-sequences trimmed-seq.qza \
 --verbose

From this output: trimmed-seq.qzv (323.2 KB), I've analyzed the interactive plot and decided to truncate only my reverse reads, due to a high quality score from my forward reads. I've tried truncating in 195bp considering a threshold Phred-score of 15:

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs trimmed-seq.qza \
  --p-trim-left-f 0 \
  --p-trim-left-r 0 \
  --p-trunc-len-f 0 \
  --p-trunc-len-r 195 \
  --o-table dada2_tests/table-paired.qza \
  --o-representative-sequences dada2_tests/rep-seqs-paired.qza \
  --o-denoising-stats dada2_tests/denoising-stats.qza

Here are the outputs:
denoising-stats.qzv (1.2 MB)
rep-seqs-paired.qzv (319.4 KB)
table-paired.qzv (448.8 KB)

As you can see I have odd results, my filter is varying a lot (from 0.01 to 69%), there are a low percentage of merge, low number of reads and ASVs. I have never seen these variability...

I tried to use the --p-max-ee-f and the --p-max-ee-r, modifying the default, 2, for 3 and there are my outputs:
denoising-stats_2.qzv (1.2 MB)
rep-seqs-paired_2.qzv (341.4 KB)
table-paired_2.qzv (459.1 KB)

I see a slight improvment, but the variability and low percentage merge are still present.

What is the best way to solve this? Should I use just the forward reads? May I've been cutting at the wrong areas using trim and truncate ?

Thnaks in advance

cherman2 · January 25, 2024, 3:48pm

Hi @joaomiranda,

Welcome Back to the :qiime2: forum!
Thank you for including so many details in your post.

So there are two thing:

it looks like the quality of your sequences may require you to trim more. I would less around with the forward read trunc length:

--p-trunc-len-f 0

This is a balance of truncating so you have good quality but also having enough sequence length to merge your forward and reverse reads.

There is some quick math that I typically use as a guide to see if my sequences will merge. This is definitely not full proof because there are variations of sequence length in the 16s region

merged sequences length = ([Reverse Primer BP Location]-[Foward Primer BP Location]+Necessary overlap)

So for the v3-v4 region those primer locations are typically 341f and 785r. Please check that these are your primer locations to confirm that they are the same. Dada2 requires 12 nt of overlap. So your equation so look something like 785-341 = 444 + 12 = 456. So your Forward and Reserve read when added together should be around 456 nt for a successful merge.

So my advice would be to play around with your truncation values and see if your get better filtering and merge percentages. If this doesn't work, let us know we would be happy to continue debugging this with you!

Hope that helps!

joaomiranda · January 27, 2024, 1:17am

Hi @cherman2 ,

Thank you for your reply!

I tried again according with your insights, I changed my commands to p-trunc-len-f 240 and p-trunc-len-r 220 with the aim to increase the lenght of my sequences to 460 nt.

My results were worst compared to when I runned with p-trunc-len-f 0 and p-trunc-len-r 195.
denoising-stats_3.qzv (1.2 MB) rep-seqs-paired_3.qzv (325.2 KB) table-paired_3.qzv (455.6 KB), here is my code:

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs trimmed-seq.qza \
  --p-trim-left-f 0 \
  --p-trim-left-r 0 \
  --p-trunc-len-f 240 \
  --p-trunc-len-r 220 \
  --p-max-ee-f 3 \
  --p-max-ee-r 3 \
  --o-table dada2_tests/table-paired_3.qza \
  --o-representative-sequences dada2_tests/rep-seqs-paired_3.qza \
  --o-denoising-stats dada2_tests/denoising-stats_3.qza

I'm thinking that a lenght higher than 195nt for the R read it's including some parts that have a Phred-Score < 15... This could be having an impact on filtering and merging?

I don't think that my sequences are too bad for these low percentages, therefore before considering the use of only the forward reads, are there more tests to try?

cherman2 · January 29, 2024, 3:23pm

Hi @joaomiranda,
You are right these do not look better.

I think this is the one of the reasons, the Reverse read has some pretty low quality scores that are getting filtered out. We may struggle to get enough sequence length on this reverse read to merge sucessfully.

I would continue playing around with the trim and truncation scores. Like have you tried -f 240, -r 195? That would be alittle shorter then my math for sequence merging above but amplicon regions have variation in length so you may have decent luck merging if you can get it passed filtering.

I would contrinue to try to mess around with the trunc values but single end is always a decent back up.
Hope this helps!

system · February 29, 2024, 9:24pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.