Understanding DADA2 steps

Morganator2000 · July 26, 2024, 8:21pm

I'm a new QIIME2 user and a first-time poster. I need some help understanding the DADA2 denoise function and what it's doing at each step.

I'm trying to extract ITS2 sequences from paired-end sequences for classification, but I'm running into issues at the denoise step. Some of my sequences are lost at the initial filter (up to 99%), some are lost at the merge step (up to 99% again), and other samples have ~60% of reads pass all the steps. The results are all over the place. So I think I really need to understand what DADA2 is doing at each of these steps. What are some common causes for a lot of sequences to be initially filtered out? What causes a huge loss of reads at the merge step? How can I change my parameters to get more reads to pass? And how do I know when not to adjust the parameters because they would let low-quality reads into my representative sequences?

I've run these commands so far: trimming the primers and then checking the read quality before denoising. Because I'm working with ITS data I'm not truncating at a specific position, as per this explanation.

qiime cutadapt trim-paired \
  --i-demultiplexed-sequences ITS2_PE.qza \
  --p-cores 16 \
  --p-front-f GCATCGATGAAGAACGCAGC \
  --p-front-r TCCTCCGCTTATTGATATGC \
  --o-trimmed-sequences ITS2_PE.primer.trimmed.qza \
  --verbose \
  &> primer_trimming.logr

qiime cutadapt trim-paired \
  --i-demultiplexed-sequences ITS2_PE.primer.trimmed.qza \
  --p-cores 16 \
  --p-front-f GCTGCGTTCTTCATCGATGC \
  --p-front-r GCATATCAATAAGCGGAGGA \
  --o-trimmed-sequences ITS2_PE.primer.trimmed2.qza \
  --verbose \
  &> primer_trimming.log

qiime demux summarize \
  --i-data ITS2_PE.primer.trimmed2.qza \
  --o-visualization ITS2_PE.primer.trimmed2.qzv

ITS2_PE.primer.trimmed2.qzv (325.7 KB)

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs ITS2_PE.primer.trimmed2.qza \
  --p-n-threads 16 \
  --p-trunc-len-f 0 \
  --p-trunc-len-r 0 \
  --p-max-ee-f 2 \
  --p-max-ee-r 2 \
  --p-trunc-q 2 \
  --output-dir DADA2_denoising_output_ITS2 \
  --verbose \
  &> DADA2_denoising.log

Example of the results:

Input	Filtered	percentage of input passed filter	denoised	merged	percentage of input merged	non-chimeric	percentage of input non-chimeric
342306	1237	0.36	1230	1161	0.34	1161	0.34
399812	75239	18.82	75227	1439	0.36	1439	0.36
303956	165610	54.48	165599	1928	0.63	1928	0.63
43794	2824	6.45	2808	309	0.71	309	0.71
206416	134116	64.97	134086	133993	64.91	133993	64.91
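
To see which step is responsible in each sample, the counts above can be turned into per-step retention (each step relative to the step before it, rather than relative to the input). This is a minimal sketch, assuming the table is available as tab-separated text in the column order shown above; `step_retention` is a made-up helper name, not part of QIIME 2.

```shell
# Hypothetical helper: reads tab-separated rows in the column order
#   input, filtered, %, denoised, merged, %, non-chimeric, %
# and prints the share of reads kept by each individual step.
step_retention() {
  awk -F'\t' '
    $1 ~ /^[0-9]+$/ {                 # skip the header row
      f = ($1 ? 100 * $2 / $1 : 0)    # filtered / input
      m = ($4 ? 100 * $5 / $4 : 0)    # merged / denoised
      c = ($5 ? 100 * $7 / $5 : 0)    # non-chimeric / merged
      printf "%d\tfilter=%.1f%%\tmerge=%.1f%%\tchimera=%.1f%%\n", $1, f, m, c
    }'
}

# Second row of the table above
printf '399812\t75239\t18.82\t75227\t1439\t0.36\t1439\t0.36\n' | step_retention
```

For that row, roughly 19% of reads pass the filter, but only about 2% of the denoised reads go on to merge, so the sample is lost mostly at merging rather than at filtering.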

Mike_Stevenson · July 26, 2024, 9:21pm

Hi @Morganator2000

I would try --p-trunc-len-f 260 and --p-trunc-len-r 190 to cut off the low-quality ends of the reads. This should improve the number of reads that pass the filter stage and allow more of them to be merged successfully.

Morganator2000 · July 29, 2024, 3:01pm

I tried that and it helped with some samples, but I still have 13 out of 35 samples that lose between 50%-99.9% of reads at the initial filtering step and 8 samples that lose 99% of reads at the merge step. Oddly enough I'm not seeing any loss at the chimera step. I'm also apprehensive about using --p-trunc-len with ITS data as the DADA2 ITS Pipeline Workflow does not recommend truncating the reads at a specified position. ITS sequences have huge variances in length, so trimming at a specified position will cause the longer reads to be filtered out.
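
For reference, merging needs the forward and reverse reads to overlap, so fixed truncation lengths put an upper bound on the amplicon length that can still merge. This is a back-of-the-envelope sketch, assuming DADA2's default minimum overlap of 12 bases; `max_mergeable_length` is a made-up helper name.

```shell
# Hypothetical helper: longest amplicon that can still merge when reads are
# truncated to fixed lengths. Arguments: forward length, reverse length,
# minimum overlap (assumed default: 12).
max_mergeable_length() {
  echo $(( $1 + $2 - ${3:-12} ))
}

max_mergeable_length 260 190   # 260 + 190 - 12 = 438
```

With the suggested 260/190 truncation, amplicons longer than 438 bases could not merge under this assumption, and reads shorter than the truncation length are discarded at the filter step.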

benjjneb · July 31, 2024, 6:54pm

The very high variability of read loss at both the filtering stage and at the merging stage is not something that I've seen before, so I'm going to have to speculate a bit.

My first thought is that this is an ITS length-variability issue. Perhaps samples dominated by short amplicons are reading through into the opposite primer, the adapter and beyond, resulting in very low-quality reads that are removed by filtering. Meanwhile, other samples with long amplicons are failing to merge because the reads don't overlap.
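
If read-through is the cause, forward reads from short amplicons would run into the reverse complement of the reverse primer (and reverse reads into the reverse complement of the forward primer), which cutadapt can remove as a 3' adapter. This is a rough sketch using the primers from the first post. `revcomp` is a made-up helper, the output file name is invented for illustration, and the `qiime` command is only printed, not run. `--p-adapter-f` and `--p-adapter-r` are the 3' adapter options of `qiime cutadapt trim-paired`.

```shell
# Hypothetical helper: reverse-complement a DNA sequence (A/C/G/T only).
revcomp() {
  printf '%s\n' "$1" | rev | tr 'ACGTacgt' 'TGCAtgca'
}

FWD=GCATCGATGAAGAACGCAGC   # forward primer from the first post
REV=TCCTCCGCTTATTGATATGC   # reverse primer from the first post

# Dry run: print the command. Forward reads get the reverse complement of the
# REVERSE primer as their 3' adapter, and reverse reads that of the FORWARD one.
echo qiime cutadapt trim-paired \
  --i-demultiplexed-sequences ITS2_PE.primer.trimmed.qza \
  --p-adapter-f "$(revcomp "$REV")" \
  --p-adapter-r "$(revcomp "$FWD")" \
  --o-trimmed-sequences ITS2_PE.readthrough.trimmed.qza
```

This differs from the second cutadapt command in the first post, which passes the same reverse complements to `--p-front-f`/`--p-front-r` (5' adapters) and gives each read the reverse complement of its own primer rather than of the opposite one.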

This leads into another question from my end: is there a currently recommended Q2 workflow for ITS amplicon data? In the DADA2 R space we have our ITS workflow, which uses cutadapt to remove primers and truncate reads prior to the main DADA2 workflow. We also often recommend that folks with intractable merging problems (which can arise if the amplified part of the ITS often exceeds the total length of the forward+reverse reads) consider using forward reads alone to avoid the merging issues. There was a fungi-ITS-specific paper that independently described this same R1-only approach: Redirecting
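
In QIIME 2, one way to try the forward-reads-only route is `qiime dada2 denoise-single`, which accepts paired-end input and uses only the forward reads. This is a sketch under assumptions, not a recommended workflow: the parameter values simply mirror the denoise-paired call from the first post, the output directory name is made up, and `forward_only_cmd` is a hypothetical wrapper that prints the command instead of running it.

```shell
# Hypothetical wrapper: prints a forward-reads-only DADA2 command (dry run).
# $1 = demultiplexed, primer-trimmed artifact.
forward_only_cmd() {
  echo qiime dada2 denoise-single \
    --i-demultiplexed-seqs "$1" \
    --p-trunc-len 0 \
    --p-max-ee 2 \
    --p-trunc-q 2 \
    --p-n-threads 16 \
    --output-dir DADA2_denoising_output_ITS2_R1
}

forward_only_cmd ITS2_PE.primer.trimmed2.qza
```

Because no merging takes place, the merge-step losses disappear by construction; the filter-step losses would still need to be addressed separately.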

All that said, is there any pattern you can see in the types of samples that are either being lost mostly at filtering, or lost mostly at merging?

Morganator2000 · August 6, 2024, 5:55pm

So I tried @Mike_Stevenson's suggestion and it somewhat helped for the short reads, but the longer reads were still not passing the initial filter, so that seemed like a dead end. Instead in a fit of trying different things, I increased the --p-trunc-q to 20.

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs ITS2_PE.primer.trimmed2.qza \
  --p-n-threads 16 \
  --p-trunc-len-f 0 \
  --p-trunc-len-r 0 \
  --p-max-ee-f 2 \
  --p-max-ee-r 2 \
  --p-trunc-q 20 \
  --output-dir DADA2_denoising_output_ITS2 \
  --verbose \
  &> DADA2_denoising.log

Somehow that helped, and now most samples have 60% of reads pass the initial filter, with a few samples having just 20% pass. It's an improvement, but I don't quite understand why that helped. And for 18 out of 35 of the samples, I'm still losing a good chunk at the merging step.
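
One possible explanation, offered as a sketch rather than a diagnosis: DADA2 truncates each read at the first base with quality at or below trunc-q and only then applies the expected-error filter, where expected errors are the sum of 10^(-Q/10) over the remaining bases. A higher trunc-q therefore cuts off a low-quality tail before it can push the read over max-ee. The toy example below assumes Phred+33 quality strings; `expected_errors` is a made-up helper and the read is synthetic.

```shell
# Hypothetical helper: expected_errors QUAL [TRUNCQ]
# Truncates a Phred+33 quality string at the first score <= TRUNCQ
# (assumed default: 2), then prints the kept length and expected errors.
expected_errors() {
  printf '%s\n' "$1" | awk -v truncq="${2:-2}" '
    BEGIN { for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i }  # ASCII lookup
    {
      ee = 0; n = 0
      for (i = 1; i <= length($0); i++) {
        q = ord[substr($0, i, 1)] - 33   # Phred score of base i
        if (q <= truncq) break           # truncate at the first low-quality base
        ee += 10 ^ (-q / 10)             # add the error probability of this base
        n++
      }
      printf "length=%d EE=%.2f\n", n, ee
    }'
}

# Synthetic read: 100 bases at Q40 ("I") followed by a 30-base tail at Q10 ("+")
q="$(printf 'I%.0s' $(seq 100))$(printf '+%.0s' $(seq 30))"
expected_errors "$q" 2    # tail kept: about 3 expected errors, fails max-ee 2
expected_errors "$q" 20   # tail removed: 100 bases, almost no expected errors
```

The same mechanism would also shorten the reads, which may leave too little overlap for long amplicons and would be consistent with the losses that remain at the merging step.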

To answer you @benjjneb, I can't see any sort of pattern in the samples. I know that in the past when I would trim/truncate ITS data I would lose all of the long reads, due to the region's high variability. That's not what's happening here though. I do like your suggestion of using forward reads only. I'll read the article first before I commit to it.

I'm in the process of running the data through DADA2 in R. I'm hoping that the additional functionality of DADA2 R will allow me to isolate the problem. I'll keep everyone updated on how that goes.
