DADA2 denoise is removing more than 50% of the data

ankurnaqib · February 21, 2023, 10:27pm

Hi,

I am going through the usual QIIME2 pipeline and I noticed that DADA2 denoise is removing almost 50% of the data (as seen in the denoising_stats.qzv file).

denoising_stats.qzv (1.2 MB)

I ran the same data through a different tool, and it does not remove nearly as much data as DADA2. I used cutadapt to remove primers before running DADA2, with this command:
qiime cutadapt trim-single --i-demultiplexed-sequences demux_seqs_original.qza --p-cores 8 --p-minimum-length 350 --o-trimmed-sequences demux_seqs.qza --p-adapter CCTACGGGAGGCAGCAG...ATTAGAWACCCBDGTAGTCC --p-discard-untrimmed

demux_seqs.qzv (296.7 KB)

The DADA2 command I use is:
qiime dada2 denoise-single \
  --p-n-threads 200 \
  --i-demultiplexed-seqs ./demux_seqs.qza \
  --p-trunc-len 0 \
  --output-dir DADA2_denoising_output \
  --verbose \
  &> DADA2_denoising.log

I have a few questions:

1. Is it possible to find out exactly why denoising is removing so many sequence reads from the samples? I don't see quality as a big issue, as I am already removing low-quality data and primer sequences from the dataset.
2. Is it possible that all of the removed data is basically chimeras and they are just reported like this? I ask because within the denoising report, I don't see many chimeras being detected within the samples.

colinbrislawn · February 23, 2023, 9:58pm

Good afternoon,

To your first question: yes! In newer versions of QIIME 2 and the q2-dada2 plugin, more detailed statistics are reported about this process. When you upgrade from 2021.11.0 to version 2022.11, as an example, the dada2 output will include percent reads removed due to joining and chimera filtering as separate columns, which should answer your question.

To your second question: that is possible!

I'm guessing something is wrong with read joining, but you will have to update and rerun DADA2 to find out!

ankurnaqib · February 24, 2023, 9:54pm

Hi Colin,

Thank you for responding. I took your advice and updated QIIME 2 to version 2022.11. It ran successfully and generated a log file, but I am not seeing any of the extra information you mentioned. It shows the same columns as the previous version. Is there a different place I should look for the detailed statistics? Thank you for your help.
denoising_stats_v2.qzv (1.2 MB)
DADA2_denoising_v2.txt (1.7 KB)

colinbrislawn · February 25, 2023, 12:53am

Ah, I made a mistake.

denoise-single will not have columns for joining, as joining/merging only applies to paired-end reads.

This means that the majority of your read losses are in the main dada() error correction / removal step (code here), just like you found when running the older version of DADA2.
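If you want to see the per-sample losses at each step as numbers, you could export the stats artifact and compute them from the read counts. This is only a rough sketch: dada2_step_losses is a made-up helper name, and I'm assuming the exported file is a tab-separated table with columns named input, filtered, denoised, and non-chimeric (check the header line of your own file before trusting the output).

```shell
# Export the stats first (needs QIIME 2; the artifact path is an assumption
# based on the --output-dir used earlier in this thread):
#   qiime tools export \
#     --input-path DADA2_denoising_output/denoising_stats.qza \
#     --output-path exported_stats

# Hypothetical helper: print, per sample, the percentage of input reads
# lost at the filter, denoise, and chimera-removal steps.
dada2_step_losses() {
  awk -F'\t' '
    NR == 1 {
      # Map column names to positions so column order does not matter.
      for (i = 1; i <= NF; i++) col[$i] = i
      print "sample-id\tpct_lost_filter\tpct_lost_denoise\tpct_lost_chimera"
      next
    }
    /^#/ { next }   # skip the "#q2:types" directive line
    {
      n_in = $(col["input"])
      if (n_in == 0) next   # avoid dividing by zero for empty samples
      printf "%s\t%.1f\t%.1f\t%.1f\n", $1,
        100 * (n_in - $(col["filtered"])) / n_in,
        100 * ($(col["filtered"]) - $(col["denoised"])) / n_in,
        100 * ($(col["denoised"]) - $(col["non-chimeric"])) / n_in
    }
  ' "$1"
}
```

You would call it on the exported table, for example dada2_step_losses exported_stats/stats.tsv (the file name inside the export directory is also an assumption). A large value in the pct_lost_denoise column points at the dada() step rather than at filtering or chimera removal.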

I'm not sure what's causing this, or why some samples are more affected than others.

Anyone got any ideas?

colinbrislawn · March 10, 2023, 4:19pm

Oh, I missed something really important!

This DADA2 pipeline is made for single end Illumina reads. But when I check the quality graph...

That looks like paired-end reads that had already been joined!

Using joined, paired-end reads with dada2 denoise-single breaks this pipeline.

The good news is that there is a good solution to this problem: import your reads before pairing, then use qiime dada2 denoise-paired.
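Roughly, that workflow could look like the sketch below. Treat everything in it as a placeholder to adapt: manifest.tsv, the output names, and the function name denoise_paired_sketch are made up, I'm assuming Phred33 FASTQ files listed in a paired-end manifest, and the reverse primer is simply the reverse complement of the second half of the linked adapter from the cutadapt command above. The truncation lengths are left at 0 (no truncation) to mirror the original command; you would normally pick them from the quality plots of the unjoined reads.

```shell
# Sketch only: wrapped in a function so nothing runs until you call it.
denoise_paired_sketch() {
  # 1. Import the raw, unjoined forward and reverse reads via a manifest.
  qiime tools import \
    --type 'SampleData[PairedEndSequencesWithQuality]' \
    --input-path manifest.tsv \
    --input-format PairedEndFastqManifestPhred33V2 \
    --output-path demux_paired.qza || return 1

  # 2. Trim primers from both reads (forward primer on R1, reverse on R2).
  qiime cutadapt trim-paired \
    --i-demultiplexed-sequences demux_paired.qza \
    --p-front-f CCTACGGGAGGCAGCAG \
    --p-front-r GGACTACHVGGGTWTCTAAT \
    --p-discard-untrimmed \
    --p-cores 8 \
    --o-trimmed-sequences demux_paired_trimmed.qza || return 1

  # 3. Denoise; DADA2 merges the read pairs itself after error correction.
  qiime dada2 denoise-paired \
    --i-demultiplexed-seqs demux_paired_trimmed.qza \
    --p-trunc-len-f 0 \
    --p-trunc-len-r 0 \
    --p-n-threads 8 \
    --output-dir DADA2_denoising_paired_output \
    --verbose || return 1
}
```

With paired-end input, the denoising stats should also include a merged column, so you can see how many reads are lost at the joining step.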
