DADA2 denoise is removing more than 50% of the data


I am going through the usual QIIME2 pipeline and I noticed that DADA2 denoise is removing almost 50% of the data (as seen in the denoising_stats.qzv file).

denoising_stats.qzv (1.2 MB)

I tried running the same data using a different tool and it does not end up removing the same amount of data as DADA2. I used cutadapt to remove primers prior to running DADA2, using the command:-
qiime cutadapt trim-single --i-demultiplexed-sequences demux_seqs_original.qza --p-cores 8 --p-minimum-length 350 --o-trimmed-sequences demux_seqs.qza --p-adapter CCTACGGGAGGCAGCAG...ATTAGAWACCCBDGTAGTCC --p-discard-untrimmed

demux_seqs.qzv (296.7 KB)

The DADA2 command I use is:-
qiime dada2 denoise-single \
--p-n-threads 200 \
--i-demultiplexed-seqs ./demux_seqs.qza \
--p-trunc-len 0 \
--output-dir DADA2_denoising_output \
&> DADA2_denoising.log

I have a few questions:-

  1. Is it possible to find out why, specifically, denoising is removing so many sequence reads from the samples? I don't see quality as a big issue, as I am already removing low-quality data and primer sequences from the dataset.
  2. Is it possible that all of the removed data is basically chimeras and they are just reported like this? I say this because within the denoising report, I don't see many chimeras being detected within the samples.

Good afternoon,

Yes! Newer versions of QIIME 2 and the q2-dada2 plugin report more detailed statistics about this process. If you upgrade from 2021.11.0 to 2022.11, for example, the DADA2 output will include the percentage of reads removed by joining and by chimera filtering as separate columns, which should answer your question. :microscope:

That is possible! :thinking:

I'm guessing something is wrong with read joining, but you will have to update and rerun DADA2 to find out!

Hi Colin,

Thank you for responding. I took your advice and updated QIIME 2 to version 2022.11. It ran successfully and generated a log file, but I am not seeing any extra information like you mentioned. It shows the same columns as the previous version. Is there a different place I need to look for the detailed statistics? Thank you for your help.
denoising_stats_v2.qzv (1.2 MB)
DADA2_denoising_v2.txt (1.7 KB)

Ah, I made a mistake.

denoise-single will not have columns for joining, as joining/merging only applies to paired-end reads.

This means that the majority of your read losses are in the main dada() error correction / removal step (code here), just like you found when running the older version of DADA2.
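To see how the losses split across stages, the per-sample counts can be subtracted column by column. The function below is only a rough sketch: it assumes the stats have been exported to a tab-separated file (for example with qiime tools export) and that the columns come in the denoise-single order sample-id, input, filtered, percentage passed filter, denoised, non-chimeric, percentage non-chimeric. The function name is made up for illustration.

```shell
# Sketch: per-sample reads removed at each DADA2 stage.
# ASSUMPTION: $1 is a TSV exported from the denoising stats, with columns
#   1 sample-id, 2 input, 3 filtered, 4 % passed filter,
#   5 denoised,  6 non-chimeric, 7 % non-chimeric
dada2_step_losses() {
    awk -F'\t' '
        NR == 1 || /^#/ { next }   # skip the header and the "#q2:types" line
        $2 > 0 {
            # filter  = reads dropped by quality filtering
            # denoise = reads dropped in the dada() error-correction step
            # chimera = reads dropped by chimera removal
            printf "%s\tfilter=%d\tdenoise=%d\tchimera=%d\tkept=%.1f%%\n",
                   $1, $2 - $3, $3 - $5, $5 - $6, 100 * $6 / $2
        }
    ' "$1"
}
```

Running it as dada2_step_losses stats.tsv (file name assumed) prints one line per sample, so a large denoise= value next to small filter= and chimera= values points at the dada() step.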

I'm not sure what's causing this, or why some samples are more affected than others.

Anyone got any ideas?

Oh, I missed something really important!

This DADA2 pipeline is made for single-end Illumina reads. But when I check the quality graph...

That looks like paired-end reads that had already been joined! :scream_cat:

:warning: Using joined, paired-end reads with dada2 denoise-single breaks this pipeline.
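A quick way to double-check this without the quality plot is to tabulate read lengths in one of the FASTQ files. This is a minimal sketch, assuming standard 4-line FASTQ records on stdin; the function name is hypothetical. Single Illumina reads typically top out at around 300 nt, so a spread of lengths well above that (such as the 350 nt minimum used in the cutadapt step) suggests the reads were already joined.

```shell
# Sketch: count reads per length from 4-line FASTQ records on stdin.
# Output is "length<TAB>count", sorted by length.
fastq_length_counts() {
    awk 'NR % 4 == 2 { n[length($0)]++ }   # line 2 of each record is the sequence
         END { for (len in n) print len "\t" n[len] }' | sort -n
}
```

For a gzipped file this could be run as gzip -dc sample.fastq.gz | fastq_length_counts, where sample.fastq.gz stands in for one of your own files.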

The good news is that there is a good solution to this problem: import your reads before pairing, then use qiime dada2 denoise-paired.
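Roughly, that workflow could look like the sketch below. It is not a tested recipe: the function name, output names, and the Casava-style input format are assumptions, and the truncation lengths are passed in as arguments because they have to be picked from the quality plot of your own unjoined reads. Primer removal (for example with qiime cutadapt trim-paired) would go between the two steps and is left out here.

```shell
# Sketch only: paths, output names, and truncation lengths are placeholders.
# ASSUMPTION: $1 is a directory of unjoined, demultiplexed R1/R2 fastq.gz
# files named in the Casava 1.8 style; adjust --input-format if yours differ.
denoise_unjoined_reads() {
    fastq_dir=$1   # directory with the raw forward and reverse reads
    trunc_f=$2     # forward truncation length
    trunc_r=$3     # reverse truncation length

    # Import the reads as paired-end data, before any joining.
    qiime tools import \
        --type 'SampleData[PairedEndSequencesWithQuality]' \
        --input-path "$fastq_dir" \
        --input-format CasavaOneEightSingleLanePerSampleDirFmt \
        --output-path demux_paired.qza || return 1

    # Let DADA2 denoise each direction and merge the pairs itself.
    qiime dada2 denoise-paired \
        --i-demultiplexed-seqs demux_paired.qza \
        --p-trunc-len-f "$trunc_f" \
        --p-trunc-len-r "$trunc_r" \
        --p-n-threads 0 \
        --output-dir DADA2_denoising_paired_output
}
```

With denoise-paired, the stats table also reports how many reads were merged, so a joining problem would show up there directly.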