I lost lots of data after running dada2 (denoise)

Hi everyone,

I used qiime2-2021.8 (installed with conda).
I have the paired-end demultiplexed fastq files for 16S metabarcoding data.

I ran the denoising with this command:
qiime dada2 denoise-paired --i-demultiplexed-seqs demux-paired-end-18.qza --p-trunc-len-f 301 --p-trunc-len-r 301 --p-trim-left-f 0 --p-trim-left-r 0 --o-table table-dada2-18.qza --o-representative-sequences rep-seqs-dada2-18.qza --o-denoising-stats denoise-dada2-18.qza

Before denoising, I had on average 1×10^5–2×10^5 reads per sample for both the forward and reverse sequences. After denoising, I only got on average 2×10^4–5×10^4 feature counts per sample.

If I understand correctly, I lost around 70% of my reads. Is this normal? How can I find out what the problem with these reads is?

Thanks in advance.

Welcome to the forum!
I would guess the problem lies in the

parameters. Setting them to lower values, or disabling them, may increase your output. See this post for a relevant discussion.


Thank you Timur!

I think there could be more problems during dada2, so I have pasted my denoising stats table here for your consideration. Could you please give me some hints on updating my command?
(I think something might have gone wrong during merging, because I only got about half of the filtered sequences as output.)


Could you also provide the quality plots from before denoising, along with the expected amplicon size or the rRNA region you targeted?
A low percentage of merged reads can be caused by bad quality scores at the ends of the reads (in which case you need to truncate them) or by too short an overlap region (in which case you will need to decrease the minimum overlap in dada2 or disable truncation).
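As a sketch of the second option: recent QIIME 2 releases (including 2021.8, as far as I know) expose a `--p-min-overlap` parameter on `qiime dada2 denoise-paired` (default 12). Using the file names from this thread, lowering it might look like this — the truncation values here are placeholders that you should pick from your own quality plots:

```shell
# Sketch only: lowers the minimum merge overlap from the default of 12 to 8.
# File names are taken from this thread; trunc-len values are placeholders.
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux-paired-end-18.qza \
  --p-trunc-len-f 260 \
  --p-trunc-len-r 180 \
  --p-min-overlap 8 \
  --o-table table-dada2-18.qza \
  --o-representative-sequences rep-seqs-dada2-18.qza \
  --o-denoising-stats denoise-dada2-18.qza
```

This only helps when the truncated reads barely reach each other; if the quality tails are the problem, truncation is the better lever.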

Hi Timur, Thanks for your reply!

I have attached the quality plots here:

My target fragment is the V4 region of 16S rRNA, which I amplified with the 515F and 806R primers (~290 nt).

I did some research on this topic and learned how to choose the truncation lengths, so I updated my command and ran the denoising again.

qiime dada2 denoise-paired --i-demultiplexed-seqs demux-paired-end-18.qza --p-trunc-len-f 260 --p-trunc-len-r 180 --p-trim-left-f 0 --p-trim-left-r 0 --p-n-threads 4 --o-table table-dada2-18new.qza --o-representative-sequences rep-seqs-dada2-18new.qza --o-denoising-stats denoise-dada2-18new.qza
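As a quick sanity check on these truncation settings (the ~290 nt amplicon size is taken from above; this is just arithmetic, not a guarantee the reads will merge), the expected overlap is roughly trunc-len-f + trunc-len-r minus the amplicon length:

```shell
# Rough overlap estimate for the truncation settings above.
AMPLICON=290   # approximate V4 amplicon length stated in the post
TRUNC_F=260
TRUNC_R=180
OVERLAP=$((TRUNC_F + TRUNC_R - AMPLICON))
echo "Expected overlap: ${OVERLAP} nt"   # prints: Expected overlap: 150 nt
```

At ~150 nt the overlap is far above dada2's default minimum of 12, so insufficient overlap should not be the limiting factor with these settings.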

However, the results show that there might be a problem during chimera filtering. Here are the results:

I am quite confused by this step. Do you have any ideas about it?

In case you need the sequence information, I have it here as well:

Thank you so much!

Just one quick question: did you already remove the primers with Cutadapt? Primers left in the sequences before dada2 can cause losses during chimera removal. If not, could you remove the primers first with `--p-discard-untrimmed` enabled and then rerun dada2 (the truncation parameters will need to be adjusted again)?
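For reference, a sketch of that primer-trimming step. The 515F/806R sequences below are the standard Earth Microbiome Project versions and are an assumption on my part — substitute the exact primers you actually used:

```shell
# Sketch only: trims 515F/806R primers from read 5' ends and discards
# read pairs where a primer was not found (--p-discard-untrimmed).
# Primer sequences are assumed, not confirmed in this thread.
qiime cutadapt trim-paired \
  --i-demultiplexed-sequences demux-paired-end-18.qza \
  --p-front-f GTGYCAGCMGCCGCGGTAA \
  --p-front-r GGACTACNVGGGTWTCTAAT \
  --p-discard-untrimmed \
  --o-trimmed-sequences trimmed-paired-end-18.qza
```

Note that after trimming, the reads are shorter, which is why the dada2 truncation parameters need to be chosen again from the new quality plots.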


Oh! No, I haven't removed the primers or the barcodes yet.
I will report back after I have tried Cutadapt.

Thank you so much!


Hi Timur!

Thanks for your help!

After running Cutadapt, I got the following results:

Then I ran the denoising again with this command:
qiime dada2 denoise-paired --i-demultiplexed-seqs trimmed-paired-end-18.qza --p-trunc-len-f 250 --p-trunc-len-r 200 --p-trim-left-f 0 --p-trim-left-r 0 --p-n-threads 4 --o-table table-dada2-new18-2.qza --o-representative-sequences rep-seqs-dada2-new18-2.qza --o-denoising-stats denoise-dada2-new18-2.qza

The result looks better now:

Do you think this is good enough to proceed with the downstream analysis, or should I change the truncation parameters to retain more reads?

Thanks a lot!

Congratulations! It looks good to me. You can play with the truncation parameters to check whether this dataset yields better output, then pick the best settings, or simply proceed as it is.
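If you do compare several truncation settings, one way to inspect each run (a sketch, using the output name from this thread) is to tabulate the denoising-stats artifact and compare the percentages of input reads that passed filtering, merged, and survived chimera removal:

```shell
# Sketch: turn the denoising stats into a viewable table (open the .qzv
# at https://view.qiime2.org), then compare the "percentage of input
# passed filter" and "percentage of input merged" columns across runs.
qiime metadata tabulate \
  --m-input-file denoise-dada2-new18-2.qza \
  --o-visualization denoise-dada2-new18-2.qzv
```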

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.