Losing too many reads with DADA2

I am using QIIME 2 v2019.1 in a high-performance computing environment.

I am testing my 16S v3/v4 data using DADA2 on 2x300 paired-end reads.

I ran dada2 denoise-paired twice: once trimming at 150 bp and once with no trimming. In both runs I lost the majority of my reads during dada2 denoise-paired.

Here is a portion from my stats file for the 150bp trimming run:

sample-id               input    filtered  denoised  merged   non-chimeric
#q2:types               numeric  numeric   numeric   numeric  numeric
BMI-Plate10WellC10-16S  16502    252       252       0        0
BMI-Plate15WellE4-16S   13882    7841      7841      0        0
BMI-Plate20WellG10-16S  32631    25711     25711     91       90
BMI-Plate20WellH10-16S  23718    18336     18336     35       35
BMI-Plate20WellH7-16S   26738    20715     20715     76       76
BMI-Plate20WellH8-16S   35523    28328     28328     104      98

Here is a portion from my stats file for the 300bp run:

sample-id               input    filtered  denoised  merged   non-chimeric
#q2:types               numeric  numeric   numeric   numeric  numeric
BMI-Plate10WellC10-16S  16502    0         0         0        0
BMI-Plate15WellE4-16S   13882    0         0         0        0
BMI-Plate20WellG10-16S  32631    2         2         2        2
BMI-Plate20WellH10-16S  23718    3         3         0        0
BMI-Plate20WellH7-16S   26738    1         1         0        0
BMI-Plate20WellH8-16S   35523    1         1         1        1

I ran Cutadapt outside of Qiime to trim off the primers.
Here is what my plots look like for the imported data (primers already removed using Cutadapt).

After no success with dada2 denoise-paired, I removed the primers with Cutadapt, merged the reads myself outside of QIIME 2 using FLASH, and then ran dada2 denoise-single, which gave much better read retention. Here is a portion from my stats file from the dada2 denoise-single run:

sample-id               input    filtered  denoised  non-chimeric
#q2:types               numeric  numeric   numeric   numeric
BMI-Plate10WellC10-16S  16502    8076      8076      8045
BMI-Plate15WellE4-16S   13882    6103      6103      6103
BMI-Plate20WellG10-16S  32631    16367     16367     16200
BMI-Plate20WellH10-16S  23718    12202     12202     12142
BMI-Plate20WellH7-16S   26738    14643     14643     14611
BMI-Plate20WellH8-16S   35523    20401     20401     20310

And my plot from merging the reads outside of DADA2:

My question is, what is happening when I try to run dada2 denoise-paired that is causing me to lose the majority of my reads?

Hi @minardsmitha,

I want to start with the caveat that DADA2 is not meant to be run on data that has already been quality filtered or joined. If you want to join paired ends first and then denoise, my suggestion is to look at Deblur.

Based on your data, when you trim both reads to 150 bp, they fail to merge because the trimmed reads are too short to overlap.
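The overlap arithmetic behind this can be sketched quickly. This is only an illustration: the amplicon length (420 bp after primer removal) and the minimum overlap (12 bases) are assumptions, not values from this thread, and the function name is made up. The real amplicon length depends on the primers used.

```python
def overlap_after_truncation(amplicon_len, trunc_f, trunc_r):
    """Bases shared by the forward and reverse reads once both are truncated.

    A negative value means there is a gap between the two reads,
    so they cannot be merged.
    """
    return trunc_f + trunc_r - amplicon_len


# ASSUMPTIONS (not from the thread): amplicon length and minimum overlap.
AMPLICON_LEN = 420
MIN_OVERLAP = 12

# 150/150 is the run described above; 260/200 is a hypothetical alternative.
for trunc_f, trunc_r in [(150, 150), (260, 200)]:
    overlap = overlap_after_truncation(AMPLICON_LEN, trunc_f, trunc_r)
    can_merge = overlap >= MIN_OVERLAP
    print(f"{trunc_f}/{trunc_r}: overlap = {overlap} bp, can merge: {can_merge}")
```

Under these assumptions, 150 + 150 bases cannot span the amplicon at all, which would match the near-zero counts in the merged column.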

When you use the full-length reads, nearly all of them are too low quality and are filtered out.

This is pretty clear in your quality plots as well. I would try relaxing your parameters a little, for example keeping the reads a bit longer so they can still join. Or, like I mentioned before, if you want to do the joining before denoising, Deblur gives that option but DADA2 does not.
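One way to see why long reads with a poor-quality tail get dropped: DADA2's filter discards a read when its expected number of errors exceeds a threshold (the maxEE setting), and the expected errors are the sum of the per-base error probabilities implied by the quality scores. The sketch below uses made-up quality scores and an assumed threshold of 2.0, so treat the numbers as illustrative only.

```python
def expected_errors(quality_scores):
    """Sum of per-base error probabilities, where P(error) = 10 ** (-Q / 10)."""
    return sum(10 ** (-q / 10) for q in quality_scores)


# HYPOTHETICAL read: 200 good bases (Q35) followed by a 100-base
# low-quality tail (Q15). Not taken from the poster's data.
read_quals = [35] * 200 + [15] * 100
MAX_EE = 2.0  # assumed filter threshold

full = expected_errors(read_quals)
truncated = expected_errors(read_quals[:200])

print(f"full length:      EE = {full:.2f}, passes: {full <= MAX_EE}")
print(f"truncated at 200: EE = {truncated:.2f}, passes: {truncated <= MAX_EE}")
```

In this toy case the low-quality tail alone pushes the read over the threshold, while the truncated read passes easily. That is the trade-off when choosing truncation lengths: short enough to drop the bad tail, long enough to still overlap.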



Thank you @jwdebelius. I see how the quality of my reads is affecting DADA2. Can you explain what DADA2 does, maybe step by step?
I still don't understand why I can't join my reads outside of DADA2 and then run them as single-end reads with DADA2. My preprocessing steps do not perform any quality filtering, aside from the quality adjustments made to the overlapping region when the reads are merged. The output is still FASTQ, so DADA2 could still perform its own quality filtering.

So, the DADA2 steps are outlined pretty well in the file you have:

  1. Quality filtering
  2. Denoising
  3. (Merge)
  4. Chimera removal
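These steps map onto the columns of the stats file, so the per-step retention shows where each run loses its reads. A minimal sketch using the counts for one sample from the two denoise-paired tables earlier in the thread (the function name is made up for illustration):

```python
def retention_by_step(counts):
    """Fraction of reads kept at each step, relative to the previous step."""
    steps = list(counts)
    fractions = {}
    for prev, cur in zip(steps, steps[1:]):
        fractions[cur] = counts[cur] / counts[prev] if counts[prev] else 0.0
    return fractions


# Counts for sample BMI-Plate20WellG10-16S, copied from the stats tables.
run_150 = {"input": 32631, "filtered": 25711, "denoised": 25711,
           "merged": 91, "non-chimeric": 90}
run_300 = {"input": 32631, "filtered": 2, "denoised": 2,
           "merged": 2, "non-chimeric": 2}

for name, counts in [("150 bp run", run_150), ("300 bp run", run_300)]:
    fractions = retention_by_step(counts)
    worst = min(fractions, key=fractions.get)
    print(f"{name}: biggest loss at '{worst}' ({fractions[worst]:.2%} kept)")
```

For this sample the 150 bp run loses almost everything at the merge step, while the 300 bp run loses almost everything at the filtering step.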

The algorithm makes assumptions about your error profile, so the reads you give to denoise-single should not be pre-joined. If you want to work with pre-joined reads, you need to run Deblur.

