DADA2: low non-chimeric read counts after denoising step, --p-min-fold-parent-over-abundance

Hello everyone,
I'm not sure if this question makes sense, but I'm going to ask it anyway.
I'm dealing with 2 × 300 bp Illumina MiSeq 16S rRNA data targeting the V3–V4 region.
I’d like to know how much overlap I should ensure after truncating my forward and reverse reads with the parameters shown below:

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux.qza \
  --p-trunc-len-f 268 \
  --p-trunc-len-r 257 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza

The reason I ask is that I'm seeing very low non-chimeric read counts after DADA2 denoising. I’ve read various posts suggesting that adjusting --p-min-fold-parent-over-abundance might help, but I’m not sure what value to use.
Does anyone have suggestions or best practices for these parameters?

Thank you in advance for any guidance!

demux.qza : https://drive.google.com/file/d/1C-CTSdJPOvnvAN9n1FT8CSfIP8f8bv1h/view?usp=drive_link

Hi @Sue, the suggested values are outlined in this post, along with references. I'd suggest not going above 16 if possible.

I'd also lower the truncation values. By default, DADA2 requires at least 12 bp of overlap to merge the reads. I'd suggest reading this post for more detail.


Thank you very much. I will test this option.
Otherwise, which values do you suggest for these two parameters based on my graphs?

  • --p-trunc-len-f
  • --p-trunc-len-r

As suggested by the last post I linked above:

You can get an estimate of how much overlap you might have between the two reads. To do this, add the cycle counts of the paired reads and subtract your expected amplicon length:

300 + 300 - 464 = 136 bases

then subtract another 12 to make sure you account for a minimum 12 bases of overlap as required by DADA2:
136 - 12 = 124 bases

Thus, given your run, you have a lot of room to spare: about 124 bases that you can trim! Note that this assumes there are no primers in the sequence output. If there are, you'll have to subtract the length of those primers too, or run cutadapt prior to visualizing the quality plots and determining your truncation values. Again, see that last post for more details.
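The arithmetic above can be sketched in a few lines of shell. This is only an illustration using the numbers from this thread; the variable names are made up, and the primer lengths default to 0 because that is the assumption stated above.

```shell
# Trimming budget for paired-end reads, using the numbers from this thread.
read_len_f=300     # forward cycles
read_len_r=300     # reverse cycles
amplicon_len=464   # expected amplicon length
min_overlap=12     # minimum overlap mentioned above
primer_len_f=0     # assumption: set to the primer length if primers are still in the reads
primer_len_r=0     # assumption: same for the reverse primer

# Overlap between the two reads before any truncation.
raw_overlap=$(( read_len_f + read_len_r - amplicon_len ))

# Bases that can be removed in total (forward + reverse) while keeping the minimum overlap.
trim_budget=$(( raw_overlap - min_overlap - primer_len_f - primer_len_r ))

echo "overlap before truncation: ${raw_overlap} bases"
echo "total bases that can be trimmed: ${trim_budget}"
```

The budget is shared between the forward and reverse reads, so it is the sum of what is removed from both that has to stay under it.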

You can probably truncate anywhere from 20 to 60 bases or so from each read. For example, you can start with this:

--p-trunc-len-f 250
--p-trunc-len-r 230

You may still need to play around with the values, but that should be a good place to start.
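To sanity-check a candidate pair before running DADA2, the same formula can be applied to the truncated lengths. The sketch below assumes the 464-base amplicon length used above (real amplicon lengths vary between taxa, so treat the result as a rough guide) and ignores any --p-trim-left values; the function name is made up for this example.

```shell
amplicon_len=464   # expected amplicon length, as used above
min_overlap=12     # minimum overlap mentioned above

# Print the overlap left by a forward/reverse truncation pair.
overlap_after_trunc() {
    echo $(( $1 + $2 - amplicon_len ))
}

# Candidate pairs written as forward:reverse.
for pair in 250:230 268:257 220:220; do
    f=${pair%%:*}
    r=${pair##*:}
    ov=$(overlap_after_trunc "$f" "$r")
    if [ "$ov" -ge "$min_overlap" ]; then
        status="ok"
    else
        status="too short to merge"
    fi
    echo "trunc-len-f=$f trunc-len-r=$r overlap=$ov ($status)"
done
```

A negative or very small overlap means most read pairs cannot be merged, which would show up as a sharp drop in the merged column of the denoising stats.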


Thank you for all this information. I have tested several combinations, both with and without --p-min-fold-parent-over-abundance. Which one do you think is best? Are these values acceptable?
These are the results without --p-min-fold-parent-over-abundance:

--p-trunc-len-f 230, --p-trunc-len-r 220, --p-trim-left-f 5, --p-trim-left-r 5 ---> denoising: 0.01%
--p-trunc-len-f 220, --p-trunc-len-r 220, --p-trim-left-f 5, --p-trim-left-r 5 ---> denoising: 0
--p-trunc-len-f 240, --p-trunc-len-r 230, --p-trim-left-f 5, --p-trim-left-r 5 ---> denoising: 10

--p-trunc-len-f 260, --p-trunc-len-r 250, --p-trim-left-f 10, --p-trim-left-r 10 ---> denoising: 35
--p-trunc-len-f 280, --p-trunc-len-r 270, --p-trim-left-f 10, --p-trim-left-r 10 ---> denoising: 13

With --p-min-fold-parent-over-abundance:

--p-trunc-len-f 268, --p-trunc-len-r 257, --p-trim-left-f 5, --p-trim-left-r 5, --p-min-fold-parent-over-abundance 8 ---> denoising: 41
--p-trunc-len-f 268, --p-trunc-len-r 257, --p-trim-left-f 5, --p-trim-left-r 5, --p-min-fold-parent-over-abundance 12 ---> denoising: 41.32
--p-trunc-len-f 260, --p-trunc-len-r 250, --p-trim-left-f 5, --p-trim-left-r 5, --p-min-fold-parent-over-abundance 16 ---> denoising: 43
--p-trunc-len-f 250, --p-trunc-len-r 230, --p-trim-left-f 5, --p-trim-left-r 5, --p-min-fold-parent-over-abundance 16 ---> denoising: 45
--p-trunc-len-f 240, --p-trunc-len-r 240, --p-trim-left-f 0, --p-trim-left-r 0, --p-min-fold-parent-over-abundance 16 ---> denoising: 46
--p-trunc-len-f 260, --p-trunc-len-r 220, --p-trim-left-f 0, --p-trim-left-r 0, --p-min-fold-parent-over-abundance 16 ---> denoising: 48
--p-trunc-len-f 260, --p-trunc-len-r 220, --p-trim-left-f 0, --p-trim-left-r 0, --p-min-fold-parent-over-abundance 16 ---> denoising: 48

--p-trunc-len-f 250, --p-trunc-len-r 232, --p-trim-left-f 0, --p-trim-left-r 0, --p-min-fold-parent-over-abundance 16 ---> denoising: 45
--p-trunc-len-f 277, --p-trunc-len-r 210, --p-trim-left-f 0, --p-trim-left-r 0, --p-min-fold-parent-over-abundance 16 ---> denoising: 50
--p-trunc-len-f 280, --p-trunc-len-r 240, --p-trim-left-f 5, --p-trim-left-r 5, --p-min-fold-parent-over-abundance 16 ---> denoising: 44
--p-trunc-len-f 267, --p-trunc-len-r 257, --p-trim-left-f 5, --p-trim-left-r 5, --p-min-fold-parent-over-abundance 16 ---> denoising: 41
--p-trunc-len-f 280, --p-trunc-len-r 220, --p-trim-left-f 5, --p-trim-left-r 5, --p-min-fold-parent-over-abundance 16 ---> denoising: 46
--p-trunc-len-f 280, --p-trunc-len-r 194, --p-trim-left-f 5, --p-trim-left-r 5, --p-min-fold-parent-over-abundance 16 ---> denoising: 49
--p-trunc-len-f 280, --p-trunc-len-r 250, --p-trim-left-f 5, --p-trim-left-r 5, --p-min-fold-parent-over-abundance 16 ---> denoising: 43
--p-trunc-len-f 270, --p-trunc-len-r 210, --p-trim-left-f 5, --p-trim-left-r 5, --p-min-fold-parent-over-abundance 16 ---> denoising: 49


Hi Sue,

Can you clarify what these numbers mean? Is this the percentage of reads that are non-chimeric after running DADA2, or the number of reads that pass the denoising step?

(I'm a big fan of running DADA2 with multiple trunc-len settings to see what works best!)
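For anyone who wants to automate that kind of sweep, here is a minimal sketch. It only uses flags that already appear in this thread. The run_* output directory names and the DRY_RUN switch are made up for this example, and it assumes demux.qza is in the current directory. By default it only prints the commands.

```shell
# Sketch: run DADA2 over several truncation pairs (dry run by default).
DRY_RUN=${DRY_RUN:-1}   # hypothetical switch: set DRY_RUN=0 to actually run qiime
n_runs=0

# Candidate pairs written as forward:reverse.
for pair in 250:230 260:220 277:210; do
    f=${pair%%:*}
    r=${pair##*:}
    out="run_f${f}_r${r}"   # hypothetical output directory name

    # Build the command as positional parameters so quoting stays intact.
    set -- qiime dada2 denoise-paired \
        --i-demultiplexed-seqs demux.qza \
        --p-trunc-len-f "$f" \
        --p-trunc-len-r "$r" \
        --o-table "$out/table.qza" \
        --o-representative-sequences "$out/rep-seqs.qza" \
        --o-denoising-stats "$out/denoising-stats.qza"

    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        mkdir -p "$out" && "$@"
    fi
    n_runs=$(( n_runs + 1 ))
done
```

Each run writes its own denoising-stats.qza, so the percentage of non-chimeric input can be compared across all samples rather than just the first one.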


Hello again,
These are the percentages of non-chimeric input (the last column in the first figure in my post). To simplify the display, I’ve only provided the value for the first sample for each combination rather than showing the results for all samples. Since I have tested several combinations, can we consider these percentages of non-chimeric input (between 40 and 50) acceptable :confused: or is there another parameter I can add to increase this percentage?

Thank you,


Yes, if that's the best value you can get, then it's okay.

Like, if there are chimeric sequences, then it's good to remove them! :broom:


Thank you for your help!

Sorry to bother you again. Since I’m a beginner in this field, I contacted the platform that generated the fastq.gz files to clarify whether they still contained adapters or primers. Their response was that adapters have been removed, but primers have not. They also provided the following primer sequences (excluding adapter sequences):

  • 16S_V3‐341F_NXT V3‐V4: CCTACGGGNGGCWGCAG
  • 16S_V4‐785R_NXT V3‐V4: GACTACHVGGGTATCTAATCC

Given this information, should I modify my QIIME 2 command to remove the remaining primers? For example, is there an additional parameter I should include in the qiime dada2 denoise-paired command? Here’s the command I’m currently using:

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux.qza \
  --p-trunc-len-f 268 \
  --p-trunc-len-r 257 \
  --p-trim-left-f 18 \
  --p-trim-left-r 18 \
  --p-min-fold-parent-over-abundance 16 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza

Thank you for your time!

Or should I only remove the low-quality bases that show up in the quality-check step (5 bases in this example)?

Good morning,

Thanks for bringing another good question!

Here's the theory:
The 16S primer is complementary to the 16S gene, so it should be okay to keep.
The linker, padder, and barcode are not biological, so they need to be removed.

Here's the practice:

  • The EMP sequencing protocol uses the PCR primers again during sequencing, and all the linkers and padders are removed biochemically. No bioinformatics needed!
  • If a barcode is left in the reads, it will cause all samples to have unique ASVs. Having each sample show a 100% unique microbiome will cause lots of issues downstream.
  • Lingering primers can also cause other issues, including with chimera checking in DADA2. From the DADA2 FAQ:

    The most common reason that far too many reads are flagged as chimeric is that primer sequences were not removed prior to starting the dada2 workflow. The ambiguous nucleotides in universal primer sequences will be interpreted as real variation by the dada2 pipeline, and this interferes with the chimera algorithm. In most cases, if you see over 25% of your sequencing reads being flagged chimeric, remove the primers from your reads and restart the workflow with the clean data.


You could trim like this:

--p-trim-left-f 5
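If the goal is to remove the whole primer by position, the trim-left values would need to cover the full primer length. As a rough sketch, the lengths of the two primers quoted earlier in this thread can be counted in the shell. This assumes each primer sits at the very start of every read with no spacer bases in front of it, which may not hold for this run.

```shell
# Primer sequences as quoted earlier in this thread.
primer_f="CCTACGGGNGGCWGCAG"       # 341F
primer_r="GACTACHVGGGTATCTAATCC"   # 785R

# ${#var} expands to the number of characters in the variable.
trim_left_f=${#primer_f}
trim_left_r=${#primer_r}

echo "--p-trim-left-f $trim_left_f"
echo "--p-trim-left-r $trim_left_r"
```

Note that the two primers are not the same length, so a single value used for both reads would either leave part of one primer in place or remove a few real bases from the other read.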

Or just use the cutadapt plugin! It's pretty fast too! cutadapt — QIIME 2 2024.10.1 documentation
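A cutadapt call for this dataset might look roughly like the sketch below. The primers are the ones quoted earlier in the thread. The output name demux-trimmed.qza and the DRY_RUN switch are made up for this example, and --p-discard-untrimmed is optional (it drops read pairs in which the primers were not found). Please check the flags against the plugin documentation for your QIIME 2 version before running it.

```shell
DRY_RUN=${DRY_RUN:-1}   # hypothetical switch: set DRY_RUN=0 to actually run qiime

# Primer sequences as quoted earlier in this thread.
primer_f="CCTACGGGNGGCWGCAG"       # 341F
primer_r="GACTACHVGGGTATCTAATCC"   # 785R

# Build the command as positional parameters so quoting stays intact.
set -- qiime cutadapt trim-paired \
    --i-demultiplexed-sequences demux.qza \
    --p-front-f "$primer_f" \
    --p-front-r "$primer_r" \
    --p-discard-untrimmed \
    --o-trimmed-sequences demux-trimmed.qza
cutadapt_cmd="$*"

if [ "$DRY_RUN" = "1" ]; then
    echo "$cutadapt_cmd"
else
    "$@"
fi
```

The trimmed artifact would then be passed to qiime dada2 denoise-paired in place of demux.qza. Since the reads are shorter after primer removal, the quality plots and truncation values should be revisited on the trimmed data.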


Thank you very much!