DADA2 paired-end / with vs. without primers / huge loss at merging

Hello,

I ran a DADA2 analysis without completely removing the primers from my sequences.
The primers are 18 nt long for the forward and 21 nt for the reverse.
My QIIME2 command:

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux-paired-end.qza \
  --p-trim-left-f 12 \
  --p-trim-left-r 12 \
  --p-trunc-len-f 250 \
  --p-trunc-len-r 250 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza

Trimming 12 nt was not enough to completely remove the primers.
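To make the arithmetic explicit: in q2-dada2, --p-trim-left-* removes bases from the 5' end and --p-trunc-len-* truncates at a position in the raw read, so each read keeps trunc - trim bases. A minimal sketch (the helper names are hypothetical), assuming each read starts exactly at the first primer base and using the 18/21 nt primer lengths stated above:

```python
def kept_window(trim_left: int, trunc_len: int) -> tuple[int, int]:
    """0-based [start, end) window of the raw read kept after trim-left and trunc-len."""
    if not 0 <= trim_left < trunc_len:
        raise ValueError("trim_left must be >= 0 and smaller than trunc_len")
    return trim_left, trunc_len


def leftover_primer_bases(trim_left: int, primer_len: int) -> int:
    """Primer bases still present at the 5' end (assumes the read starts at the primer)."""
    return max(0, primer_len - trim_left)


start, end = kept_window(12, 250)
print(end - start)                                                   # 238 bases kept per read
print(leftover_primer_bases(12, 18), leftover_primer_bases(12, 21))  # 6 9
```

So with a trim of 12, roughly 6 primer bases would remain on R1 and 9 on R2.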

So I removed them with a cutadapt command and then performed the DADA2 step:

qiime cutadapt trim-paired \
  --i-demultiplexed-sequences $input_file \
  --p-front-f CCTAYGGGRBGCASCAG \
  --p-front-r GGACTACNNGGGTATCTAAT \
  --p-match-read-wildcards \
  --p-match-adapter-wildcards \
  --p-discard-untrimmed \
  --o-trimmed-sequences trimmed_remove_primers_wild.qza
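For reference, these primers contain IUPAC degenerate codes (Y, R, B, S in the forward, N in the reverse), which is why the wildcard options matter. The sketch below is a simplified, exact-match illustration of a wildcard-aware primer check with a discard-untrimmed behaviour. It is not cutadapt's algorithm (cutadapt aligns with an error tolerance), and the function names are hypothetical:

```python
from typing import Optional

# IUPAC nucleotide codes -> the bases each code can stand for
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def compatible(read_base: str, primer_code: str) -> bool:
    """True if the two codes share at least one base (wildcards allowed on both sides)."""
    return bool(set(IUPAC.get(read_base, "")) & set(IUPAC[primer_code]))


def strip_primer(read: str, primer: str) -> Optional[str]:
    """Return the read without its 5' primer, or None if the primer does not match
    (None plays the role of --p-discard-untrimmed). Exact, gap-free matching only."""
    if len(read) < len(primer):
        return None
    if all(compatible(b, p) for b, p in zip(read, primer)):
        return read[len(primer):]
    return None


FWD = "CCTAYGGGRBGCASCAG"
print(strip_primer("CCTACGGGAGGCAGCAG" + "TGGGGAATAT", FWD))  # TGGGGAATAT
print(strip_primer("A" * 25, FWD))                           # None
```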

then

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs trimmed_remove_primers_wild.qza \
  --p-trim-left-f 0 \
  --p-trim-left-r 0 \
  --p-trunc-len-f 225 \
  --p-trunc-len-r 225 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza
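One thing worth checking for this second run: DADA2 discards reads shorter than the truncation length, so the primer-free reads must still be at least 225 nt long. A minimal sketch, assuming 250 nt raw reads and that cutadapt removes exactly the primer (both assumptions), covering the 18/21 nt lengths stated above and the 17/20 nt lengths of the sequences passed to cutadapt:

```python
def length_after_primer_removal(read_len: int, primer_len: int) -> int:
    """Read length left once the 5' primer is removed (exact removal assumed)."""
    return read_len - primer_len


def survives_trunc(read_len: int, trunc_len: int) -> bool:
    """DADA2 drops reads shorter than the truncation length."""
    return read_len >= trunc_len


for primer_len in (17, 18, 20, 21):
    remaining = length_after_primer_removal(250, primer_len)
    print(primer_len, remaining, survives_trunc(remaining, 225))
```

In all four cases the remaining length (233, 232, 230 or 229 nt) is above 225, so truncation itself should not discard these reads.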

Out of curiosity, I compared the two results:

With the primers there is a huge loss of sequences at the non-chimeric step, which I expected.
However, I don't understand why there was also a huge loss at the merging step. How does the presence of the primers affect merging?

Thank you very much for any advice,
Jérémy Tournayre

Hello Jérémy,

How are your trunc-len settings changing across those 6 configurations? I ask because in this example,

--p-front-f CCTAYGGGRBGCASCAG
--p-front-r GGACTACNNGGGTATCTAAT
...
--p-trunc-len-f 225
--p-trunc-len-r 225

the primers being trimmed are 17 and 20 bp long, and the trunc-len is 25 bp shorter for both reads. DADA2 pairing is very sensitive to overlap length and quality in the area of overlap, and this could explain the changes you are seeing in percentage of reads that merge.
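To put numbers on the overlap point: if the amplicon (including both primers) is A nt long and the truncation positions are counted on the raw reads, the expected overlap is trunc_f + trunc_r - A. A minimal sketch; A = 467 is a hypothetical value chosen only for illustration:

```python
def overlap_length(trunc_f: int, trunc_r: int, amplicon_len: int) -> int:
    """Expected R1/R2 overlap in nt.

    trunc_f / trunc_r are truncation positions in raw-read coordinates
    (counted from the first primer base); amplicon_len includes both primers.
    Only valid while trim-left does not reach into the overlap region.
    """
    return trunc_f + trunc_r - amplicon_len


AMPLICON_LEN = 467  # hypothetical primer-to-primer length, for illustration only

print(overlap_length(250, 250, AMPLICON_LEN))            # 33: trim 12, trunc 250
print(overlap_length(17 + 225, 20 + 225, AMPLICON_LEN))  # 20: cutadapt (17/20 nt), then trunc 225
```

Whatever the true amplicon length, the cutadapt run loses 25 + 25 - 17 - 20 = 13 nt of overlap relative to the trim 12 / trunc 250 run.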

I wanted to discuss that first, before speculating about the chimera checking step.

Thanks! :whale2:

4 Likes

Hello,
Hello,
I think your observation that "retaining primer sequences has profound influence on DADA-2 output" agrees well with mine; I experienced the same a few days ago. I suggest trying various combinations of truncation lengths, for example 250 forward/250 reverse, 250F/230R, 250F/210R and 250F/190R, and checking whether the results improve. Note that these combinations should be chosen based on the quality of the forward and reverse reads.
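Besides read quality, each candidate combination also has to leave enough overlap for merging. A minimal sketch that screens the suggested combinations; the amplicon length (467 nt, including primers) and the minimum overlap (12 nt) are hypothetical values used only for illustration:

```python
def viable_truncations(combos, amplicon_len, min_overlap):
    """Keep (trunc_f, trunc_r) pairs whose expected overlap reaches min_overlap.

    Overlap = trunc_f + trunc_r - amplicon_len, with truncation positions on
    raw reads and amplicon_len including both primers.
    """
    return [(f, r) for f, r in combos if f + r - amplicon_len >= min_overlap]


combos = [(250, 250), (250, 230), (250, 210), (250, 190)]
# amplicon_len and min_overlap are hypothetical, not taken from this thread
print(viable_truncations(combos, amplicon_len=467, min_overlap=12))  # [(250, 250), (250, 230)]
```

With a shorter real amplicon, more of these combinations would pass.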

1 Like

Hi @JeremyTournayre,

You can check out this post, and the rest of the thread, for a very similar discussion:

-Mike

1 Like

Hello,

Thanks for the replies!

I should have shown the read qualities:

I ran the DADA2 analysis with multiple parameter sets:
"With primers" = without cutadapt, but with trim parameters such as "trim 18 nt on R1 and 21 nt on R2" (so normally no primers remain after trimming).
"Without primers" = with cutadapt (roughly equivalent to trimming 18 nt on R1 and 21 nt on R2), then trunc 225 (so a trunc of 243 on the untrimmed reads).
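The conversion to untrimmed-read coordinates is simply primer length plus truncation length. A minimal sketch, assuming each primer starts at the first base of the read and cutadapt removed exactly the primer, for both the 18/21 nt lengths given here and the 17/20 nt lengths of the sequences passed to cutadapt:

```python
def raw_trunc_position(primer_len: int, trunc_after_cutadapt: int) -> int:
    """Truncation position on the untrimmed read (assumes exact primer removal
    and a primer starting at the first read base)."""
    return primer_len + trunc_after_cutadapt


print(raw_trunc_position(18, 225), raw_trunc_position(21, 225))  # 243 246
print(raw_trunc_position(17, 225), raw_trunc_position(20, 225))  # 242 245
```

So 243 corresponds to R1; for R2 the equivalent position would be 246 under the same assumptions.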

Results:


We can see an expected loss at the merging step between "With primers, trunc 250" and "With primers, trunc 237". But I don't understand why running without primers (with cutadapt, or with a trim of 18 nt on R1 / 21 nt on R2) gives better merging results. I can understand why trimming affects the merging step, but why would the first nucleotides be involved in merging? The key comparison is between the orange "With primers: trim 12 & trunc 250" and the black "With primers: trim 18/21 & trunc 250": there is a loss of ~7000 reads out of ~35 500. Why? For information, the merged reads are about 430 nt long.
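For scale, the approximate figures quoted for orange vs. black correspond to roughly a fifth of the reads. A trivial sketch of the calculation (the helper name is hypothetical):

```python
def percent_lost(reads_lost: float, reads_total: float) -> float:
    """Share of reads lost, in percent."""
    return 100.0 * reads_lost / reads_total


# approximate figures for orange vs. black, as given in the post
print(round(percent_lost(7000, 35500), 1))  # 19.7
```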

I have read "Looking for help joining paired-end reads - #12 by SoilRotifer", but the trim parameter is not involved there the way it is in my results, right?

Have a good day and thanks again for the assistance.
Jérémy

Thanks for the additional info, Jérémy.

Let's zoom in on just the orange :orange_circle: and black :black_circle: options.

If I understand your settings correctly, these are identical, except that :black_circle: has more base-pairs trimmed from the start of both R1 and R2.

Did I get that right? :face_with_monocle:

Good question! Is dada2 joining these reads from their start?! :scream_cat:

@SoilRotifer, any ideas?
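One way to check whether the 5' bases can matter geometrically: in an error-free simulation, trim-left only shortens the merged read and leaves the overlap unchanged, because trimming removes bases from the end of each read opposite the overlap. A minimal sketch with a random, hypothetical 467 nt amplicon (not real data); the function names are hypothetical, and since real reads carry errors this only addresses overlap geometry:

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def simulate_merge(amplicon: str, trim_f: int, trunc_f: int, trim_r: int, trunc_r: int):
    """Cut error-free R1/R2 from an amplicon, apply trunc-len then trim-left,
    and return (overlap, merged_length) via exact suffix/prefix matching."""
    r1 = amplicon[:trunc_f][trim_f:]           # forward read
    r2 = revcomp(amplicon)[:trunc_r][trim_r:]  # reverse read, sequenced from the other strand
    r2_fwd = revcomp(r2)                       # reverse read in forward orientation
    for k in range(min(len(r1), len(r2_fwd)), 0, -1):
        if r1[-k:] == r2_fwd[:k]:              # longest exact overlap wins
            return k, len(r1) + len(r2_fwd) - k
    return 0, None


rng = random.Random(42)
amplicon = "".join(rng.choice("ACGT") for _ in range(467))  # hypothetical amplicon

print(simulate_merge(amplicon, 12, 250, 12, 250))  # (33, 443): orange-like settings
print(simulate_merge(amplicon, 18, 250, 21, 250))  # (33, 428): black-like settings
```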

1 Like

Yes, that's it. I've attached the two denoising-stats .qzv files (you can see the parameters in the provenance tab).
black_trim18-21_trunc250_denoising-stats (1.2 MB)
orange_trim12_trunc250_denoising-stats (1.2 MB)

1 Like