I ran a DADA2 analysis without completely removing the primers from my sequences.
The primers are 18 nt long (forward) and 21 nt long (reverse).
My QIIME2 command:
With the primers left in, there is a large loss of sequences at the chimera-removal step, which I expected.
However, I don't understand why there was also a large loss at the merging step. How does the presence of the primers affect merging?
Thank you very much for any advice,
Jérémy Tournayre
The primers being trimmed are 17 and 20 bp long, and the trunc-len is 25 bp shorter for both reads. DADA2 read pairing is very sensitive to the length of the overlap and to the quality within the overlap region, and this could explain the changes you are seeing in the percentage of reads that merge.
I wanted to discuss that first, before speculating about the chimera checking step.
Hello,
I think your observation that "retaining primer sequences has a profound influence on DADA2 output" agrees well with my own experience; I ran into the same thing a few days ago. I suggest trying various truncation combinations, for example 250 forward/250 reverse, 250F/230R, 250F/210R and 250F/190R, and checking whether the results improve. Please note that these combinations should be chosen based on the quality of the forward and reverse reads.
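As a rough check before running each combination, the expected overlap can be estimated from the truncation lengths. This is only a sketch under stated assumptions: the amplicon length of 430 nt is the approximate merged length reported in this thread, the minimum overlap of 12 nt is the DADA2 `mergePairs` default, and the names `expected_overlap`, `AMPLICON_LEN` and `MIN_OVERLAP` are made up for illustration.

```python
# Assumptions (not taken from the original command):
AMPLICON_LEN = 430  # approximate merged read length reported in this thread
MIN_OVERLAP = 12    # default minimum overlap in DADA2's mergePairs


def expected_overlap(trunc_f, trunc_r, amplicon_len=AMPLICON_LEN):
    """Estimate the overlap (nt) between truncated forward and reverse reads.

    Assumes both reads start at opposite ends of the amplicon and that
    trunc_f / trunc_r are positions on the untrimmed reads.
    """
    return trunc_f + trunc_r - amplicon_len


# Truncation combinations suggested above (forward, reverse).
combos = [(250, 250), (250, 230), (250, 210), (250, 190)]
report = {combo: expected_overlap(*combo) for combo in combos}

for (trunc_f, trunc_r), overlap in report.items():
    status = "ok" if overlap >= MIN_OVERLAP else "too short to merge"
    print(f"{trunc_f}F/{trunc_r}R: overlap ~{overlap} nt ({status})")
```

Under these assumptions the last combination (250F/190R) would leave less than the minimum overlap, so it is worth checking the real amplicon length before truncating that aggressively. The estimate ignores natural length variation between taxa and says nothing about read quality in the overlap.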
I have run the DADA2 analysis with several parameter sets.
"With primers" = no cutadapt, but with trim parameters such as "trim 18 nt on R1 and 21 nt on R2" (so the primers should normally be gone after trimming).
"Without primers" = with cutadapt (roughly equivalent to trimming 18 nt on R1 and 21 nt on R2), then truncation at 225 (so a truncation at 243 on the untrimmed reads):
We can see an expected loss at the merging step between "With primers, trunc 250" and "With primers, trunc 237". But I don't understand why removing the primers (with cutadapt, or with a trim of 18 nt on R1 and 21 nt on R2) gives better merging results. I can understand why truncation affects the merging step, but why would the first nucleotides of the reads be involved in merging? I am mainly thinking of the difference between two analyses: the orange one, "With primers: trim 12 & trunc 250", versus the black one, "With primers: trim 18/21 & trunc 250". There is a loss of ~7,000 reads out of ~35,500. Why? For information, the merged reads are about 430 nt long.
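To make the question concrete, here is a minimal sketch of the geometry as I understand it. It assumes each read starts at one end of the amplicon, that truncation positions are counted on the untrimmed reads, and that the amplicon is about 430 nt; `read_intervals` and `overlap_len` are illustrative names, not part of QIIME 2 or DADA2.

```python
def read_intervals(amplicon_len, trunc_f, trunc_r, trim_f=0, trim_r=0):
    """Return the amplicon coordinates covered by each read as (start, end).

    The forward read starts at position 0; the reverse read starts at the
    far end of the amplicon and runs backwards. Trimming removes bases at
    each read's 5' end, truncation at its 3' end.
    """
    fwd = (trim_f, trunc_f)
    rev = (amplicon_len - trunc_r, amplicon_len - trim_r)
    return fwd, rev


def overlap_len(amplicon_len, trunc_f, trunc_r, trim_f=0, trim_r=0):
    """Number of amplicon positions covered by both reads (0 if none)."""
    fwd, rev = read_intervals(amplicon_len, trunc_f, trunc_r, trim_f, trim_r)
    return max(0, min(fwd[1], rev[1]) - max(fwd[0], rev[0]))


AMPLICON_LEN = 430  # assumption: approximate merged length, primers included

no_trim = overlap_len(AMPLICON_LEN, 250, 250)
trim_12 = overlap_len(AMPLICON_LEN, 250, 250, trim_f=12, trim_r=12)
trim_18_21 = overlap_len(AMPLICON_LEN, 250, 250, trim_f=18, trim_r=21)
trunc_237 = overlap_len(AMPLICON_LEN, 237, 237)

print(no_trim, trim_12, trim_18_21, trunc_237)
```

In this simple model the overlap depends only on the truncation positions, not on how many bases are trimmed from the 5' ends, which is why the difference between the "trim 12" and "trim 18/21" runs is surprising to me. The model ignores read quality, mismatches and amplicon length variation, so it cannot by itself explain the loss.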