Is trimmomatic necessary before dada2?

Hi,
My supervisor did some analysis of 16S samples (V3-V4 region) with DADA2, but we ran into some doubts about the filtering steps (merging & chimera removal).

BYLN3-demux.qzv (290.8 KB)

The forward and reverse reads were 300 bp each, so after trimming/truncating there should still be more than 20 bp of overlap, considering that the V3-V4 region is ~460 bp long.
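
For reference, my rough overlap arithmetic with the #1 parameters, assuming the ~460 bp estimate includes the primer regions (17 bp forward, 20 bp reverse), so this is only a ballpark:

forward read retained:   280 - 17 = 263 bp
reverse read retained:   260 - 20 = 240 bp
amplicon minus primers:  ~460 - 17 - 20 = ~423 bp
expected overlap:        263 + 240 - 423 = ~80 bp

which should be comfortably above DADA2's default 12 bp minimum overlap.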

The merging step always threw out a lot of sequences, so I tried different approaches and compared the results.

Looking at the attached demux.qzv file, the following combinations of parameters were compared:

  1. DADA2 on original demux file with trim/trunc values given

  2. Trimmomatic was run before DADA2 (using the 515/806 primers), so only small trim/trunc values were given to DADA2

  3. Trimmomatic was run before DADA2 AND the same trim/trunc values as #1 were given (according to demux.qzv)

  4. Trimmomatic was run before DADA2 AND the same trim/trunc values as #1 were given AND only the forward reads were used


Resulting stats

1. DADA2 on original demux file with trim/trunc values given

qiime dada2 denoise-paired \
--i-demultiplexed-seqs BYLN3-pe-demux.qza \
--o-table BYLN3-table1mill_4.qza \
--o-representative-sequences BYLN3-rep-seqs1mill_4.qza \
--o-denoising-stats BYLN3-denoising-stats1mill_4.qza \
--p-trim-left-f 17 \
--p-trim-left-r 20 \
--p-trunc-len-f 280 \
--p-trunc-len-r 260 \
--p-n-threads 16 \
--p-n-reads-learn 1000000
sample-id input filtered denoised merged non-chimeric
#q2:types numeric numeric numeric numeric numeric
AJM0303 60485 42289 42289 13901 8109
NEGATIVE 17560 12439 12439 7900 6731
YAM0907 70721 50271 50271 19523 11069
YAM2511 66559 48029 48029 16292 9635
YAM2512 53449 39393 39393 14618 8459
YAM2513 54478 39873 39873 17145 8716
YAM2514 36772 26708 26708 11418 6967
YAM2515 73333 53219 53219 19119 11146
YAM2516 39504 27992 27992 10006 5943

The merging step seems harsh.




2. Trimmomatic was run before DADA2 (using the 515/806 primers), so only small trim/trunc values were given to DADA2

qiime dada2 denoise-paired \
--i-demultiplexed-seqs BYLN3_89-pe-demux.qza \
--o-table BYLN3_89-table.qza \
--o-representative-sequences BYLN3_89-rep-seqs.qza \
--o-denoising-stats BYLN3_89-denoising-stats.qza \
--p-trim-left-f 6 \
--p-trim-left-r 6 \
--p-trunc-len-f 280 \
--p-trunc-len-r 240 \
--p-n-threads 12 \
--p-n-reads-learn 1000000
sample-id input filtered denoised merged non-chimeric
#q2:types numeric numeric numeric numeric numeric
AJM0303T 50295 45121 45121 15133 7692
NEGATIVET 14820 13346 13346 8824 7512
YAM0907T 59786 53555 53555 21718 11831
YAM2511T 55715 50494 50494 17619 11089
YAM2512T 45798 41500 41500 15931 9004
YAM2513T 46301 41903 41903 17548 8987
YAM2514T 30966 28059 28059 12465 8089
YAM2515T 61728 55987 55987 21750 13672
YAM2516T 32818 29611 29611 10231 6455

Considering the sudden increase in chimeric sequences being filtered out, it seems that Trimmomatic plus trimming 6 bases still doesn't successfully remove the primers. Why would this be?
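
If it helps, this is the kind of quick check I could run to see whether primer sequence is still present after Trimmomatic (the file name and primer string below are just placeholders, and a degenerate primer would need a regex or a dedicated trimming tool rather than a literal grep):

# count how many of the first 10,000 reads still contain a stretch of the forward primer
zcat sample_R1.fastq.gz | head -n 40000 | awk 'NR % 4 == 2' | grep -c "FORWARD_PRIMER_STRETCH"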




3. Trimmomatic was run before DADA2 AND the same trim/trunc values as #1 were given

qiime dada2 denoise-paired \
--i-demultiplexed-seqs BYLN3_89-pe-demux.qza \
--o-table BYLN3_89_3-table.qza \
--o-representative-sequences BYLN3_89_3-rep-seqs.qza \
--o-denoising-stats BYLN3_89_3-denoising-stats.qza \
--p-trim-left-f 17 \
--p-trim-left-r 20 \
--p-trunc-len-f 280 \
--p-trunc-len-r 260 \
--p-n-threads 12 \
--p-n-reads-learn 1000000
sample-id input filtered denoised merged non-chimeric
#q2:types numeric numeric numeric numeric numeric
AJM0303T 50295 40970 40970 17537 15965
NEGATIVET 14820 12148 12148 11323 11085
YAM0907T 59786 48949 48949 23985 21305
YAM2511T 55715 46762 46762 20449 19092
YAM2512T 45798 38209 38209 17984 16831
YAM2513T 46301 38701 38701 19653 17490
YAM2514T 30966 25864 25864 14448 13813
YAM2515T 61728 51618 51618 24471 22849
YAM2516T 32818 27118 27118 12057 11576

It seems that running Trimmomatic AND giving trim/trunc values based on the original (no-Trimmomatic) demux quality plots gives the best result among 1, 2, and 3. How can this be? Isn't this redundant, i.e. filtering the primers twice?


4. Trimmomatic was run before DADA2 AND only the forward reads were used (with the forward-read trim/trunc values from #1)

qiime dada2 denoise-single \
--i-demultiplexed-seqs BYLN3F-single-end-demux.qza \
--p-trim-left 17 \
--p-trunc-len 280 \
--o-representative-sequences BYLN3F_3-rep-seqs.qza \
--o-table BYLN3F_3-table.qza \
--o-denoising-stats BYLN3F_3-denoising-stats.qza \
--p-n-threads 4 --p-n-reads-learn 1000000
sample-id input filtered denoised non-chimeric
#q2:types numeric numeric numeric numeric
AJM0303fp 50295 48071 48071 46185
NEGATIVEfp 14820 14185 14185 14030
YAM0907fp 59786 57099 57099 53256
YAM2510fp 145001 139020 139020 133711
YAM2511fp 55715 53193 53193 51268
YAM2512fp 45798 43599 43599 41268
YAM2513fp 46301 44034 44034 39938
YAM2514fp 30966 29523 29523 27905
YAM2515fp 61728 59101 59101 56242

It seems that removing primers twice (once with Trimmomatic and again via the DADA2 trim parameters) gives me the most sequences after all the filtering steps. I'm thinking this is the approach I should proceed with for my further analysis. Is this right?


Remaining seqs are ranked as follows:
4 > 3 > 1 > 2

I apologize for such a messy question.
Please let me know if you need more information.

Cheers,

Hi @Dchung,

I believe, as mentioned in the DADA2 paper, "the DADA2 pipeline performs merging of paired-end reads after denoising," because the denoising algorithm checks the quality scores and estimates the error rates for the forward and reverse reads separately.
You might want to check the DADA2 paper.
Hope this helps.

Cheers,

YY

Hi @Dchung,
No, it is not necessary to use trimmomatic prior to dada2 — otherwise we would have a trimmomatic QIIME 2 plugin! We do have a q2-cutadapt plugin, and as far as I know trimmomatic and cutadapt do many of the same things. dada2 has its own trim/trunc parameters so pre-processing is not necessary.
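
If you do want an explicit primer-removal step inside QIIME 2, q2-cutadapt can do it on the demultiplexed reads. A minimal sketch, with the primer sequences and output name as placeholders you would need to replace:

qiime cutadapt trim-paired \
--i-demultiplexed-sequences BYLN3-pe-demux.qza \
--p-front-f FORWARD_PRIMER_SEQUENCE \
--p-front-r REVERSE_PRIMER_SEQUENCE \
--p-discard-untrimmed \
--o-trimmed-sequences BYLN3-pe-demux-trimmed.qza

The trimmed artifact could then go straight into denoise-paired with the trim-left parameters set to 0.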

Thank you for sharing your dada2 stats results. None of these results look great — all have merging issues because you are truncating the reads too much. I would recommend extending your truncation parameters — even if the average length is 460, there is some variation and you are losing 50% of reads at the read joining/merging step, which is bad news. If increasing truncation values does not improve read yields, I recommend using only the forward reads.
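
As a sketch of what I mean by extending the truncation values (the trunc lengths and output names below are only illustrative; pick the actual values from your demux.qzv quality plots):

qiime dada2 denoise-paired \
--i-demultiplexed-seqs BYLN3-pe-demux.qza \
--p-trim-left-f 17 \
--p-trim-left-r 20 \
--p-trunc-len-f 290 \
--p-trunc-len-r 280 \
--o-table BYLN3-table-longer-trunc.qza \
--o-representative-sequences BYLN3-rep-seqs-longer-trunc.qza \
--o-denoising-stats BYLN3-denoising-stats-longer-trunc.qza

Keeping more of the reverse read also keeps more low-quality tail bases, so watch the "filtered" column as well as the "merged" column when you compare runs.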

The primers are not being trimmed twice; you are just trimming extra bases off the 5' ends of each read.

I hope that helps!
