Is trimmomatic necessary before dada2?

Dchung · August 30, 2018, 10:04am

Hi,
My supervisor analyzed some 16S samples (V3-V4 region) with DADA2, but we have some doubts about the filtering steps (merging and chimera removal).

BYLN3-demux.qzv (290.8 KB)

The forward and reverse reads were 300 bp each, so even after trimming/truncation there was still more than 20 bp of overlap, given that the V3-V4 region is ~460 bp long.
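The overlap left after truncation can be checked with simple arithmetic. Below is a minimal bash sketch, not a QIIME 2 command; the function name `overlap_bp` and the 480 bp example length are hypothetical. It assumes the primers are still in the reads, so the amplicon length is measured from the first base of the forward read to the first base of the reverse read. In DADA2, `--p-trunc-len-*` is a position counted from the start of the read and `--p-trim-left-*` removes bases from the outer 5' end, so trimming does not change the overlap.

```shell
#!/usr/bin/env bash
# Expected overlap (bp) between truncated forward and reverse reads.
# Assumes reads span the full amplicon (primers not removed beforehand).
# trim-left only removes bases from the outer 5' ends, so it is not an input here.
overlap_bp() {
  local trunc_f=$1 trunc_r=$2 amplicon_len=$3
  echo $(( trunc_f + trunc_r - amplicon_len ))
}

# Run 1 truncation settings against the ~460 bp amplicon from the question.
overlap_bp 280 260 460   # prints 80
# Same settings against a hypothetical longer-than-average 480 bp amplicon.
overlap_bp 280 260 480   # prints 60
```

Because amplicon lengths vary around the average, the margin for the longest amplicons is what matters for merging, not the margin at ~460 bp.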

The merging step always threw out a lot of sequences, so we tried different approaches and compared the results.

Based on the attached demux.qzv file, we ran several combinations of parameters and compared the outputs:

1. DADA2 on the original demux file with the trim/trunc values given
2. Trimmomatic before DADA2 (using the 515/806 primers), so only small trim/trunc values
3. Trimmomatic before DADA2 AND the same trim/trunc values as #1 (chosen from demux.qzv)
4. Trimmomatic before DADA2 AND the same trim/trunc values as #1 AND only the forward reads

Resulting stats

1. DADA2 on original demux file with trim/trunc values given

qiime dada2 denoise-paired \
--i-demultiplexed-seqs BYLN3-pe-demux.qza \
--o-table BYLN3-table1mill_4.qza \
--o-representative-sequences BYLN3-rep-seqs1mill_4.qza \
--o-denoising-stats BYLN3-denoising-stats1mill_4.qza \
--p-trim-left-f 17 \
--p-trim-left-r 20 \
--p-trunc-len-f 280 \
--p-trunc-len-r 260 \
--p-n-threads 16 \
--p-n-reads-learn 1000000

sample-id	input	filtered	denoised	merged	non-chimeric
#q2:types	numeric	numeric	numeric	numeric	numeric
AJM0303	60485	42289	42289	13901	8109
NEGATIVE	17560	12439	12439	7900	6731
YAM0907	70721	50271	50271	19523	11069
YAM2511	66559	48029	48029	16292	9635
YAM2512	53449	39393	39393	14618	8459
YAM2513	54478	39873	39873	17145	8716
YAM2514	36772	26708	26708	11418	6967
YAM2515	73333	53219	53219	19119	11146
YAM2516	39504	27992	27992	10006	5943

The merging step seems harsh.
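How harsh the merging step is can be quantified per sample from the stats table. Below is a minimal bash/awk sketch, assuming the table is saved as tab-separated text with the same columns as above; the function name `step_retention` is hypothetical. It skips the `sample-id` header and the `#q2:types` line.

```shell
#!/usr/bin/env bash
# Per-sample retention at each step of a paired-end DADA2 stats table on stdin.
# Columns: sample-id input filtered denoised merged non-chimeric (tab-separated).
step_retention() {
  awk -F'\t' '
    function pct(num, den) { return den > 0 ? 100 * num / den : 0 }
    $1 == "sample-id" || $1 ~ /^#/ || NF < 6 { next }
    {
      printf "%s\tfiltered/input=%.1f%%\tmerged/denoised=%.1f%%\tnon-chimeric/merged=%.1f%%\n",
             $1, pct($3, $2), pct($5, $4), pct($6, $5)
    }'
}

# Example: the AJM0303 row of run 1.
printf 'sample-id\tinput\tfiltered\tdenoised\tmerged\tnon-chimeric\nAJM0303\t60485\t42289\t42289\t13901\t8109\n' \
  | step_retention
```

For AJM0303 this reports that only about a third of the denoised reads merge (32.9%), and about 58% of the merged reads pass chimera removal.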

2. Trimmomatic was done before DADA2 (using the 515/806 primers), so only small trim/trunc values were given

qiime dada2 denoise-paired \
--i-demultiplexed-seqs BYLN3_89-pe-demux.qza \
--o-table BYLN3_89-table.qza \
--o-representative-sequences BYLN3_89-rep-seqs.qza \
--o-denoising-stats BYLN3_89-denoising-stats.qza \
--p-trim-left-f 6 \
--p-trim-left-r 6 \
--p-trunc-len-f 280 \
--p-trunc-len-r 240 \
--p-n-threads 12 \
--p-n-reads-learn 1000000

sample-id	input	filtered	denoised	merged	non-chimeric
#q2:types	numeric	numeric	numeric	numeric	numeric
AJM0303T	50295	45121	45121	15133	7692
NEGATIVET	14820	13346	13346	8824	7512
YAM0907T	59786	53555	53555	21718	11831
YAM2511T	55715	50494	50494	17619	11089
YAM2512T	45798	41500	41500	15931	9004
YAM2513T	46301	41903	41903	17548	8987
YAM2514T	30966	28059	28059	12465	8089
YAM2515T	61728	55987	55987	21750	13672
YAM2516T	32818	29611	29611	10231	6455

Considering the sharp increase in chimeric sequences being filtered out, it seems that Trimmomatic plus trimming 6 more bases still doesn't remove the primers successfully. Why would this be?
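One way to test the suspicion that primers survive Trimmomatic is to count how many reads still begin with the primer. Below is a minimal bash sketch, not a QIIME 2 command, assuming uncompressed FASTQ on stdin with four lines per record and an uppercase primer; the function names are hypothetical. The example primer GTGYCAGCMGCCGCGGTAA is the widely used 515F sequence, shown only for illustration: substitute the primers actually used to build the library.

```shell
#!/usr/bin/env bash
# Expand IUPAC degenerate bases into regex character classes.
iupac_to_regex() {
  echo "$1" | sed -e 's/N/[ACGT]/g' -e 's/R/[AG]/g' -e 's/Y/[CT]/g' \
                  -e 's/S/[CG]/g' -e 's/W/[AT]/g' -e 's/K/[GT]/g' \
                  -e 's/M/[AC]/g' -e 's/B/[CGT]/g' -e 's/D/[AGT]/g' \
                  -e 's/H/[ACT]/g' -e 's/V/[ACG]/g'
}

# Usage: count_primer_starts PRIMER < reads.fastq
# Prints "<reads starting with the primer> <total reads>".
count_primer_starts() {
  local re
  re="^$(iupac_to_regex "$1")"
  # Sequence lines are the 2nd line of every 4-line FASTQ record.
  awk -v re="$re" 'NR % 4 == 2 { total++; if ($0 ~ re) hits++ }
                   END { printf "%d %d\n", hits, total }'
}

# Toy input: the first read starts with the primer, the second does not.
printf '@r1\nGTGCCAGCAGCCGCGGTAATACG\n+\nIIIIIIIIIIIIIIIIIIIIIII\n@r2\nTACGTAGGGTGCAAGCGTTAATC\n+\nIIIIIIIIIIIIIIIIIIIIIII\n' \
  | count_primer_starts GTGYCAGCMGCCGCGGTAA
```

For gzipped files, pipe through `zcat` first. Reverse reads begin with the reverse primer as written (5' to 3'), not its reverse complement.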

3. Trimmomatic was done before DADA2 AND the same trim/trunc values as #1 were given

qiime dada2 denoise-paired \
--i-demultiplexed-seqs BYLN3_89-pe-demux.qza \
--o-table BYLN3_89_3-table.qza \
--o-representative-sequences BYLN3_89_3-rep-seqs.qza \
--o-denoising-stats BYLN3_89_3-denoising-stats.qza \
--p-trim-left-f 17 \
--p-trim-left-r 20 \
--p-trunc-len-f 280 \
--p-trunc-len-r 260 \
--p-n-threads 12 \
--p-n-reads-learn 1000000

sample-id	input	filtered	denoised	merged	non-chimeric
#q2:types	numeric	numeric	numeric	numeric	numeric
AJM0303T	50295	40970	40970	17537	15965
NEGATIVET	14820	12148	12148	11323	11085
YAM0907T	59786	48949	48949	23985	21305
YAM2511T	55715	46762	46762	20449	19092
YAM2512T	45798	38209	38209	17984	16831
YAM2513T	46301	38701	38701	19653	17490
YAM2514T	30966	25864	25864	14448	13813
YAM2515T	61728	51618	51618	24471	22849
YAM2516T	32818	27118	27118	12057	11576

It seems that running Trimmomatic AND giving trim/trunc values based on the original (no Trimmomatic) demux quality gives the best result among 1, 2, and 3. How can this be? Isn't this redundant, filtering the primers twice?

4. Trimmomatic was done before DADA2 AND only the forward reads were used

qiime dada2 denoise-single \
--i-demultiplexed-seqs BYLN3F-single-end-demux.qza \
--p-trim-left 17 \
--p-trunc-len 280 \
--o-representative-sequences BYLN3F_3-rep-seqs.qza \
--o-table BYLN3F_3-table.qza \
--o-denoising-stats BYLN3F_3-denoising-stats.qza \
--p-n-threads 4 --p-n-reads-learn 1000000

sample-id	input	filtered	denoised	non-chimeric
#q2:types	numeric	numeric	numeric	numeric
AJM0303fp	50295	48071	48071	46185
NEGATIVEfp	14820	14185	14185	14030
YAM0907fp	59786	57099	57099	53256
YAM2510fp	145001	139020	139020	133711
YAM2511fp	55715	53193	53193	51268
YAM2512fp	45798	43599	43599	41268
YAM2513fp	46301	44034	44034	39938
YAM2514fp	30966	29523	29523	27905
YAM2515fp	61728	59101	59101	56242

It seems that removing the primers twice (once with Trimmomatic and again with the DADA2 parameters) gives me the most sequences after all the filtering steps. I'm thinking I should proceed with this approach for further analysis. Is this right?

Remaining seqs are ranked as follows:
4 > 3 > 1 > 2
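Raw counts can mislead here. Run 4's table lists YAM2510fp (145001 input reads) and has no YAM2516 row. Also, the `input` column of the Trimmomatic runs is already lower than in run 1 (50295 for AJM0303T versus 60485 for AJM0303), presumably because Trimmomatic dropped reads before DADA2. Below is a minimal bash/awk sketch that reports the overall fraction of input reads surviving one stats table; the function name `overall_retention` is hypothetical. For a fair comparison, restrict each table to the same samples and also compare against run 1's input counts.

```shell
#!/usr/bin/env bash
# Overall retention for one DADA2 stats table on stdin (tab-separated).
# Header lines ("sample-id ..." and "#q2:types ...") are skipped.
# The last column is the non-chimeric count in both the paired-end
# and the single-end tables, so $NF works for either layout.
overall_retention() {
  awk -F'\t' '
    $1 == "sample-id" || $1 ~ /^#/ || NF < 5 { next }
    { n++; tot_in += $2; tot_out += $NF }
    END {
      if (tot_in > 0)
        printf "samples=%d input=%d non-chimeric=%d retained=%.1f%%\n",
               n, tot_in, tot_out, 100 * tot_out / tot_in
    }'
}

# Example: the AJM0303 rows from run 1 (paired-end) and run 4 (forward only).
printf 'AJM0303\t60485\t42289\t42289\t13901\t8109\n' | overall_retention
printf 'AJM0303fp\t50295\t48071\t48071\t46185\n' | overall_retention
```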

I apologize for such a messy question.
Please let me know if you need more information.

Cheers,

AhHua · August 30, 2018, 1:20pm

Hi @Dchung,

I believe that, as mentioned in the DADA2 paper, "the DADA2 pipeline performs merging of paired-end reads after denoising," because the denoising algorithm checks the quality scores and estimates the error rates for the forward and reverse reads separately.
You might want to check the DADA2 paper.
Hope this helps.

Cheers,

YY

Nicholas_Bokulich · August 30, 2018, 1:42pm

Hi @Dchung,
No, it is not necessary to use trimmomatic prior to dada2 — otherwise we would have a trimmomatic QIIME 2 plugin! We do have a q2-cutadapt plugin, and as far as I know trimmomatic and cutadapt do many of the same things. dada2 has its own trim/trunc parameters so pre-processing is not necessary.

Thank you for sharing your dada2 stats results. None of these results look great: all have merging issues because you are truncating the reads too much. I would recommend increasing your truncation lengths (i.e., truncating less). Even if the average length is 460, there is some variation, and you are losing around 50% of reads at the read joining/merging step, which is bad news. If increasing the truncation values does not improve read yields, I recommend using only the forward reads.
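Trying longer truncation lengths can be organized as a small parameter sweep. Below is a minimal dry-run sketch in bash that only prints `qiime dada2 denoise-paired` commands (using the same flags as the commands above) for candidate truncation pairs, skipping pairs whose expected overlap falls below a margin. The candidate values, `MIN_OVERLAP=50`, the function name `print_sweep` and the output file names are hypothetical; `AMPLICON_LEN=460` is the approximate length from the question.

```shell
#!/usr/bin/env bash
# Dry run: print one denoise-paired command per candidate truncation pair.
AMPLICON_LEN=460   # approximate V3-V4 amplicon length from the question
MIN_OVERLAP=50     # hypothetical margin for amplicon-length variation

print_sweep() {
  local trunc_f trunc_r overlap tag
  for trunc_f in 280 290 300; do
    for trunc_r in 220 240 260; do
      overlap=$(( trunc_f + trunc_r - AMPLICON_LEN ))
      if (( overlap < MIN_OVERLAP )); then
        continue   # too little overlap left for merging
      fi
      tag="f${trunc_f}_r${trunc_r}"
      # Remove "echo" to actually run each job; each run can take a long time.
      echo qiime dada2 denoise-paired \
        --i-demultiplexed-seqs BYLN3-pe-demux.qza \
        --p-trim-left-f 17 --p-trim-left-r 20 \
        --p-trunc-len-f "$trunc_f" --p-trunc-len-r "$trunc_r" \
        --o-table "BYLN3-table-${tag}.qza" \
        --o-representative-sequences "BYLN3-rep-seqs-${tag}.qza" \
        --o-denoising-stats "BYLN3-denoising-stats-${tag}.qza"
    done
  done
}

print_sweep
```

Overlap is only one constraint: DADA2 discards reads shorter than the truncation length, and low-quality tails in the overlap may still hurt merging. The denoising stats of each run, not the arithmetic alone, should decide which setting to keep.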

The primers are not being trimmed twice; you are just trimming extra bases off the 5' end of each read.

I hope that helps!

system · September 30, 2018, 7:42pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.