Cutadapt trim-paired vs. dada2 (option --p-trim-left)? Help with interpretation of cutadapt output.

jul · June 5, 2019, 3:24pm

Hi Qiime Team,
I use the qiime cutadapt trim-paired command to remove the primers before proceeding to DADA2 (paired-end reads, 2x300 bp, 16S, V3-V4; the raw data from the MiSeq arrive already demultiplexed and apparently without adapters). Checking the verbose output and comparing demux.qzv with trimmed-seq.qzv, I wonder whether cutadapt really worked. I wanted to use cutadapt to remove everything that could be a primer or adapter, i.e. to be more precise than with dada2 --p-trim-left.
So, if I understand the output correctly, in the first sample the cut is mostly at length 17 (= primer length), but in some cases also at a length of 298 b? Is that all right nevertheless, or should I re-run all my samples with the normal dada2 options
--p-trim-left-f 17
--p-trim-left-r 21
?
Thank you for your help!
Jul

I use QIIME 2 2019.1 via conda; here are the commands:

qiime cutadapt trim-paired \
  --i-demultiplexed-sequences demux.qza \
  --p-front-f CCTACGGGNGGCWGCAG \
  --p-front-r GACTACHVGGGTATCTAATCC \
  --p-error-rate 0 \
  --o-trimmed-sequences trimmed-seq.qza \
  --verbose

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs trimmed-seq.qza \
  --p-trim-left-f 0 \
  --p-trim-left-r 0 \
  --p-trunc-len-f 276 \
  --p-trunc-len-r 260 \
  --o-table table-dada2.qza \
  --o-representative-sequences rep-seqs-dada2.qza \
  --o-denoising-stats stats-dada2.qza \
  --verbose

And here is the verbose output for the first two samples.

CasavaOneEightSingleLanePerSampleDirFmt-ilwi2pc9/177AC_S2_L001_R1_001.fastq.gz -p /tmp/5270903.1.eve/q2-CasavaOneEightSingleLanePerSampleDirFmt-ilwi2pc9/177AC_S2_L001_R2_001.fastq.gz --front CCTACGGGNGGCWGCAG -G GACTACHVGGGTATCTAATCC /tmp/5270903.1.eve/qiime2-archive-omeangci/4c946622-fcb7-4107-b4a3-58b9cce123d3/data/177AC_S2_L001_R1_001.fastq.gz /tmp/5270903.1.eve/qiime2-archive-omeangci/4c946622-fcb7-4107-b4a3-58b9cce123d3/data/177AC_S2_L001_R2_001.fastq.gz
Processing reads on 1 core in paired-end mode ...
Finished in 12.22 s (37 us/read; 1.61 M reads/minute).

=== Summary ===

Total read pairs processed: 328,121
Read 1 with adapter: 310,507 (94.6%)
Read 2 with adapter: 308,389 (94.0%)
Pairs written (passing filters): 328,121 (100.0%)

Total basepairs processed: 196,971,346 bp
Read 1: 98,611,659 bp
Read 2: 98,359,687 bp
Total written (filtered): 185,219,682 bp (94.0%)
Read 1: 93,332,999 bp
Read 2: 91,886,683 bp

=== First read: Adapter 1 ===

Sequence: CCTACGGGNGGCWGCAG; Type: regular 5'; Length: 17; Trimmed: 310507 times.

No. of allowed errors:
0-17 bp: 0

Overview of removed sequences
length count expect max.err error counts
3 21 5126.9 0 21
6 1 80.1 0 1
7 2 20.0 0 2
9 4 1.3 0 4
10 3 0.3 0 3
11 1 0.1 0 1
12 1 0.0 0 1
13 4 0.0 0 4
14 3 0.0 0 3
15 15 0.0 0 15
16 303 0.0 0 303
17 310127 0.0 0 310127
18 16 0.0 0 16
30 1 0.0 0 1
35 1 0.0 0 1
40 1 0.0 0 1
173 1 0.0 0 1
298 2 0.0 0 2

=== Second read: Adapter 2 ===

Sequence: GACTACHVGGGTATCTAATCC; Type: regular 5'; Length: 21; Trimmed: 308389 times.

No. of allowed errors:
0-21 bp: 0

Overview of removed sequences
length count expect max.err error counts
3 149 5126.9 0 149
4 2 1281.7 0 2
6 4 80.1 0 4
9 6 1.3 0 6
10 1 0.3 0 1
11 8 0.1 0 8
12 4 0.0 0 4
13 5 0.0 0 5
14 7 0.0 0 7
15 6 0.0 0 6
16 5 0.0 0 5
17 4 0.0 0 4
18 4 0.0 0 4
19 2 0.0 0 2
20 122 0.0 0 122
21 308038 0.0 0 308038
22 19 0.0 0 19
42 1 0.0 0 1
54 1 0.0 0 1
62 1 0.0 0 1

This is cutadapt 1.18 with Python 3.6.7
Command line parameters: --cores 1 --error-rate 0.0 --times 1 --overlap 3 -o /tmp/5270903.1.eve/q2-CasavaOneEightSingleLanePerSampleDirFmt-ilwi2pc9/179BL_S10_L001_R1_001.fastq.gz -p /tmp/5270903.1.eve/q2-CasavaOneEightSingleLanePerSampleDirFmt-ilwi2pc9/179BL_S10_L001_R2_001.fastq.gz --front CCTACGGGNGGCWGCAG -G GACTACHVGGGTATCTAATCC /tmp/5270903.1.eve/qiime2-archive-omeangci/4c946622-fcb7-4107-b4a3-58b9cce123d3/data/179BL_S10_L001_R1_001.fastq.gz /tmp/5270903.1.eve/qiime2-archive-omeangci/4c946622-fcb7-4107-b4a3-58b9cce123d3/data/179BL_S10_L001_R2_001.fastq.gz
Processing reads on 1 core in paired-end mode ...
Finished in 11.09 s (37 us/read; 1.61 M reads/minute).

=== Summary ===

Total read pairs processed: 296,960
Read 1 with adapter: 281,857 (94.9%)
Read 2 with adapter: 280,944 (94.6%)
Pairs written (passing filters): 296,960 (100.0%)

Total basepairs processed: 177,957,827 bp
Read 1: 88,986,088 bp
Read 2: 88,971,739 bp
Total written (filtered): 167,267,990 bp (94.0%)
Read 1: 84,195,202 bp
Read 2: 83,072,788 bp

=== First read: Adapter 1 ===

Sequence: CCTACGGGNGGCWGCAG; Type: regular 5'; Length: 17; Trimmed: 281857 times.

No. of allowed errors:
0-17 bp: 0

Overview of removed sequences
length count expect max.err error counts
3 22 4640.0 0 22
7 4 18.1 0 4
8 1 4.5 0 1
9 1 1.1 0 1
11 1 0.1 0 1
12 2 0.0 0 2
13 7 0.0 0 7
14 4 0.0 0 4
15 7 0.0 0 7
16 280 0.0 0 280
17 281512 0.0 0 281512
18 14 0.0 0 14
26 2 0.0 0 2

=== Second read: Adapter 2 ===

Sequence: GACTACHVGGGTATCTAATCC; Type: regular 5'; Length: 21; Trimmed: 280944 times.

No. of allowed errors:
0-21 bp: 0

Overview of removed sequences
length count expect max.err error counts
3 33 4640.0 0 33
7 2 18.1 0 2
8 1 4.5 0 1
9 1 1.1 0 1
11 3 0.1 0 3
12 5 0.0 0 5
13 3 0.0 0 3
15 6 0.0 0 6
16 1 0.0 0 1
17 2 0.0 0 2
19 1 0.0 0 1
20 97 0.0 0 97
21 280768 0.0 0 280768
22 21 0.0 0 21
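
As a cross-check on how I read these tables, the share of trimming events that removed exactly the primer length can be tallied directly. A minimal Python sketch, with the length/count pairs for read 1 of sample 177AC copied by hand from the first table above (the variable names are my own):

```python
# Length -> count pairs for "First read: Adapter 1" of sample 177AC,
# copied by hand from the cutadapt report above.
removed = {
    3: 21, 6: 1, 7: 2, 9: 4, 10: 3, 11: 1, 12: 1, 13: 4, 14: 3,
    15: 15, 16: 303, 17: 310127, 18: 16, 30: 1, 35: 1, 40: 1,
    173: 1, 298: 2,
}
PRIMER_LEN = 17  # length of CCTACGGGNGGCWGCAG

total = sum(removed.values())   # should equal the "Trimmed: ... times" figure
exact = removed[PRIMER_LEN]     # events that removed exactly the primer length
longer = sum(n for length, n in removed.items() if length > PRIMER_LEN)

print(f"trimmed reads: {total}")
print(f"removed exactly {PRIMER_LEN} bases: {exact} ({exact / total:.2%})")
print(f"removed more than {PRIMER_LEN} bases: {longer}")
```

The total agrees with the "Trimmed: 310507 times" line, and almost all events are at length 17; it is the handful of longer removals that I am unsure about.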

ebolyen · June 7, 2019, 4:34pm

Hi @jul!

Sorry for the slight delay on our part.

You are indeed understanding the output of cutadapt correctly. There are some things you might be interested in which can change the trimming behavior:

According to this table (--p-front-* maps to -g in q2-cutadapt), you can use a starting character ^ to anchor exactly at the beginning, or X to anchor “not-exceeding” the beginning.

That said, I suspect your primer isn’t of variable length here, so using trim-left in DADA2 is definitely much easier in this case (and it also keeps primers that contain a sequencing error from slipping through untrimmed!), so I would probably recommend that, just as you were thinking.
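
To illustrate why an unanchored 5' primer can remove far more than 17 bases, here is a rough Python sketch. It is not cutadapt's actual algorithm: it only handles exact, full-length matches (cutadapt also accepts partial matches at the start of the read, which is where the short lengths in your tables come from), and the IUPAC map and function names are made up for this example.

```python
import re

# Subset of IUPAC codes, just enough for the two primers in this thread.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "[ACGT]", "W": "[AT]", "H": "[ACT]", "V": "[ACG]",
}

def primer_regex(primer):
    """Translate a degenerate primer into a regular expression."""
    return "".join(IUPAC[base] for base in primer)

def trim_front(read, primer, anchored=False):
    """Remove an exact primer match and everything before it.

    anchored=True mimics the '^' prefix: the primer must start at
    the first base of the read, otherwise the read is left untouched.
    """
    pattern = ("^" if anchored else "") + primer_regex(primer)
    match = re.search(pattern, read)
    return read[match.end():] if match else read

primer = "CCTACGGGNGGCWGCAG"
site = "CCTACGGGAGGCAGCAG"            # one concrete sequence the primer matches
normal_read = site + "TTTT"           # primer at the very start
odd_read = "GATTACA" + site + "TT"    # primer buried inside the read

print(len(normal_read) - len(trim_front(normal_read, primer)))  # bases removed
print(len(odd_read) - len(trim_front(odd_read, primer)))        # primer + prefix
print(len(odd_read) - len(trim_front(odd_read, primer, anchored=True)))
```

With the `^` prefix, a primer that only shows up deep inside the read is left alone instead of taking the whole preceding sequence with it.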

jul · June 12, 2019, 5:22pm

Thanks a lot, also for the table!
One more question about using dada2 (paired end, 2x300 b):
Do the trimming and truncation parameters refer to base positions in the original read (1), or do the positions shift after the primer has been cut off (2)?
Example:

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux.qza \
  --p-trim-left-f 18 \
  --p-trim-left-r 22 \
  --p-trunc-len-f 270 \
  --p-trunc-len-r 240 \
  --o-table table-dada2.qza

There, either 1) the primer is cut at bases 0-18 and the read is truncated after base number 270, so 30 bases are cut off.
Or 2) the primer is cut at bases 0-18, base number 19 becomes the new first base and the new length of the sequence is 300-18=282. The sequence is then cut after base 270 and hence 12 bases are cut off.
In my case (1) gives an overlap of ~50 bases and (2) of ~90.
I guess (1) is right, isn't it?
Thank you!

ebolyen · June 25, 2019, 4:50pm

Hey @jul,

Sorry for the very late reply. Yes, (1) is correct. The positions are relative to the original read, so you can think of it as truncation happening first and trimming happening last.
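
In numbers, for your example (a rough Python sketch of the arithmetic only, not DADA2's code; the function name is hypothetical and it assumes a non-zero truncation length):

```python
def kept_window(read_len, trim_left, trunc_len):
    """Return (first_kept, last_kept, kept_length) for interpretation (1).

    Positions are 1-based and counted on the original, untrimmed read:
    the read is truncated at trunc_len, then the first trim_left bases
    are dropped.
    """
    if read_len < trunc_len:
        # Assumption: reads shorter than the truncation length are
        # discarded, as far as I know from the DADA2 documentation.
        return None
    return trim_left + 1, trunc_len, trunc_len - trim_left

first, last, length = kept_window(read_len=300, trim_left=18, trunc_len=270)
print(first, last, length)
print("bases dropped from the 3' end:", 300 - last)
```

So the forward read keeps original bases 19 to 270 (252 bases), and 30 bases are dropped from the 3' end, as in your option (1).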

jul · June 26, 2019, 10:42am

It was not urgent, just a to-be-sure question. Thank you.