Cutadapt is removing trimmed sequences

Hi, I’m new to q2 and am trying out the cutadapt plugin to remove my primers from demultiplexed sequences. I am using the following script.

qiime cutadapt trim-paired \
--i-demultiplexed-sequences demux-paired-end.qza \
--p-front-f GTGARTCATCGAATCTTTG \
--p-front-r TCCTCCGCTTATTGATATGC \
--p-cores 2 \
--o-trimmed-sequences trimmed-seqs.qza \
--verbose

The output looks good (I am using two primer pairs so 63% is expected).

=== Summary ===

Total read pairs processed:             34,523
  Read 1 with adapter:                  22,021 (63.8%)
  Read 2 with adapter:                  22,037 (63.8%)
Pairs written (passing filters):        34,523 (100.0%)

Total basepairs processed:    18,739,351 bp
  Read 1:     9,671,890 bp
  Read 2:     9,067,461 bp
Total written (filtered):     17,880,663 bp (95.4%)
  Read 1:     9,253,783 bp
  Read 2:     8,626,880 bp

=== First read: Adapter 1 ===

Sequence: GTGARTCATCGAATCTTTG; Type: regular 5'; Length: 19; Trimmed: 22021 times.

No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1

Overview of removed sequences
length	count	expect	max.err	error counts
3	5	539.4	0	5
4	5	134.9	0	5
5	1	33.7	0	1
10	1	0.0	1	1
12	1	0.0	1	0 1
14	1	0.0	1	1
15	3	0.0	1	2 1
16	1	0.0	1	1
17	5	0.0	1	2 3
18	123	0.0	1	19 104
19	21829	0.0	1	21683 146
20	46	0.0	1	24 22

=== Second read: Adapter 2 ===

Sequence: TCCTCCGCTTATTGATATGC; Type: regular 5'; Length: 20; Trimmed: 22037 times.

No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1; 20 bp: 2

Overview of removed sequences
length	count	expect	max.err	error counts
18	4	0.0	1	1 1 2
19	174	0.0	1	111 63
20	21836	0.0	2	21641 191 4
21	23	0.0	2	2 18 3

However, when I pass the output to DADA2 using the following command and look at the visualisation of rep-seqs-dada2-right.qza, I get a list of OTUs that do not contain the primers (my other primer set), and all the trimmed sequences (correct primers) are gone…

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs trimmed-seqs-right.qza \
  --p-trunc-len-f 285 \
  --p-trunc-len-r 260 \
  --p-trim-left-f 0 \
  --p-trim-left-r 0 \
  --p-n-threads 0 \
  --o-representative-sequences rep-seqs-dada2-right.qza \
  --o-table table-dada2-right.qza \
  --o-denoising-stats stats-dada2-right.qza \
  --verbose

Ideally I would like to keep the trimmed sequences and remove the untrimmed. Any help would be really appreciated.

Hey there @SarahH - have you had a chance to review the cutadapt docs? We don’t develop cutadapt, so we might not be the best source for help.

The default behavior for this plugin is to keep the trimmed sequences, so if you are seeing something other than that, it suggests that either your trimming flags are wrong (5’ vs 3’) or you need to anchor the primer. Also note that the default error tolerance is in effect, which usually increases the number of matches. Can you verify that your primers are on the 5’ end of both the forward and the reverse reads?
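For example, anchoring the 5’ primers looks like this (a sketch only, reusing the filenames from the first post; the leading `^` is cutadapt’s anchored-5’ syntax, and `--p-error-rate 0` disables the default 10% mismatch tolerance — whether you want that depends on your data):

```shell
# Hypothetical variant of the command above: "^" anchors each primer to the
# very start of the read, so partial internal matches are not trimmed.
qiime cutadapt trim-paired \
  --i-demultiplexed-sequences demux-paired-end.qza \
  --p-front-f ^GTGARTCATCGAATCTTTG \
  --p-front-r ^TCCTCCGCTTATTGATATGC \
  --p-error-rate 0 \
  --o-trimmed-sequences trimmed-seqs-anchored.qza \
  --verbose
```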

Thanks for your reply.

The primers are fungal primers:

fITS7 5’-GTGARTCATCGAATCTTTG-3’ and ITS4 5’-TCCTCCGCTTATTGATATGC-3’.

Does this seem right?

I tried switching --p-front-f and --p-front-r but I got no hits.

What do you mean by needing to anchor the primer?

Many thanks!

I'm not sure! Is the primer on the 5' end or the 3' end? It isn't clear to me from that illustration what is the case.

Check out those docs I linked to above, in particular, this section:

https://cutadapt.readthedocs.io/en/v1.17/guide.html#anchored-5-adapters

:heart: :t_rex: :qiime2:

Hi! Thanks for your quick response again. My primers are at the 5’ end of my sequences. I checked out the docs and do not need anchored adapters, just regular ones.

I tried the equivalent in cutadapt:

cutadapt -g GTGARTCATCGAATCTTTG -G TCCTCCGCTTATTGATATGC -o out.1.fastq -p out.2.fastq R1.fastq R2.fastq

This worked fine. So I wonder what I’m doing wrong in the q2 plugin. The output whilst running looks perfect, but when I view the rep-seqs-dada2.qza the trimmed sequences are completely gone and I’m just left with the untrimmed ones. I will use cutadapt outside of q2 for now but if you do know what I’m doing wrong that would be great!

Many thanks

No problem!

Please re-run the command you posted above and copy-and-paste the complete log, from start to end.

Is it at all possible that you mixed up your input files to q2-dada2? I ask because in your cutadapt command your file is named trimmed-seqs.qza, but in the denoise-paired command it is trimmed-seqs-right.qza. You can double-check which file is which by loading it at view.qiime2.org and clicking on “Provenance”.

Thanks!

Hi! I ran it all through again and the same has happened… Here is my log of everything I have done:

(qiime2-2018.8) MAC19623:Qiime2_test lfslal$ qiime cutadapt trim-paired --i-demultiplexed-sequences demux-paired-end.qza --p-front-f GTGARTCATCGAATCTTTG --p-front-r TCCTCCGCTTATTGATATGC --p-cores 2 --o-trimmed-sequences trimmed-seqs.qza  --verbose
Running external command line application. This may print messages to stdout and/or stderr.
The commands to be run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: cutadapt --cores 2 --error-rate 0.1 --times 1 --overlap 3 -o /var/folders/6_/_9q9_yg51gsdmj64pdzts83ms996x5/T/q2-CasavaOneEightSingleLanePerSampleDirFmt-7__owxsu/P4-E6-060718_S342_L001_R1_001.fastq.gz -p /var/folders/6_/_9q9_yg51gsdmj64pdzts83ms996x5/T/q2-CasavaOneEightSingleLanePerSampleDirFmt-7__owxsu/P4-E6-060718_S342_L001_R2_001.fastq.gz --front GTGARTCATCGAATCTTTG -G TCCTCCGCTTATTGATATGC /var/folders/6_/_9q9_yg51gsdmj64pdzts83ms996x5/T/qiime2-archive-5320vqdd/2a6ef308-e419-48d2-b129-c11ceec6f9e8/data/P4-E6-060718_S342_L001_R1_001.fastq.gz /var/folders/6_/_9q9_yg51gsdmj64pdzts83ms996x5/T/qiime2-archive-5320vqdd/2a6ef308-e419-48d2-b129-c11ceec6f9e8/data/P4-E6-060718_S342_L001_R2_001.fastq.gz

This is cutadapt 1.17 with Python 3.5.5
Command line parameters: --cores 2 --error-rate 0.1 --times 1 --overlap 3 -o /var/folders/6_/_9q9_yg51gsdmj64pdzts83ms996x5/T/q2-CasavaOneEightSingleLanePerSampleDirFmt-7__owxsu/P4-E6-060718_S342_L001_R1_001.fastq.gz -p /var/folders/6_/_9q9_yg51gsdmj64pdzts83ms996x5/T/q2-CasavaOneEightSingleLanePerSampleDirFmt-7__owxsu/P4-E6-060718_S342_L001_R2_001.fastq.gz --front GTGARTCATCGAATCTTTG -G TCCTCCGCTTATTGATATGC /var/folders/6_/_9q9_yg51gsdmj64pdzts83ms996x5/T/qiime2-archive-5320vqdd/2a6ef308-e419-48d2-b129-c11ceec6f9e8/data/P4-E6-060718_S342_L001_R1_001.fastq.gz /var/folders/6_/_9q9_yg51gsdmj64pdzts83ms996x5/T/qiime2-archive-5320vqdd/2a6ef308-e419-48d2-b129-c11ceec6f9e8/data/P4-E6-060718_S342_L001_R2_001.fastq.gz
Running on 2 cores
Trimming 2 adapters with at most 10.0% errors in paired-end mode ...
Finished in 1.54 s (44 us/read; 1.37 M reads/minute).

=== Summary ===

Total read pairs processed:             35,045
  Read 1 with adapter:                  21,782 (62.2%)
  Read 2 with adapter:                  21,791 (62.2%)
Pairs written (passing filters):        35,045 (100.0%)

Total basepairs processed:    19,242,242 bp
  Read 1:     9,908,914 bp
  Read 2:     9,333,328 bp
Total written (filtered):     18,392,920 bp (95.6%)
  Read 1:     9,495,269 bp
  Read 2:     8,897,651 bp

=== First read: Adapter 1 ===

Sequence: GTGARTCATCGAATCTTTG; Type: regular 5'; Length: 19; Trimmed: 21782 times.

No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1

Overview of removed sequences
length	count	expect	max.err	error counts
3	2	547.6	0	2
4	5	136.9	0	5
14	1	0.0	1	1
15	1	0.0	1	1
16	1	0.0	1	0 1
17	4	0.0	1	3 1
18	121	0.0	1	16 105
19	21612	0.0	1	21476 136
20	35	0.0	1	14 21

=== Second read: Adapter 2 ===

Sequence: TCCTCCGCTTATTGATATGC; Type: regular 5'; Length: 20; Trimmed: 21791 times.

No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1; 20 bp: 2

Overview of removed sequences
length	count	expect	max.err	error counts
18	5	0.0	1	2 1 2
19	168	0.0	1	100 67 1
20	21583	0.0	2	21390 181 12
21	35	0.0	2	3 32


Command: cutadapt --cores 2 --error-rate 0.1 --times 1 --overlap 3 -o /var/folders/6_/_9q9_yg51gsdmj64pdzts83ms996x5/T/q2-CasavaOneEightSingleLanePerSampleDirFmt-7__owxsu/P4-F6-060718_S354_L001_R1_001.fastq.gz -p /var/folders/6_/_9q9_yg51gsdmj64pdzts83ms996x5/T/q2-CasavaOneEightSingleLanePerSampleDirFmt-7__owxsu/P4-F6-060718_S354_L001_R2_001.fastq.gz --front GTGARTCATCGAATCTTTG -G TCCTCCGCTTATTGATATGC /var/folders/6_/_9q9_yg51gsdmj64pdzts83ms996x5/T/qiime2-archive-5320vqdd/2a6ef308-e419-48d2-b129-c11ceec6f9e8/data/P4-F6-060718_S354_L001_R1_001.fastq.gz /var/folders/6_/_9q9_yg51gsdmj64pdzts83ms996x5/T/qiime2-archive-5320vqdd/2a6ef308-e419-48d2-b129-c11ceec6f9e8/data/P4-F6-060718_S354_L001_R2_001.fastq.gz

This is cutadapt 1.17 with Python 3.5.5
Command line parameters: --cores 2 --error-rate 0.1 --times 1 --overlap 3 -o /var/folders/6_/_9q9_yg51gsdmj64pdzts83ms996x5/T/q2-CasavaOneEightSingleLanePerSampleDirFmt-7__owxsu/P4-F6-060718_S354_L001_R1_001.fastq.gz -p /var/folders/6_/_9q9_yg51gsdmj64pdzts83ms996x5/T/q2-CasavaOneEightSingleLanePerSampleDirFmt-7__owxsu/P4-F6-060718_S354_L001_R2_001.fastq.gz --front GTGARTCATCGAATCTTTG -G TCCTCCGCTTATTGATATGC /var/folders/6_/_9q9_yg51gsdmj64pdzts83ms996x5/T/qiime2-archive-5320vqdd/2a6ef308-e419-48d2-b129-c11ceec6f9e8/data/P4-F6-060718_S354_L001_R1_001.fastq.gz /var/folders/6_/_9q9_yg51gsdmj64pdzts83ms996x5/T/qiime2-archive-5320vqdd/2a6ef308-e419-48d2-b129-c11ceec6f9e8/data/P4-F6-060718_S354_L001_R2_001.fastq.gz
Running on 2 cores
Trimming 2 adapters with at most 10.0% errors in paired-end mode ...
Finished in 1.72 s (43 us/read; 1.40 M reads/minute).

=== Summary ===

Total read pairs processed:             40,104
  Read 1 with adapter:                  23,937 (59.7%)
  Read 2 with adapter:                  23,944 (59.7%)
Pairs written (passing filters):        40,104 (100.0%)

Total basepairs processed:    21,943,129 bp
  Read 1:    11,336,339 bp
  Read 2:    10,606,790 bp
Total written (filtered):     21,009,900 bp (95.7%)
  Read 1:    10,881,787 bp
  Read 2:    10,128,113 bp

=== First read: Adapter 1 ===

Sequence: GTGARTCATCGAATCTTTG; Type: regular 5'; Length: 19; Trimmed: 23937 times.

No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1

Overview of removed sequences
length	count	expect	max.err	error counts
3	9	626.6	0	9
4	3	156.7	0	3
12	1	0.0	1	0 1
14	1	0.0	1	0 1
16	1	0.0	1	0 1
17	2	0.0	1	1 1
18	139	0.0	1	23 116
19	23738	0.0	1	23589 149
20	42	0.0	1	22 20
73	1	0.0	1	1

=== Second read: Adapter 2 ===

Sequence: TCCTCCGCTTATTGATATGC; Type: regular 5'; Length: 20; Trimmed: 23944 times.

No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1; 20 bp: 2

Overview of removed sequences
length	count	expect	max.err	error counts
17	1	0.0	1	1
18	4	0.0	1	0 0 4
19	215	0.0	1	139 72 4
20	23701	0.0	2	23477 214 10
21	23	0.0	2	1 22


Command: cutadapt --cores 2 --error-rate 0.1 --times 1 --overlap 3 -o /var/folders/6_/_9q9_yg51gsdmj64pdzts83ms996x5/T/q2-CasavaOneEightSingleLanePerSampleDirFmt-7__owxsu/P4-G6-060718_S366_L001_R1_001.fastq.gz -p /var/folders/6_/_9q9_yg51gsdmj64pdzts83ms996x5/T/q2-CasavaOneEightSingleLanePerSampleDirFmt-7__owxsu/P4-G6-060718_S366_L001_R2_001.fastq.gz --front GTGARTCATCGAATCTTTG -G TCCTCCGCTTATTGATATGC /var/folders/6_/_9q9_yg51gsdmj64pdzts83ms996x5/T/qiime2-archive-5320vqdd/2a6ef308-e419-48d2-b129-c11ceec6f9e8/data/P4-G6-060718_S366_L001_R1_001.fastq.gz /var/folders/6_/_9q9_yg51gsdmj64pdzts83ms996x5/T/qiime2-archive-5320vqdd/2a6ef308-e419-48d2-b129-c11ceec6f9e8/data/P4-G6-060718_S366_L001_R2_001.fastq.gz

This is cutadapt 1.17 with Python 3.5.5
Command line parameters: --cores 2 --error-rate 0.1 --times 1 --overlap 3 -o /var/folders/6_/_9q9_yg51gsdmj64pdzts83ms996x5/T/q2-CasavaOneEightSingleLanePerSampleDirFmt-7__owxsu/P4-G6-060718_S366_L001_R1_001.fastq.gz -p /var/folders/6_/_9q9_yg51gsdmj64pdzts83ms996x5/T/q2-CasavaOneEightSingleLanePerSampleDirFmt-7__owxsu/P4-G6-060718_S366_L001_R2_001.fastq.gz --front GTGARTCATCGAATCTTTG -G TCCTCCGCTTATTGATATGC /var/folders/6_/_9q9_yg51gsdmj64pdzts83ms996x5/T/qiime2-archive-5320vqdd/2a6ef308-e419-48d2-b129-c11ceec6f9e8/data/P4-G6-060718_S366_L001_R1_001.fastq.gz /var/folders/6_/_9q9_yg51gsdmj64pdzts83ms996x5/T/qiime2-archive-5320vqdd/2a6ef308-e419-48d2-b129-c11ceec6f9e8/data/P4-G6-060718_S366_L001_R2_001.fastq.gz
Running on 2 cores
Trimming 2 adapters with at most 10.0% errors in paired-end mode ...
Finished in 1.52 s (43 us/read; 1.41 M reads/minute).

=== Summary ===

Total read pairs processed:             35,743
  Read 1 with adapter:                  21,398 (59.9%)
  Read 2 with adapter:                  21,408 (59.9%)
Pairs written (passing filters):        35,743 (100.0%)

Total basepairs processed:    19,696,528 bp
  Read 1:    10,137,321 bp
  Read 2:     9,559,207 bp
Total written (filtered):     18,862,243 bp (95.8%)
  Read 1:     9,731,027 bp
  Read 2:     9,131,216 bp

=== First read: Adapter 1 ===

Sequence: GTGARTCATCGAATCTTTG; Type: regular 5'; Length: 19; Trimmed: 21398 times.

No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1

Overview of removed sequences
length	count	expect	max.err	error counts
3	4	558.5	0	4
4	7	139.6	0	7
15	1	0.0	1	1
16	4	0.0	1	2 2
17	6	0.0	1	2 4
18	112	0.0	1	21 91
19	21223	0.0	1	21059 164
20	41	0.0	1	17 24

=== Second read: Adapter 2 ===

Sequence: TCCTCCGCTTATTGATATGC; Type: regular 5'; Length: 20; Trimmed: 21408 times.

No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1; 20 bp: 2

Overview of removed sequences
length	count	expect	max.err	error counts
17	1	0.0	1	1
18	7	0.0	1	2 0 5
19	181	0.0	1	108 70 3
20	21192	0.0	2	20984 198 10
21	25	0.0	2	2 23
22	2	0.0	2	0 0 2


Command: cutadapt --cores 2 --error-rate 0.1 --times 1 --overlap 3 -o /var/folders/6_/_9q9_yg51gsdmj64pdzts83ms996x5/T/q2-CasavaOneEightSingleLanePerSampleDirFmt-7__owxsu/P4-H6-060718_S378_L001_R1_001.fastq.gz -p /var/folders/6_/_9q9_yg51gsdmj64pdzts83ms996x5/T/q2-CasavaOneEightSingleLanePerSampleDirFmt-7__owxsu/P4-H6-060718_S378_L001_R2_001.fastq.gz --front GTGARTCATCGAATCTTTG -G TCCTCCGCTTATTGATATGC /var/folders/6_/_9q9_yg51gsdmj64pdzts83ms996x5/T/qiime2-archive-5320vqdd/2a6ef308-e419-48d2-b129-c11ceec6f9e8/data/P4-H6-060718_S378_L001_R1_001.fastq.gz /var/folders/6_/_9q9_yg51gsdmj64pdzts83ms996x5/T/qiime2-archive-5320vqdd/2a6ef308-e419-48d2-b129-c11ceec6f9e8/data/P4-H6-060718_S378_L001_R2_001.fastq.gz

This is cutadapt 1.17 with Python 3.5.5
Command line parameters: --cores 2 --error-rate 0.1 --times 1 --overlap 3 -o /var/folders/6_/_9q9_yg51gsdmj64pdzts83ms996x5/T/q2-CasavaOneEightSingleLanePerSampleDirFmt-7__owxsu/P4-H6-060718_S378_L001_R1_001.fastq.gz -p /var/folders/6_/_9q9_yg51gsdmj64pdzts83ms996x5/T/q2-CasavaOneEightSingleLanePerSampleDirFmt-7__owxsu/P4-H6-060718_S378_L001_R2_001.fastq.gz --front GTGARTCATCGAATCTTTG -G TCCTCCGCTTATTGATATGC /var/folders/6_/_9q9_yg51gsdmj64pdzts83ms996x5/T/qiime2-archive-5320vqdd/2a6ef308-e419-48d2-b129-c11ceec6f9e8/data/P4-H6-060718_S378_L001_R1_001.fastq.gz /var/folders/6_/_9q9_yg51gsdmj64pdzts83ms996x5/T/qiime2-archive-5320vqdd/2a6ef308-e419-48d2-b129-c11ceec6f9e8/data/P4-H6-060718_S378_L001_R2_001.fastq.gz
Running on 2 cores
Trimming 2 adapters with at most 10.0% errors in paired-end mode ...
Finished in 1.47 s (43 us/read; 1.41 M reads/minute).

=== Summary ===

Total read pairs processed:             34,523
  Read 1 with adapter:                  22,021 (63.8%)
  Read 2 with adapter:                  22,037 (63.8%)
Pairs written (passing filters):        34,523 (100.0%)

Total basepairs processed:    18,739,351 bp
  Read 1:     9,671,890 bp
  Read 2:     9,067,461 bp
Total written (filtered):     17,880,663 bp (95.4%)
  Read 1:     9,253,783 bp
  Read 2:     8,626,880 bp

=== First read: Adapter 1 ===

Sequence: GTGARTCATCGAATCTTTG; Type: regular 5'; Length: 19; Trimmed: 22021 times.

No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1

Overview of removed sequences
length	count	expect	max.err	error counts
3	5	539.4	0	5
4	5	134.9	0	5
5	1	33.7	0	1
10	1	0.0	1	1
12	1	0.0	1	0 1
14	1	0.0	1	1
15	3	0.0	1	2 1
16	1	0.0	1	1
17	5	0.0	1	2 3
18	123	0.0	1	19 104
19	21829	0.0	1	21683 146
20	46	0.0	1	24 22

=== Second read: Adapter 2 ===

Sequence: TCCTCCGCTTATTGATATGC; Type: regular 5'; Length: 20; Trimmed: 22037 times.

No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1; 20 bp: 2

Overview of removed sequences
length	count	expect	max.err	error counts
18	4	0.0	1	1 1 2
19	174	0.0	1	111 63
20	21836	0.0	2	21641 191 4
21	23	0.0	2	2 18 3

Saved SampleData[PairedEndSequencesWithQuality] to: trimmed-seqs.qza


(qiime2-2018.8) MAC19623:Qiime2_test lfslal$  qiime dada2 denoise-paired  --i-demultiplexed-seqs trimmed-seqs.qza  --p-trunc-len-f 285  --p-trunc-len-r 260  --p-trim-left-f 0  --p-trim-left-r 0  --o-representative-sequences rep-seqs-dada2.qza  --o-table table-dada2.qza  --o-denoising-stats stats-dada2.qza --verbose  --p-n-threads 0
Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: run_dada_paired.R /var/folders/6_/_9q9_yg51gsdmj64pdzts83ms996x5/T/tmpzsh0j955/forward /var/folders/6_/_9q9_yg51gsdmj64pdzts83ms996x5/T/tmpzsh0j955/reverse /var/folders/6_/_9q9_yg51gsdmj64pdzts83ms996x5/T/tmpzsh0j955/output.tsv.biom /var/folders/6_/_9q9_yg51gsdmj64pdzts83ms996x5/T/tmpzsh0j955/track.tsv /var/folders/6_/_9q9_yg51gsdmj64pdzts83ms996x5/T/tmpzsh0j955/filt_f /var/folders/6_/_9q9_yg51gsdmj64pdzts83ms996x5/T/tmpzsh0j955/filt_r 285 260 0 0 2.0 2 consensus 1.0 0 1000000

R version 3.4.1 (2017-06-30) 
Loading required package: Rcpp
DADA2 R package version: 1.6.0 
1) Filtering ....
2) Learning Error Rates
2a) Forward Reads
Initializing error rates to maximum possible estimate.
Sample 1 - 11103 reads in 1639 unique sequences.
Sample 2 - 13402 reads in 1895 unique sequences.
Sample 3 - 12214 reads in 2639 unique sequences.
Sample 4 - 10147 reads in 1921 unique sequences.
   selfConsist step 2 
   selfConsist step 3 
   selfConsist step 4 
   selfConsist step 5 
Convergence after  5  rounds.
2b) Reverse Reads
Initializing error rates to maximum possible estimate.
Sample 1 - 11103 reads in 1142 unique sequences.
Sample 2 - 13402 reads in 1643 unique sequences.
Sample 3 - 12214 reads in 1729 unique sequences.
Sample 4 - 10147 reads in 1270 unique sequences.
   selfConsist step 2 
   selfConsist step 3 
   selfConsist step 4 
   selfConsist step 5 
Convergence after  5  rounds.

3) Denoise remaining samples 
The sequences being tabled vary in length.
4) Remove chimeras (method = consensus)
6) Write output
Saved FeatureTable[Frequency] to: table-dada2.qza
Saved FeatureData[Sequence] to: rep-seqs-dada2.qza
Saved SampleData[DADA2Stats] to: stats-dada2.qza

(qiime2-2018.8) MAC19623:Qiime2_test lfslal$ qiime feature-table tabulate-seqs   --i-data rep-seqs-dada2.qza --o-visualization rep-seqs-dada2.qzv
Saved Visualization to: rep-seqs-dada2.qzv
(qiime2-2018.8) MAC19623:Qiime2_test lfslal$ 

Thanks!

Thanks @SarahH!

From what I can tell, both runs (q2-cutadapt and vanilla cutadapt) are identical…

So I think in order to debug this a bit more, I need some help understanding what this means:

I think I misread it initially, but it sounds to me like you are saying that you have ASVs that don’t contain your primers — isn’t that the point of trimming the primers out? I don’t follow the second part… Sorry, I need some help getting there. Specifically, can you give an example or two of reads pre-filtering, then an example or two of what you are seeing post-filtering and post-DADA2? Thanks!

Hi! Yes I can do that. Can I ask though if there is a way of seeing the actual sequences post-trimming before the DADA2 step? I have only managed it by looking at visualisation of rep-seqs-dada2.qza after the DADA2 step. Normally you can just take a quick look at your output fastq and check all looks good. It would be nice to be able to see that this step had worked before continuing.

Thanks!

Try:
$ qiime tools extract --output-dir . trimmed-seqs.qza
Then enter the directory automatically created by the command and find a “dna-sequences.fasta” file in data/. There you can find all the sequences.
This command also works for every .qza file, such as rep-seqs.qza after DADA2, etc.
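With the flag spellings from the 2018.8 CLI (an assumption — check `qiime tools extract --help` on your install), inspecting the trimmed reads might look like this; note that for a SampleData[PairedEndSequencesWithQuality] artifact the data/ directory holds per-sample .fastq.gz files rather than a fasta:

```shell
# Unpack the artifact into a directory named after its UUID, then peek at
# the first reads of a forward file. Paths/flags assumed, not verified here.
qiime tools extract --input-path trimmed-seqs.qza --output-dir extracted
gunzip -c extracted/*/data/*_R1_001.fastq.gz | head -8
```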

Wow, thanks! That is so useful. I used --input-path and it worked. I looked at the fastq files that came out of cutadapt and they have been trimmed! But they are getting removed during the DADA2 step… leaving just the other amplicon, which wasn’t trimmed, behind. I’ll try to nut it out in case I’m doing something silly and get back to you.

I am essentially using the same barcodes for two amplicons to maximise samples on a run. I then usually use usearch to split them using the primer sequences before continuing, and I trim the primers off in the same step. I thought I’d try this in q2. The cutadapt step looks like it works now (although the --discard-untrimmed option would be needed for my pipeline). I’ll try to work out what is happening during DADA2.
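Outside of QIIME 2, that splitting step can be sketched with plain cutadapt (filenames here are placeholders): `--discard-untrimmed` drops reads where no primer was found, and `--pair-filter=any` (cutadapt’s default in paired mode) discards the whole pair if either read fails, so only the fITS7/ITS4 amplicon survives.

```shell
# Keep only pairs where both primers were found and trimmed;
# pairs from the other amplicon are discarded entirely.
cutadapt \
  -g GTGARTCATCGAATCTTTG \
  -G TCCTCCGCTTATTGATATGC \
  --discard-untrimmed \
  --pair-filter=any \
  -o fits7_R1.fastq -p fits7_R2.fastq \
  R1.fastq R2.fastq
```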

Many thanks both of you!

It sounds like your trimmed PE reads do not overlap. After trimming with cutadapt, please run qiime demux summarize and examine the quality and length-distribution results. Please share those here as well.
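The requested summary can be generated like this (output filename assumed) and then inspected at view.qiime2.org for per-position quality and the read-length distribution:

```shell
qiime demux summarize \
  --i-data trimmed-seqs.qza \
  --o-visualization trimmed-seqs.qzv
```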

Hi again!

This is the first few lines of one of the R1 output fastq from cutadapt, showing correctly trimmed sequences starting ‘AACGCAC’ and also my other amplicon starting ‘GTTGAATTTT’. This is all looking as expected.

@M02610:40:000000000-BJD3H:1:1101:19050:1559 1:N:0:ATGCGCAG+CTATTAAG
AACGCACATTGCGCCCGCTAGTATTCTGGCGGGCATGCCTGTTCGAGCGTCCTTTGGACCCTCGGGCCCGGGTGACTTTATCCCCTGCGGCCCGGTGTTGGGGACCCGCGGCTGCCGCGGCCCCCTAAAGACAGTGGCGGACTTCTTTACAGCCCTGGCGCAGTAGCACATCTCGCCTTCGGTTATAGAGTATGCCTTGCCCCCAGCACCATACTTACAAAGCTGGACCTCGAATCAGGTAGGAATACCCGCCGAACTTAAGCATATCAATAAGCGGAGGA
+
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGFGGFGGGGGEGGGGGGGGGGGGGGGGGGGGGGGGGGGGFFCF?FFFFFF)77<91ABFFG19=F<F57?4AF>;9>F
@M02610:40:000000000-BJD3H:1:1101:20125:1797 1:N:0:ATGCGCAG+CTATTAAG
GTTGAATTTTAGCCCTGGCTGGGCGGTCCAGCCTTTTTGGGTTGGTACTGCTTTGGCTGGGGTTCACCTTCTGGTAAGCCGGCATGCTCTTAATTGGGTGTGTCGGGGACCCAGGACTTTTACTTTGAAAAAATTAGAGTGTTTAAAGCAGGCCTACGCTTGAATACATTAGCATGGAATAATAGAATAGGACTTTGGTTCTATTTTGTTGGTTTCTAGGACCGAGGTAATGATTAATAGGGATAGTTGGGGGCATTAGTATTTAATTGTCAGAGGTGAAATTCTTGGATTTATGAAAGA
+
CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGFGGGFGGGGGGGGGGEFFGGGGFGG[email protected]FGEFFGGGGGGGFG8CEFFFGGGGGFFGGGFGGGGGGGGGGGGGGGGGGGGGGFFGGG:[email protected]=CG=EDGGGGGGGGGG=DFFC9>[email protected]?181?FBGBFBFF2;AFF<AAA2<<7
@M02610:40:000000000-BJD3H:1:1101:8802:1848 1:N:0:ATGCGCAG+CTATTAAG
AACGCACATTGCGCCCGCGAGTATTCTCGCGGGCATGCCTGTTCGAGCGTCATTTCAACCATCAAGCCCCCGGCTTGTGTTGGGGACCTGCGGCTGCCCGCAGGCCCTGAAATCCAGTGGCGGGCTCGCTGTCGCACCGAGCGTAGTAGCATACTTCTCGCTCCGGGAGCGCGGCGGGCGCCTGCCGTAAAACCCCCTACATACCAAAGGTTGACCTCGGATCAGGTAGGAATACCCGCTGAAC
+
GGGFGEGGGGGGGGGGCGECGGGGGGFCFGGGGGGGGGGGGGGGGFGGGGGGGGGGG?FFGGGGGFGGEGGG7BBG745FFEGDCFGCEFCE:[email protected]:@:[email protected]**C<<CB:<FE:1<CFFGEF7E7<9++AFF?CGD*:;CC58*?>8E188EGGGG8>FG8EGFFGGGG:5E6<9<8?+;CF7:09CFC<C):55DGF07/8/976;CEG)79:
@M02610:40:000000000-BJD3H:1:1101:13294:1870 1:N:0:ATGCGCAG+CTATTAAG
GTTGAATTTTAGCCCTGGCTGGGCGGTCCAGCCTTTTTGGGTTGGTACTGCTTTGGCTGGGGTTCACCTTCTGGTAAGCCGGCATGCTCTTAATTGGGTGTGTCGGGGAACCAGGACTTTTACTTTGAAAAAATTAGAGTGTTTAAAGCAGGCCTACGCTTGAATACATTAGCATGGAATAATAGAATAGGACTTTGGTTCTATTTTGTTGGTTTCTAGGACCGAGGTAATGATTAATAGGGATAGTTGGGGGCATTAGTATTTAATTGTCAGAGGTGAAATTCTTGGATTTATGAAAGA
+
CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGDFFGGGGGGGGGGGGGGGGGFDFGGGGGEGGDEGGGDFGFGGGGGGGFGGGGGGGGF>:[email protected]@FBFFFFFFF?F=?FFFFBB>64<)8

I then ran my trimmed-seqs.qza (from which the above was extracted).

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs trimmed-seqs.qza \
  --p-trunc-len-f 285 \
  --p-trunc-len-r 260 \
  --p-trim-left-f 0 \
  --p-trim-left-r 0 \
  --p-n-threads 0 \
  --o-representative-sequences rep-seqs-dada2.qza \
  --o-table table-dada2.qza \
  --o-denoising-stats stats-dada2.qza \
  --verbose

I then extracted rep-seqs-dada2.qza to look at the fasta and got the output below (just the first few lines). All sequences began with ‘GTTGAATTTT’, and my sequences beginning with ‘AACGCAC’ (the ones I trimmed and am interested in) are gone… Why would they disappear? My other amplicon joins without any problems. I think I’ll try splitting them in usearch and putting just these ones through, to see if it works when the other amplicon isn’t there, although I’ve got no reason to think that would be the case. Does anyone have any ideas? Do my DADA2 settings look OK? Thanks so much!

f8c2227c88f28c7e31606f6d90fb3ef0
GTTGAATTTTAGCGCTGGTCCGGGTGGTCCTGCCTTCCGGTAGGCACTGTCTTTGGCTGGCGTTCAACCTTCTGGTGAACTGGTCTGCACCTCGTGTGTGGGCCAGGGAACCAGGACTTTTTACTTTGAAAAAATTAGAGTGTTTAAAGCAGGCAAGTTTTTGCTCGAATACATTAGCATGGAATAATAGAATAGGACTTTGGTTCTATTTTGTTGGTTTCTAGGACCGAAGTAATGATTAATAGGGATAGTTGGGGGCATTAGTATTTAATTGTCAGAGGTGAAATTCTTGGATTTATGAAAGACTAACTTCTGCGAAAGCATTTGCCAAGGATGTTTTCATTAATCAAGAACGAAAGTTAGGGGATCGAAGACGATCAGATACCGTCGTAGTCTTAACCATAAACTATGCCGACTAGGGATCGGACGATGTTATTTACTGACTCGTTCGGCACCTTATGAGAAATCAAAGTTTTTGGG
3164e9e9bf8dae303e7947b6d41a9027
GTTGAATTTTAGCCCTGGCCGAGCGGTCCACTATTTAGTCGGTACTGCTCCGGTCTCGGCTCACCTTCTGGCGAGCCATCATGTCCTTCACTGGATGTGGTGGGGAACCAGGACTTTTACTTTGAAAAAATTAGAGTGTTTAAAGCAGGCATTTGCTTGAATAGATTAGCATGGAATAATGGAATAGGACCTTGGTTCTATTTTGTTGGTTTCTAGGACCTCGGTAATGATTAATAGGGATAGTTGGGGGCATTAGTATTTAGTAGCTAGAGGTGAAATTCTTGGATTTACGAAAGACTAACTTCTGCGAAAGCATTTGTCAAGGATGTTTTCATTAATCAAGAACGAAAGTTAGGGGATCGAAGACGATCAGATACCGTCGTAGTCTTAACCATAAACTATGCCGACTAGGGATCAGACTGTCGTTAATTTACTGACGCGTTTGGCACCTTATGAGAAATCAAAGTTTTTGGG
091a408c12be5a59fd543edb46bec00f
GTTGAATTTTAGCCCTGGCTGGGCGGTCCACCTTTGGATGGTACTGCTCTGGCTGGGGCTTACCTTTCTGGCGAGCCGCCATGCCCTTCGCTGGGTGTGGCGGGGAACCAGGACTTTTACCTTGAAAAAATTAGAGTGTTTAAAGCAGGCCTATGCTTGAATACATTAGCATGGAATAATAGAATAGGACTTTGGTTCTATTTTGTTGGTTTCTAGGACCGAGGTAATGATTAATAGGGATAGTTGGGGGCATTAGTATTTAATTGTCAGAGGTGAAATTCTTGGATTTATGAAAGACTAACTTCTGCGAAAGCATTTGCCAAGGATGTTTTCATTAATCAAGAACGAAAGTTAGGGGATCGAAGACGATCAGATACCGTCGTAGTCTTAACCATAAACTATGCCGACTAGGGATCGGACGATGTTATTTACTGACTCGTTCGGCACCTTATGAGAAATCAAAGTTTTTGGG
87c2648be36eeebee639aef5d486e709
GTTGAATTTTAGCCTTGGCTGGGCGGTCCACCTTTGGATGGTACTGCTCTGGCTGGGGTTTACCTTTCTGGTGAGCCGTCATGCCCTTTGCTGGGTGTGGCGGGGAACCAGGACTTTTACCTTGAAAAAATTAGAGTGTTTAAAGCAGGCCAATTGCTTGAATACATTAGCATGGAATAATGGAATAGGACCTCGGTTCTATTTTGTTGGTTTCTAGGACCGACGTAATGATTAATAGGAATAGTTGGGGGCATTAGTATTTAGTAGCTAGAGGTGAAATTCTTGGATTTACGAAAGACTAACTTCTGCGAAAGCATTTGTCAAGGATGTTTTCATTAATCAAGAACGAAAGTTAGGGGATCGAAGACGATCAGATACCGTCGTAGTCTTAACCATAAACTATGCCGACTAGGGATCAGACTGTCGTTAATTTACTGACGCGTTTGGCACCTTATGAGAAATCAAAGTTTTTGGG

trimmed-seqs.qzv (295.4 KB)

Hi! Thanks for your help. I’ve attached the trimmed-seqs.qzv. My trimmed PE do overlap as I have run these through a different pipeline.

However, I have done the opposite now, so trimmed my other amplicon primers, checked that has worked (my trimmed sequences are here along with my untrimmed other amplicon), then run DADA2 and again the sequences that have been trimmed disappear! I’m left with the untrimmed ones (the ones that disappeared the first time I did it).

I ran the DADA2 on my untrimmed demux-paired-end.qza and this worked perfectly retaining both of my amplicons from each primer set.

So this is really weird. DADA2 seems to remove the sequences that were trimmed using cutadapt. This also happens when I run cutadapt outside of q2 and then use the output in DADA2. I’ve just found that it works if I set --p-trunc-len-f and --p-trunc-len-r to 0. But this is not due to the truncated sequences failing to overlap, as they overlap fine if I do not use cutadapt first. I’m stuck…

Ah! I have just read the docs for --p-trunc-len-f and realised that it removes any sequences shorter than the value (285 in this case). I thought for some reason it would truncate to that length and keep shorter reads. I had already cut the primer off, so I effectively made my trimmed sequences (~280 nt) shorter than the cut-off! Having the longer untrimmed reads from my other amplicon there fooled me into thinking I had 300 nt to play with when looking at the visualisation. Sorry about all this. I’m glad I got it sorted, as I was going kind of crazy. On the positive side, I have got very familiar with running DADA2! I will have to think carefully about how to treat data with two amplicons. If the --discard-untrimmed option could be added to the cutadapt plugin, then I could view just the data for each amplicon. For now I will run cutadapt outside of q2, then treat each amplicon separately based on its quality scores and length.
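For anyone else hitting this, the trunc-len rule can be illustrated with a toy sketch (this just mimics the documented behaviour in plain shell — it is not DADA2 code; the two lengths stand in for a primer-trimmed read and an untrimmed one):

```shell
# DADA2's trunc-len rule, in miniature: reads shorter than the cutoff are
# discarded outright; longer reads are kept and truncated to the cutoff.
TRUNC=285
for len in 280 300; do
  if [ "$len" -lt "$TRUNC" ]; then
    echo "${len} nt read: discarded (shorter than ${TRUNC})"
  else
    echo "${len} nt read: kept, truncated to ${TRUNC} nt"
  fi
done
```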
Many thanks again

Glad you worked this out! That was my thinking re: the dada2 stats — they would have told us whether reads were dropped at the filtering step (e.g., because they are shorter than the truncation length, as you figured out) or whether they were simply too short to overlap and failed at the merging stage.

:smile:

I agree, it is best to process these separately for several reasons (e.g., amplicons like ITS are variable in length and have read-through issues; others like 16S do not). Using cutadapt outside of QIIME 2 may be the best solution for now (e.g., if you use the same barcodes for each amplicon so cannot use demux to separate these), and we hope to get that parameter added in a future release.
