Demultiplexing Issue with Cutadapt

dwh2102 · June 12, 2018, 7:24pm

Hello,

I am using Cutadapt 1.16 with QIIME 2 version 2018.4.

I am demultiplexing a data set with 6 different amplicons barcoded from 47 different templates, and I am expecting ~2000 reads per barcode/primer combination. I am using the barcode and primer sequence together as the barcode for demultiplexing. This works well, except that the first sample analyzed winds up with a huge number of reads, far more than expected. I've re-ordered the samples, and it is always whichever one is listed first in the metadata file. All the extraneous reads seem to be ones with 3 bases removed. Any help would be greatly appreciated.

Thanks!

(qiime2-2018.4) qiime2@qiime2core2018-4:~$ qiime cutadapt demux-single --i-seqs multiplexed-seqs.qza --m-barcodes-file metadata2.tsv --m-barcodes-column BarcodeSequence --p-error-rate 0 --o-per-sample-sequences demultiplexed-seqs.qza --o-untrimmed-sequences untrimmed.qza --verbose
Running external command line application. This may print messages to stdout and/or stderr.
The command being run is below. This command cannot be manually re-run as it will depend on temporary files that no longer exist.

Command: cutadapt --front file:/tmp/tmpt85v7l8r --error-rate 0.0 -o /tmp/q2-CasavaOneEightSingleLanePerSampleDirFmt-i_po3nqz/{name}.1.fastq.gz --untrimmed-output /tmp/q2-MultiplexedSingleEndBarcodeInSequenceDirFmt-5qgb1s72/forward.fastq.gz /tmp/qiime2-archive-ip70iw2x/72148ae6-d0f2-49f0-81ec-4fa1bd2eee5e/data/forward.fastq.gz

This is cutadapt 1.16 with Python 3.5.5
Command line parameters: --front file:/tmp/tmpt85v7l8r --error-rate 0.0 -o /tmp/q2-CasavaOneEightSingleLanePerSampleDirFmt-i_po3nqz/{name}.1.fastq.gz --untrimmed-output /tmp/q2-MultiplexedSingleEndBarcodeInSequenceDirFmt-5qgb1s72/forward.fastq.gz /tmp/qiime2-archive-ip70iw2x/72148ae6-d0f2-49f0-81ec-4fa1bd2eee5e/data/forward.fastq.gz
Running on 1 core
Trimming 47 adapters with at most 0.0% errors in single-end mode …
Finished in 786.25 s (283 us/read; 0.21 M reads/minute).

=== Summary ===

Total reads processed: 2,778,271
Reads with adapters: 236,990 (8.5%)
Reads written (passing filters): 2,778,271 (100.0%)

Total basepairs processed: 581,744,202 bp
Total written (filtered): 578,255,648 bp (99.4%)

=== Adapter ATCC ===

Sequence: CTAAGGTAACGATGGCGGACGGGTGAGTAA; Type: regular 5'; Length: 30; Trimmed: 136588 times.

No. of allowed errors:
0-30 bp: 0

Overview of removed sequences
length  count   expect  max.err error counts
3       135131  43410.5 0       135131
4       133     10852.6 0       133
5       4       2713.2  0       4
16      4       0.0     0       4
17      13      0.0     0       13
18      9       0.0     0       9
19      2       0.0     0       2
28      1       0.0     0       1
29      1       0.0     0       1
30      1289    0.0     0       1289
31      1       0.0     0       1

=== Adapter 14 ===

Sequence: AAGAGGATTCGATGGCGGACGGGTGAGTAA; Type: regular 5'; Length: 30; Trimmed: 3719 times.

No. of allowed errors:
0-30 bp: 0

Overview of removed sequences
length  count   expect  max.err error counts
29      26      0.0     0       26
30      3685    0.0     0       3685
31      7       0.0     0       7
48      1       0.0     0       1

=== Adapter 18 ===

Sequence: TACCAAGATCGATGGCGGACGGGTGAGTAA; Type: regular 5'; Length: 30; Trimmed: 1539 times.

No. of allowed errors:
0-30 bp: 0

Overview of removed sequences
length  count   expect  max.err error counts
26      1       0.0     0       1
28      1       0.0     0       1
29      4       0.0     0       4
30      1530    0.0     0       1530
31      3       0.0     0       3

=== Adapter 21 ===

Sequence: CAGAAGGAACGATGGCGGACGGGTGAGTAA; Type: regular 5'; Length: 30; Trimmed: 1437 times.

No. of allowed errors:
0-30 bp: 0

Overview of removed sequences
length  count   expect  max.err error counts
29      7       0.0     0       7
30      1429    0.0     0       1429
31      1       0.0     0       1

thermokarst · June 12, 2018, 8:14pm

Hey there @dwh2102! That certainly seems strange - thanks for bringing this up! Would you be able to share your data? (You could send it to me in a private message if you don't want to share publicly.) It would help with recreating the situation. Thanks!

dwh2102 · June 12, 2018, 9:11pm

Sure thing. Link is to the multiplexed reads, metadata file attached.

https://drive.google.com/open?id=1l0gmf5AFAGW8kAXNR7i2-3jBLZmKUwsI

metadata2.tsv (3.1 KB)

thermokarst · June 12, 2018, 9:32pm

I think we figured it out!

Cutadapt supports a few different searching and anchoring strategies. Check out this graphic, pulled from the cutadapt docs:

[Image: overview of adapter types and anchoring, from the cutadapt documentation]

Since you aren't including a ^ before the barcode sequence in your metadata column, cutadapt is using the unanchored strategy (the "Type: regular 5'" in your log), which also accepts partial matches at the start of a read. That is why your first sample is picking up sooooooo many more reads: essentially every partial match is pooling into that sample (whichever is listed first, since cutadapt will check there first).
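To make the difference concrete, here is a rough Python sketch of the two matching rules at an error rate of 0. It is a simplified model for illustration only, not cutadapt's actual alignment code. The function names are made up, the minimum overlap of 3 assumes cutadapt's default, the tie-breaking rule (earlier adapter wins) is my reading of the behavior seen in this thread, and the two example reads are invented. The two barcodes are taken from the log above.

```python
# Simplified model of zero-error 5' adapter matching (illustration only).

def regular_5prime_overlap(adapter, read, min_overlap=3):
    """Matched length under the unanchored (regular 5') rule."""
    if adapter in read:
        return len(adapter)  # full adapter found somewhere in the read
    # Otherwise a suffix of the adapter may match the start of the read.
    for k in range(len(adapter) - 1, min_overlap - 1, -1):
        if read.startswith(adapter[-k:]):
            return k
    return 0

def anchored_5prime_overlap(adapter, read):
    """Matched length under the anchored (^) rule: full match at the read start."""
    return len(adapter) if read.startswith(adapter) else 0

def assign(read, adapters, anchored):
    """Return the name of the best-matching adapter, or None if nothing matches."""
    best_name, best_len = None, 0
    for name, seq in adapters:
        if anchored:
            n = anchored_5prime_overlap(seq, read)
        else:
            n = regular_5prime_overlap(seq, read)
        if n > best_len:  # strict '>' means ties stay with the earlier adapter
            best_name, best_len = name, n
    return best_name

adapters = [
    ("ATCC", "CTAAGGTAACGATGGCGGACGGGTGAGTAA"),
    ("14", "AAGAGGATTCGATGGCGGACGGGTGAGTAA"),
]
barcoded_read = "AAGAGGATTCGATGGCGGACGGGTGAGTAA" + "ACGTACGT"  # invented payload
stray_read = "TAACCGGTTAGC"  # invented read that merely starts with TAA

print(assign(barcoded_read, adapters, anchored=False))
print(assign(stray_read, adapters, anchored=False))
print(assign(stray_read, adapters, anchored=True))
```

Because every barcode here ends in the same primer sequence, a read that just happens to start with TAA matches the last 3 bases of all of them equally well, so in this model it lands in whichever sample is listed first. With anchoring, the same read matches nothing and stays in the untrimmed output.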

If I prepend a ^ to all of your barcodes like this:

[Image: screenshot showing ^ prepended to each barcode]
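If you would rather not edit 47 rows by hand, something along these lines should do it. This is only a sketch: `anchor_barcodes` is a hypothetical helper, the `#SampleID` header in the example is an assumption about your file, and `BarcodeSequence` is the column name from your command. Lines whose first field starts with `#` (other than the header) are passed through unchanged.

```python
import csv
import io

def anchor_barcodes(tsv_text, column="BarcodeSequence"):
    """Return the TSV text with '^' prepended to every value in `column`."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    header = rows[0]
    idx = header.index(column)  # raises ValueError if the column is missing

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    for row in rows[1:]:
        is_data = bool(row) and not row[0].startswith("#") and len(row) > idx
        # Skip empty barcodes and ones that are already anchored.
        if is_data and row[idx] and not row[idx].startswith("^"):
            row[idx] = "^" + row[idx]
        writer.writerow(row)
    return out.getvalue()

# Two sample rows using barcodes from the log above.
example = (
    "#SampleID\tBarcodeSequence\n"
    "ATCC\tCTAAGGTAACGATGGCGGACGGGTGAGTAA\n"
    "14\tAAGAGGATTCGATGGCGGACGGGTGAGTAA\n"
)
anchored = anchor_barcodes(example)
print(anchored)
```

For a real file, read it with `open("metadata2.tsv").read()`, write the result to a new file, and pass that new file to `--m-barcodes-file`. Running the helper twice is harmless, since barcodes that already start with ^ are left alone.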

here are the results:

=== Summary ===

Total reads processed:               2,778,271
Reads with adapters:                   100,926 (3.6%)
Reads written (passing filters):     2,778,271 (100.0%)

Total basepairs processed:   581,744,202 bp
Total written (filtered):    578,685,723 bp (99.5%)

=== Adapter ATCC ===

Sequence: CTAAGGTAACGATGGCGGACGGGTGAGTAA; Type: anchored 5'; Length: 30; Trimmed: 1289 times.

No. of allowed errors:
0-30 bp: 0

Overview of removed sequences
length  count   expect  max.err error counts
30      1289    0.0     0       1289

=== Adapter 14 ===

Sequence: AAGAGGATTCGATGGCGGACGGGTGAGTAA; Type: anchored 5'; Length: 30; Trimmed: 3685 times.

No. of allowed errors:
0-30 bp: 0

Overview of removed sequences
length  count   expect  max.err error counts
30      3685    0.0     0       3685

=== Adapter 18 ===

Sequence: TACCAAGATCGATGGCGGACGGGTGAGTAA; Type: anchored 5'; Length: 30; Trimmed: 1530 times.

No. of allowed errors:
0-30 bp: 0

Overview of removed sequences
length  count   expect  max.err error counts
30      1530    0.0     0       1530

=== Adapter 21 ===

Sequence: CAGAAGGAACGATGGCGGACGGGTGAGTAA; Type: anchored 5'; Length: 30; Trimmed: 1429 times.

No. of allowed errors:
0-30 bp: 0

Overview of removed sequences
length  count   expect  max.err error counts
30      1429    0.0     0       1429

=== Adapter 23 ===

Sequence: CTGCAAGTTCGATGGCGGACGGGTGAGTAA; Type: anchored 5'; Length: 30; Trimmed: 3069 times.

No. of allowed errors:
0-30 bp: 0

Overview of removed sequences
length  count   expect  max.err error counts
30      3069    0.0     0       3069

=== Adapter 25 ===

Sequence: TTCGTGATTCGATGGCGGACGGGTGAGTAA; Type: anchored 5'; Length: 30; Trimmed: 2674 times.

No. of allowed errors:
0-30 bp: 0

Overview of removed sequences
length  count   expect  max.err error counts
30      2674    0.0     0       2674

=== Adapter 26 ===

Sequence: TTCCGATAACGATGGCGGACGGGTGAGTAA; Type: anchored 5'; Length: 30; Trimmed: 1148 times.

No. of allowed errors:
0-30 bp: 0

Overview of removed sequences
length  count   expect  max.err error counts
30      1148    0.0     0       1148

=== Adapter 27 ===

Sequence: TGAGCGGAACGATGGCGGACGGGTGAGTAA; Type: anchored 5'; Length: 30; Trimmed: 2456 times.

No. of allowed errors:
0-30 bp: 0

Overview of removed sequences
length  count   expect  max.err error counts
30      2456    0.0     0       2456

=== Adapter 29 ===

Sequence: CTGACCGAACGATGGCGGACGGGTGAGTAA; Type: anchored 5'; Length: 30; Trimmed: 2522 times.

No. of allowed errors:
0-30 bp: 0

Overview of removed sequences
length  count   expect  max.err error counts
30      2522    0.0     0       2522

=== Adapter 30 ===

Sequence: TCCTCGAATCGATGGCGGACGGGTGAGTAA; Type: anchored 5'; Length: 30; Trimmed: 3864 times.

No. of allowed errors:
0-30 bp: 0

Overview of removed sequences
length  count   expect  max.err error counts
30      3864    0.0     0       3864

=== Adapter 31 ===

Sequence: TAGGTGGTTCGATGGCGGACGGGTGAGTAA; Type: anchored 5'; Length: 30; Trimmed: 4081 times.

No. of allowed errors:
0-30 bp: 0

Overview of removed sequences
length  count   expect  max.err error counts
30      4081    0.0     0       4081

=== Adapter 33 ===

Sequence: TCTAACGGACGATGGCGGACGGGTGAGTAA; Type: anchored 5'; Length: 30; Trimmed: 2989 times.

No. of allowed errors:
0-30 bp: 0

Overview of removed sequences
length  count   expect  max.err error counts
30      2989    0.0     0       2989

=== Adapter 34 ===

Sequence: TTGGAGTGTCGATGGCGGACGGGTGAGTAA; Type: anchored 5'; Length: 30; Trimmed: 2436 times.

No. of allowed errors:
0-30 bp: 0

Overview of removed sequences
length  count   expect  max.err error counts
30      2436    0.0     0       2436

=== Adapter 35 ===

Sequence: TCTAGAGGTCGATGGCGGACGGGTGAGTAA; Type: anchored 5'; Length: 30; Trimmed: 2833 times.

No. of allowed errors:
0-30 bp: 0

Overview of removed sequences
length  count   expect  max.err error counts
30      2833    0.0     0       2833

=== Adapter 36 ===

Sequence: TCTGGATGACGATGGCGGACGGGTGAGTAA; Type: anchored 5'; Length: 30; Trimmed: 1708 times.

No. of allowed errors:
0-30 bp: 0

Overview of removed sequences
length  count   expect  max.err error counts
30      1708    0.0     0       1708

=== Adapter 37 ===

Sequence: TCTATTCGTCGATGGCGGACGGGTGAGTAA; Type: anchored 5'; Length: 30; Trimmed: 3236 times.

No. of allowed errors:
0-30 bp: 0

Overview of removed sequences
length  count   expect  max.err error counts
30      3236    0.0     0       3236

=== Adapter 38 ===

Sequence: AGGCAATTGCGATGGCGGACGGGTGAGTAA; Type: anchored 5'; Length: 30; Trimmed: 2508 times.

No. of allowed errors:
0-30 bp: 0

Overview of removed sequences
length  count   expect  max.err error counts
30      2508    0.0     0       2508

=== Adapter 39 ===

Sequence: TTAGTCGGACGATGGCGGACGGGTGAGTAA; Type: anchored 5'; Length: 30; Trimmed: 2065 times.

No. of allowed errors:
0-30 bp: 0

Overview of removed sequences
length  count   expect  max.err error counts
30      2065    0.0     0       2065

=== Adapter 40 ===

Sequence: CAGATCCATCGATGGCGGACGGGTGAGTAA; Type: anchored 5'; Length: 30; Trimmed: 2242 times.

No. of allowed errors:
0-30 bp: 0

Overview of removed sequences
length  count   expect  max.err error counts
30      2242    0.0     0       2242

=== Adapter 41 ===

Sequence: TCGCAATTACGATGGCGGACGGGTGAGTAA; Type: anchored 5'; Length: 30; Trimmed: 3141 times.

No. of allowed errors:
0-30 bp: 0

Overview of removed sequences
length  count   expect  max.err error counts
30      3141    0.0     0       3141

=== Adapter 42 ===

Sequence: TTCGAGACGCGATGGCGGACGGGTGAGTAA; Type: anchored 5'; Length: 30; Trimmed: 4468 times.

No. of allowed errors:
0-30 bp: 0

Overview of removed sequences
length  count   expect  max.err error counts
30      4468    0.0     0       4468

=== Adapter 43 ===

Sequence: TGCCACGAACGATGGCGGACGGGTGAGTAA; Type: anchored 5'; Length: 30; Trimmed: 6390 times.

No. of allowed errors:
0-30 bp: 0

Overview of removed sequences
length  count   expect  max.err error counts
30      6390    0.0     0       6390

=== Adapter 44 ===

Sequence: AACCTCATTCGATGGCGGACGGGTGAGTAA; Type: anchored 5'; Length: 30; Trimmed: 2648 times.

No. of allowed errors:
0-30 bp: 0

Overview of removed sequences
length  count   expect  max.err error counts
30      2648    0.0     0       2648

=== Adapter 46 ===

Sequence: CCTGAGATACGATGGCGGACGGGTGAGTAA; Type: anchored 5'; Length: 30; Trimmed: 1798 times.

No. of allowed errors:
0-30 bp: 0

Overview of removed sequences
length  count   expect  max.err error counts
30      1798    0.0     0       1798

=== Adapter 47 ===

Sequence: TTACAACCTCGATGGCGGACGGGTGAGTAA; Type: anchored 5'; Length: 30; Trimmed: 1731 times.

No. of allowed errors:
0-30 bp: 0

Overview of removed sequences
length  count   expect  max.err error counts
30      1731    0.0     0       1731

=== Adapter 48 ===

Sequence: AACCATCCGCGATGGCGGACGGGTGAGTAA; Type: anchored 5'; Length: 30; Trimmed: 1728 times.

No. of allowed errors:
0-30 bp: 0

Overview of removed sequences
length  count   expect  max.err error counts
30      1728    0.0     0       1728

=== Adapter 49 ===

Sequence: ATCCGGAATCGATGGCGGACGGGTGAGTAA; Type: anchored 5'; Length: 30; Trimmed: 2374 times.

No. of allowed errors:
0-30 bp: 0

Overview of removed sequences
length  count   expect  max.err error counts
30      2374    0.0     0       2374

=== Adapter 50 ===

Sequence: TCGACCACTCGATGGCGGACGGGTGAGTAA; Type: anchored 5'; Length: 30; Trimmed: 1351 times.

No. of allowed errors:
0-30 bp: 0

Overview of removed sequences
length  count   expect  max.err error counts
30      1351    0.0     0       1351

=== Adapter 52 ===

Sequence: CGAGGTTATCGATGGCGGACGGGTGAGTAA; Type: anchored 5'; Length: 30; Trimmed: 1640 times.

No. of allowed errors:
0-30 bp: 0

Overview of removed sequences
length  count   expect  max.err error counts
30      1640    0.0     0       1640

=== Adapter 53 ===

Sequence: TCCAAGCTGCGATGGCGGACGGGTGAGTAA; Type: anchored 5'; Length: 30; Trimmed: 2167 times.

No. of allowed errors:
0-30 bp: 0

Overview of removed sequences
length  count   expect  max.err error counts
30      2167    0.0     0       2167

=== Adapter 55 ===

Sequence: TCTTACACACGATGGCGGACGGGTGAGTAA; Type: anchored 5'; Length: 30; Trimmed: 1315 times.

No. of allowed errors:
0-30 bp: 0

Overview of removed sequences
length  count   expect  max.err error counts
30      1315    0.0     0       1315

=== Adapter 56 ===

Sequence: TTCTCATTGAACGATGGCGGACGGGTGAGTAA; Type: anchored 5'; Length: 32; Trimmed: 934 times.

No. of allowed errors:
0-32 bp: 0

Overview of removed sequences
length  count   expect  max.err error counts
32      934     0.0     0       934

=== Adapter 57 ===

Sequence: TCGCATCGTTCGATGGCGGACGGGTGAGTAA; Type: anchored 5'; Length: 31; Trimmed: 2357 times.

No. of allowed errors:
0-31 bp: 0

Overview of removed sequences
length  count   expect  max.err error counts
31      2357    0.0     0       2357

=== Adapter 58 ===

Sequence: TAAGCCATTGTCGATGGCGGACGGGTGAGTAA; Type: anchored 5'; Length: 32; Trimmed: 1665 times.

No. of allowed errors:
0-32 bp: 0

Overview of removed sequences
length  count   expect  max.err error counts
32      1665    0.0     0       1665

=== Adapter 59 ===

Sequence: AAGGAATCGTCGATGGCGGACGGGTGAGTAA; Type: anchored 5'; Length: 31; Trimmed: 1671 times.

No. of allowed errors:
0-31 bp: 0

Overview of removed sequences
length  count   expect  max.err error counts
31      1671    0.0     0       1671

=== Adapter 60 ===

Sequence: CTTGAGAATGTCGATGGCGGACGGGTGAGTAA; Type: anchored 5'; Length: 32; Trimmed: 1761 times.

No. of allowed errors:
0-32 bp: 0

Overview of removed sequences
length  count   expect  max.err error counts
32      1761    0.0     0       1761

=== Adapter 61 ===

Sequence: TGGAGGACGGACGATGGCGGACGGGTGAGTAA; Type: anchored 5'; Length: 32; Trimmed: 1061 times.

No. of allowed errors:
0-32 bp: 0

Overview of removed sequences
length  count   expect  max.err error counts
32      1061    0.0     0       1061

=== Adapter 62 ===

Sequence: TAACAATCGGCGATGGCGGACGGGTGAGTAA; Type: anchored 5'; Length: 31; Trimmed: 1092 times.

No. of allowed errors:
0-31 bp: 0

Overview of removed sequences
length  count   expect  max.err error counts
31      1092    0.0     0       1092

=== Adapter 63 ===

Sequence: CTGACATAATCGATGGCGGACGGGTGAGTAA; Type: anchored 5'; Length: 31; Trimmed: 1071 times.

No. of allowed errors:
0-31 bp: 0

Overview of removed sequences
length  count   expect  max.err error counts
31      1071    0.0     0       1071

=== Adapter 64 ===

Sequence: TTCCACTTCGCGATGGCGGACGGGTGAGTAA; Type: anchored 5'; Length: 31; Trimmed: 1189 times.

No. of allowed errors:
0-31 bp: 0

Overview of removed sequences
length  count   expect  max.err error counts
31      1189    0.0     0       1189

=== Adapter 66 ===

Sequence: AGCACGAATCGATGGCGGACGGGTGAGTAA; Type: anchored 5'; Length: 30; Trimmed: 1174 times.

No. of allowed errors:
0-30 bp: 0

Overview of removed sequences
length  count   expect  max.err error counts
30      1174    0.0     0       1174

=== Adapter 67 ===

Sequence: CTTGACACCGCGATGGCGGACGGGTGAGTAA; Type: anchored 5'; Length: 31; Trimmed: 1248 times.

No. of allowed errors:
0-31 bp: 0

Overview of removed sequences
length  count   expect  max.err error counts
31      1248    0.0     0       1248

=== Adapter 69 ===

Sequence: TTGGAGGCCAGCGATGGCGGACGGGTGAGTAA; Type: anchored 5'; Length: 32; Trimmed: 1167 times.

No. of allowed errors:
0-32 bp: 0

Overview of removed sequences
length  count   expect  max.err error counts
32      1167    0.0     0       1167

=== Adapter 70 ===

Sequence: TGGAGCTTCCTCGATGGCGGACGGGTGAGTAA; Type: anchored 5'; Length: 32; Trimmed: 1504 times.

No. of allowed errors:
0-32 bp: 0

Overview of removed sequences
length  count   expect  max.err error counts
32      1504    0.0     0       1504

=== Adapter 72 ===

Sequence: TCAGTCCGAACGATGGCGGACGGGTGAGTAA; Type: anchored 5'; Length: 31; Trimmed: 1295 times.

No. of allowed errors:
0-31 bp: 0

Overview of removed sequences
length  count   expect  max.err error counts
31      1295    0.0     0       1295

=== Adapter 73 ===

Sequence: TAAGGCAACCACGATGGCGGACGGGTGAGTAA; Type: anchored 5'; Length: 32; Trimmed: 1370 times.

No. of allowed errors:
0-32 bp: 0

Overview of removed sequences
length  count   expect  max.err error counts
32      1370    0.0     0       1370

=== Adapter 74 ===

Sequence: TTCTAAGAGACGATGGCGGACGGGTGAGTAA; Type: anchored 5'; Length: 31; Trimmed: 1852 times.

No. of allowed errors:
0-31 bp: 0

Overview of removed sequences
length  count   expect  max.err error counts
31      1852    0.0     0       1852

Awesome! Give that a whirl and let us know how it goes!

dwh2102 · June 13, 2018, 1:54pm

Tried anchoring the barcodes and you are correct, it worked perfectly! Thanks so much for your help and quick reply.
