Does q2-cutadapt support dual-indexed reads?

Command: cutadapt --front file:/var/folders/z7/2djb6s4j1hjdj1vvp9p4bm940000gn/T/tmpaq98eu4z --error-rate 0.1 --minimum-length 1 -o /var/folders/z7/2djb6s4j1hjdj1vvp9p4bm940000gn/T/q2-CasavaOneEightSingleLanePerSampleDirFmt-6r3jfxn0/{name}.1.fastq.gz --untrimmed-output /var/folders/z7/2djb6s4j1hjdj1vvp9p4bm940000gn/T/q2-MultiplexedPairedEndBarcodeInSequenceDirFmt-omma8zf8/forward.fastq.gz --pair-adapters -G file:/var/folders/z7/2djb6s4j1hjdj1vvp9p4bm940000gn/T/tmpz0ucd4n5 -p /var/folders/z7/2djb6s4j1hjdj1vvp9p4bm940000gn/T/q2-CasavaOneEightSingleLanePerSampleDirFmt-6r3jfxn0/{name}.2.fastq.gz --untrimmed-paired-output /var/folders/z7/2djb6s4j1hjdj1vvp9p4bm940000gn/T/q2-MultiplexedPairedEndBarcodeInSequenceDirFmt-omma8zf8/reverse.fastq.gz /var/folders/z7/2djb6s4j1hjdj1vvp9p4bm940000gn/T/qiime2-archive-rrj3e5zp/df1aaf33-eaf9-4944-bb88-aff020a1c541/data/forward.fastq.gz /var/folders/z7/2djb6s4j1hjdj1vvp9p4bm940000gn/T/qiime2-archive-rrj3e5zp/df1aaf33-eaf9-4944-bb88-aff020a1c541/data/reverse.fastq.gz

This is cutadapt 2.4 with Python 3.6.7

Command line parameters: --front file:/var/folders/z7/2djb6s4j1hjdj1vvp9p4bm940000gn/T/tmpaq98eu4z --error-rate 0.1 --minimum-length 1 -o /var/folders/z7/2djb6s4j1hjdj1vvp9p4bm940000gn/T/q2-CasavaOneEightSingleLanePerSampleDirFmt-6r3jfxn0/{name}.1.fastq.gz --untrimmed-output /var/folders/z7/2djb6s4j1hjdj1vvp9p4bm940000gn/T/q2-MultiplexedPairedEndBarcodeInSequenceDirFmt-omma8zf8/forward.fastq.gz --pair-adapters -G file:/var/folders/z7/2djb6s4j1hjdj1vvp9p4bm940000gn/T/tmpz0ucd4n5 -p /var/folders/z7/2djb6s4j1hjdj1vvp9p4bm940000gn/T/q2-CasavaOneEightSingleLanePerSampleDirFmt-6r3jfxn0/{name}.2.fastq.gz --untrimmed-paired-output /var/folders/z7/2djb6s4j1hjdj1vvp9p4bm940000gn/T/q2-MultiplexedPairedEndBarcodeInSequenceDirFmt-omma8zf8/reverse.fastq.gz /var/folders/z7/2djb6s4j1hjdj1vvp9p4bm940000gn/T/qiime2-archive-rrj3e5zp/df1aaf33-eaf9-4944-bb88-aff020a1c541/data/forward.fastq.gz /var/folders/z7/2djb6s4j1hjdj1vvp9p4bm940000gn/T/qiime2-archive-rrj3e5zp/df1aaf33-eaf9-4944-bb88-aff020a1c541/data/reverse.fastq.gz

Processing reads on 1 core in paired-end mode ...

[ 8<--] 00:01:45 1,214,093 reads @ 87.0 µs/read; 0.69 M reads/minute

Finished in 105.66 s (87 us/read; 0.69 M reads/minute).

=== Summary ===

Total read pairs processed: 1,214,093

Read 1 with adapter: 1,175 (0.1%)

Read 2 with adapter: 1,175 (0.1%)

Pairs that were too short: 1 (0.0%)

Pairs written (passing filters): 2,428,184 (200.0%)

Total basepairs processed: 609,474,686 bp

Read 1: 304,737,343 bp

Read 2: 304,737,343 bp

Total written (filtered): 609,457,317 bp (100.0%)

Read 1: 304,726,734 bp

Read 2: 304,730,583 bp

=== First read: Adapter D3.31.15 ===

Sequence: CCTATCCT; Type: regular 5'; Length: 8; Trimmed: 453 times.

No. of allowed errors:

0-8 bp: 0

Overview of removed sequences

length count expect max.err error counts

3 263 18970.2 0 263

4 72 4742.6 0 72

5 50 1185.6 0 50

6 1 296.4 0 1

8 1 18.5 0 1

10 2 18.5 0 2

12 34 18.5 0 34

19 1 18.5 0 1

23 1 18.5 0 1

31 1 18.5 0 1

32 1 18.5 0 1

36 1 18.5 0 1

39 1 18.5 0 1

46 1 18.5 0 1

50 1 18.5 0 1

53 1 18.5 0 1

57 1 18.5 0 1

60 1 18.5 0 1

62 1 18.5 0 1

64 1 18.5 0 1

77 2 18.5 0 2

84 1 18.5 0 1

96 1 18.5 0 1

98 1 18.5 0 1

108 1 18.5 0 1

112 1 18.5 0 1

120 1 18.5 0 1

131 1 18.5 0 1

138 1 18.5 0 1

158 1 18.5 0 1

174 1 18.5 0 1

191 1 18.5 0 1

217 1 18.5 0 1

234 1 18.5 0 1

250 1 18.5 0 1

251 1 18.5 0 1

=== First read: Adapter D5.5.15 ===

Sequence: CCTATCCT; Type: regular 5'; Length: 8; Trimmed: 0 times.

=== First read: Adapter D5.27.15 ===

Sequence: CCTATCCT; Type: regular 5'; Length: 8; Trimmed: 0 times.

=== First read: Adapter D6.2.15 ===

Sequence: CCTATCCT; Type: regular 5'; Length: 8; Trimmed: 0 times.

=== First read: Adapter D7.9.15 ===

Sequence: AGGCGAAG; Type: regular 5'; Length: 8; Trimmed: 205 times.

No. of allowed errors:

0-8 bp: 0

Overview of removed sequences

length count expect max.err error counts

3 65 18970.2 0 65

4 13 4742.6 0 13

5 5 1185.6 0 5

12 122 18.5 0 122

=== First read: Adapter D7.24.15 ===

Sequence: AGGCGAAG; Type: regular 5'; Length: 8; Trimmed: 0 times.

=== First read: Adapter D8.7.15 ===

Sequence: AGGCGAAG; Type: regular 5'; Length: 8; Trimmed: 0 times.

=== First read: Adapter D8.21.15 ===

Sequence: TAATCTTA; Type: regular 5'; Length: 8; Trimmed: 195 times.

No. of allowed errors:

0-8 bp: 0

Overview of removed sequences

length count expect max.err error counts

3 90 18970.2 0 90

4 21 4742.6 0 21

5 7 1185.6 0 7

6 4 296.4 0 4

12 73 18.5 0 73

=== First read: Adapter D9.11.15 ===

Sequence: TAATCTTA; Type: regular 5'; Length: 8; Trimmed: 0 times.

=== First read: Adapter D9.25.15 ===

Sequence: TAATCTTA; Type: regular 5'; Length: 8; Trimmed: 0 times.

=== First read: Adapter D10.9.15 ===

Sequence: TAATCTTA; Type: regular 5'; Length: 8; Trimmed: 0 times.

=== First read: Adapter D10.24.15 ===

Sequence: TAATCTTA; Type: regular 5'; Length: 8; Trimmed: 0 times.

=== First read: Adapter D11.6.15 ===

Sequence: CAGGACGT; Type: regular 5'; Length: 8; Trimmed: 153 times.

No. of allowed errors:

0-8 bp: 0

Overview of removed sequences

length count expect max.err error counts

3 69 18970.2 0 69

4 12 4742.6 0 12

5 4 1185.6 0 4

12 67 18.5 0 67

124 1 18.5 0 1

=== First read: Adapter D11.20.15 ===

Sequence: CAGGACGT; Type: regular 5'; Length: 8; Trimmed: 0 times.

=== First read: Adapter D12.7.15 ===

Sequence: CAGGACGT; Type: regular 5'; Length: 8; Trimmed: 0 times.

=== First read: Adapter D12.21.15 ===

Sequence: CAGGACGT; Type: regular 5'; Length: 8; Trimmed: 0 times.

=== First read: Adapter D1.12.16 ===

Sequence: CAGGACGT; Type: regular 5'; Length: 8; Trimmed: 0 times.

=== First read: Adapter D1.26.16 ===

Sequence: GTACTGAC; Type: regular 5'; Length: 8; Trimmed: 169 times.

No. of allowed errors:

0-8 bp: 0

Overview of removed sequences

length count expect max.err error counts

3 71 18970.2 0 71

4 16 4742.6 0 16

5 4 1185.6 0 4

6 1 296.4 0 1

7 1 74.1 0 1

12 72 18.5 0 72

35 1 18.5 0 1

36 1 18.5 0 1

122 1 18.5 0 1

157 1 18.5 0 1

=== First read: Adapter D2.26.16 ===

Sequence: GTACTGAC; Type: regular 5'; Length: 8; Trimmed: 0 times.

=== First read: Adapter D3.9.16 ===

Sequence: GTACTGAC; Type: regular 5'; Length: 8; Trimmed: 0 times.

=== First read: Adapter D3.24.16 ===

Sequence: GTACTGAC; Type: regular 5'; Length: 8; Trimmed: 0 times.

=== First read: Adapter D4.8.16 ===

Sequence: GTACTGAC; Type: regular 5'; Length: 8; Trimmed: 0 times.

=== Second read: Adapter D3.31.15 ===

Sequence: GAATTCGT; Type: regular 5'; Length: 8; Trimmed: 453 times.

No. of allowed errors:

0-8 bp: 0

Overview of removed sequences

length count expect max.err error counts

3 410 18970.2 0 410

4 31 4742.6 0 31

5 3 1185.6 0 3

6 3 296.4 0 3

7 1 74.1 0 1

10 1 18.5 0 1

12 3 18.5 0 3

22 1 18.5 0 1

=== Second read: Adapter D5.5.15 ===

Sequence: GAGATTCC; Type: regular 5'; Length: 8; Trimmed: 0 times.

=== Second read: Adapter D5.27.15 ===

Sequence: ATTCAGAA; Type: regular 5'; Length: 8; Trimmed: 0 times.

=== Second read: Adapter D6.2.15 ===

Sequence: CGCTCATT; Type: regular 5'; Length: 8; Trimmed: 0 times.

=== Second read: Adapter D7.9.15 ===

Sequence: GAGATTCC; Type: regular 5'; Length: 8; Trimmed: 205 times.

No. of allowed errors:

0-8 bp: 0

Overview of removed sequences

length count expect max.err error counts

3 104 18970.2 0 104

4 96 4742.6 0 96

5 2 1185.6 0 2

8 1 18.5 0 1

22 1 18.5 0 1

194 1 18.5 0 1

=== Second read: Adapter D7.24.15 ===

Sequence: ATTCAGAA; Type: regular 5'; Length: 8; Trimmed: 0 times.

=== Second read: Adapter D8.7.15 ===

Sequence: GAATTCGT; Type: regular 5'; Length: 8; Trimmed: 0 times.

=== Second read: Adapter D8.21.15 ===

Sequence: CGCTCATT; Type: regular 5'; Length: 8; Trimmed: 195 times.

No. of allowed errors:

0-8 bp: 0

Overview of removed sequences

length count expect max.err error counts

3 133 18970.2 0 133

4 38 4742.6 0 38

5 10 1185.6 0 10

6 2 296.4 0 2

16 1 18.5 0 1

18 1 18.5 0 1

23 1 18.5 0 1

42 1 18.5 0 1

61 1 18.5 0 1

92 1 18.5 0 1

102 1 18.5 0 1

106 3 18.5 0 3

125 1 18.5 0 1

194 1 18.5 0 1

=== Second read: Adapter D9.11.15 ===

Sequence: GAGATTCC; Type: regular 5'; Length: 8; Trimmed: 0 times.

=== Second read: Adapter D9.25.15 ===

Sequence: ATTCAGAA; Type: regular 5'; Length: 8; Trimmed: 0 times.

=== Second read: Adapter D10.9.15 ===

Sequence: GAATTCGT; Type: regular 5'; Length: 8; Trimmed: 0 times.

=== Second read: Adapter D10.24.15 ===

Sequence: CTGAAGCT; Type: regular 5'; Length: 8; Trimmed: 0 times.

=== Second read: Adapter D11.6.15 ===

Sequence: CGCTCATT; Type: regular 5'; Length: 8; Trimmed: 153 times.

No. of allowed errors:

0-8 bp: 0

Overview of removed sequences

length count expect max.err error counts

3 113 18970.2 0 113

4 21 4742.6 0 21

5 10 1185.6 0 10

6 1 296.4 0 1

7 3 74.1 0 3

49 1 18.5 0 1

63 1 18.5 0 1

80 1 18.5 0 1

84 1 18.5 0 1

106 1 18.5 0 1

=== Second read: Adapter D11.20.15 ===

Sequence: GAGATTCC; Type: regular 5'; Length: 8; Trimmed: 0 times.

=== Second read: Adapter D12.7.15 ===

Sequence: ATTCAGAA; Type: regular 5'; Length: 8; Trimmed: 0 times.

=== Second read: Adapter D12.21.15 ===

Sequence: GAATTCGT; Type: regular 5'; Length: 8; Trimmed: 0 times.

=== Second read: Adapter D1.12.16 ===

Sequence: CTGAAGCT; Type: regular 5'; Length: 8; Trimmed: 0 times.

=== Second read: Adapter D1.26.16 ===

Sequence: CGCTCATT; Type: regular 5'; Length: 8; Trimmed: 169 times.

No. of allowed errors:

0-8 bp: 0

Overview of removed sequences

length count expect max.err error counts

3 128 18970.2 0 128

4 24 4742.6 0 24

5 7 1185.6 0 7

6 1 296.4 0 1

26 1 18.5 0 1

64 1 18.5 0 1

69 1 18.5 0 1

101 1 18.5 0 1

106 1 18.5 0 1

125 1 18.5 0 1

185 1 18.5 0 1

202 1 18.5 0 1

235 1 18.5 0 1

=== Second read: Adapter D2.26.16 ===

Sequence: GAGATTCC; Type: regular 5'; Length: 8; Trimmed: 0 times.

=== Second read: Adapter D3.9.16 ===

Sequence: ATTCAGAA; Type: regular 5'; Length: 8; Trimmed: 0 times.

=== Second read: Adapter D3.24.16 ===

Sequence: GAATTCGT; Type: regular 5'; Length: 8; Trimmed: 0 times.

=== Second read: Adapter D4.8.16 ===

Sequence: CTGAAGCT; Type: regular 5'; Length: 8; Trimmed: 0 times.

Saved SampleData[PairedEndSequencesWithQuality] to: cutadapt-demux.qza

Saved MultiplexedPairedEndBarcodeInSequence to: unmatched-barcodes.qza

Update: I tried with reverse complemented R barcodes, and got the same result.

Are these Nextera Illumina primers? I had the same question: I ran cutadapt on the adapters and primers in the forward, reverse, and reverse-complement orientations, then searched the reads, and got similar results (minimal adapter/primer found in any position). I think that in Illumina runs, which mine were, sequencing starts after the adapter position, so the adapters are not found in the sequences.

Did you run DADA2 without cutadapt? I would suggest doing that and then looking at the rep-seqs. Click on some and see whether the adapter sequences are found (they will not match the 16S sequences).

Ben

@LuSanto, thanks for sharing your log file! Well, it looks pretty straightforward from here: only 0.1% of the reads are being matched with your barcodes. Scrolling through, it actually looks to me like something is wrong with your forward barcodes. Have you tried running the reverse complement of them?
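As a quick sanity check on that 0.1% figure, the per-adapter "Trimmed" counts for the first read in the log above can be added up and compared with the total number of read pairs. This is only a sketch of the arithmetic; the numbers are copied from the log, and the variable names are made up for illustration.

```python
# Per-adapter "Trimmed" counts for the first read, copied from the log above.
# Every adapter not listed here reports "Trimmed: 0 times".
trimmed_counts = {
    "D3.31.15": 453,
    "D7.9.15": 205,
    "D8.21.15": 195,
    "D11.6.15": 153,
    "D1.26.16": 169,
}
total_pairs = 1_214_093  # "Total read pairs processed" in the log

matched = sum(trimmed_counts.values())  # 1175, the "Read 1 with adapter" value
percent = 100 * matched / total_pairs   # roughly 0.1 when rounded to one decimal

print(matched, round(percent, 1))
```

The sum reproduces the "Read 1 with adapter: 1,175 (0.1%)" line, so the summary and the per-adapter sections of this particular log agree with each other.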

Hi. I double-checked the barcodes and they are correct. The reason only a small proportion of reads seems to be matched is that there are reads from a different project in the same fastq files. In fact, if I run DADA2 on the samples that were demultiplexed (only 5 out of 22), they look absolutely normal (about 40K reads per sample, as expected). Anyway, I did try running the reverse complement of either the F or R barcodes, and I always get the same result: the same 5 samples are always demultiplexed, while the remaining 17 are not.

For some reason, cutadapt is only recognizing the F barcode the first time it appears in the metadata file. The 5 samples being demultiplexed are shown in yellow on the attached screenshot.

This is confirmed when I reshuffle the sample order in the metadata file. Again, only the samples in yellow are demultiplexed (correct screenshot below).
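The pattern described here can be checked against the log itself. The sketch below lists the forward barcodes in the order the log reports them and picks the first sample seen for each barcode. The sample IDs and sequences are copied from the "First read" sections of the log; the variable names are hypothetical.

```python
# (sample ID, forward barcode) in the order the log lists the first-read adapters.
forward_barcodes = [
    ("D3.31.15", "CCTATCCT"), ("D5.5.15", "CCTATCCT"),
    ("D5.27.15", "CCTATCCT"), ("D6.2.15", "CCTATCCT"),
    ("D7.9.15", "AGGCGAAG"), ("D7.24.15", "AGGCGAAG"),
    ("D8.7.15", "AGGCGAAG"), ("D8.21.15", "TAATCTTA"),
    ("D9.11.15", "TAATCTTA"), ("D9.25.15", "TAATCTTA"),
    ("D10.9.15", "TAATCTTA"), ("D10.24.15", "TAATCTTA"),
    ("D11.6.15", "CAGGACGT"), ("D11.20.15", "CAGGACGT"),
    ("D12.7.15", "CAGGACGT"), ("D12.21.15", "CAGGACGT"),
    ("D1.12.16", "CAGGACGT"), ("D1.26.16", "GTACTGAC"),
    ("D2.26.16", "GTACTGAC"), ("D3.9.16", "GTACTGAC"),
    ("D3.24.16", "GTACTGAC"), ("D4.8.16", "GTACTGAC"),
]

# Keep only the first sample seen for each forward barcode.
first_seen = {}
for sample_id, barcode in forward_barcodes:
    first_seen.setdefault(barcode, sample_id)

first_samples = list(first_seen.values())
print(first_samples)
```

The five samples this selects are exactly the ones whose adapters report a non-zero "Trimmed" count in the log, which is consistent with only the first occurrence of each forward barcode being used.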

Hi @LuSanto - you are now describing a separate and, I believe, unrelated issue. The primary issue in this post is that your forward barcodes are not identifiable in your reads. The secondary issue is the one you just posted, about only the first sample in a group of matching forward barcodes being demultiplexed. I will take a closer look at the secondary issue, but the primary issue can only be resolved by you and/or your sequencing center: you need to make sure that the orientation of the reads matches that of the barcodes. The fact that you are getting such low recovery is a smoking gun that there is an issue here.

Hi @thermokarst, thanks again for your answer, and sorry this is becoming so long.

  1. The issue of only the first sample in a group of F barcodes being demultiplexed has been mentioned since Oct 1 and, along with the R barcode being ignored by cutadapt, IS the main problem to me.
  2. As mentioned, I did verify that the F and R barcode sequences and orientations are correct. Something I had not mentioned is that the same raw reads were successfully demultiplexed with the exact same barcode sequences in QIIME 1 some time ago (obviously getting an .fna, rather than the .qza now needed for DADA2). This proves that the problem is not the barcodes themselves.
  3. I’m not sure I understand the log file, but the “0.1% reads matched” is strange. As mentioned, the 5 samples that were demultiplexed actually contain many more reads than 1,175 (about 40,000+ reads per sample). See here stats_dada2.tsv (305 Bytes).

I have run out of options on my side. It would be great if this could be solved. Thanks!

Hi @LuSanto!

No worries - that is what we are here for!

I'm sorry, I could've made my post above clearer. A better way to put it is that, yes, the "only matching on the first sample" issue is a problem, but we should first sort out the bigger issue of your barcodes not matching really any of the samples. Once we sort that out, I suspect that the "only matching on the first sample" problem will be reconciled.

That is good info to have! And I agree, that does seem to imply that these barcodes should work, but I disagree that it proves the barcodes themselves are not the problem: it is possible that QIIME 1 reverse complemented them for you (perhaps by default?). I'm not a QIIME 1 dev, so I can't say for sure. Either way, it doesn't change what I said before: we need to make sure that we are able to get your reads and barcodes in the same orientation.

Any chance you didn't provide the entire log file? Maybe you only copied and pasted part of it?

Please double check the read orientation and the barcode orientation. I hope I have demonstrated above that, while this might've worked in QIIME 1, it doesn't mean that it will necessarily work without adjustment in q2-cutadapt.
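For anyone checking barcode orientation by hand, a reverse complement is easy to compute. This is a minimal sketch in plain Python; the function name is made up for illustration and is not part of q2-cutadapt or cutadapt. The two example barcodes are taken from the log above.

```python
# Translation table mapping each base to its complement (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# Forward and reverse barcodes of sample D3.31.15, as listed in the log.
print(reverse_complement("CCTATCCT"))  # AGGATAGG
print(reverse_complement("GAATTCGT"))  # ACGAATTC
```

Searching the first few bases of the raw reads for both the original and the reverse-complemented barcode is one way to see which orientation actually occurs.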

Thanks!

Quick update. I could not solve the issue of only the first sample in a group of matching forward barcodes being demultiplexed. The only way around it was to split the metadata into several files, so that each of them only includes unique F barcodes. Then I ran cutadapt separately with each metadata file. Finally, I reimported all the .fastq files into a single .qza for downstream analysis. The separate read numbers add up to the total, so it seems that I finally have correctly demultiplexed samples (which also means that the barcodes were in the correct orientation; the confounding factor was that I picked the wrong log file).
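The metadata split described above can be sketched roughly as follows, assuming the metadata has already been read into (sample ID, forward barcode, reverse barcode) rows. The function name and row layout are hypothetical; the five example rows use sample IDs and barcodes from the log. Each returned group would be written out as its own metadata file.

```python
from collections import defaultdict


def split_by_unique_forward(rows):
    """Split rows into groups so that no forward barcode repeats within a group.

    rows: iterable of (sample_id, forward_barcode, reverse_barcode) tuples.
    The n-th sample carrying a given forward barcode goes into group n.
    """
    seen = defaultdict(int)  # how many times each forward barcode has appeared
    groups = []
    for sample_id, fwd, rev in rows:
        idx = seen[fwd]
        seen[fwd] += 1
        if idx == len(groups):
            groups.append([])
        groups[idx].append((sample_id, fwd, rev))
    return groups


rows = [
    ("D3.31.15", "CCTATCCT", "GAATTCGT"),
    ("D5.5.15", "CCTATCCT", "GAGATTCC"),
    ("D5.27.15", "CCTATCCT", "ATTCAGAA"),
    ("D7.9.15", "AGGCGAAG", "GAGATTCC"),
    ("D7.24.15", "AGGCGAAG", "ATTCAGAA"),
]
groups = split_by_unique_forward(rows)
for group in groups:
    print([sample_id for sample_id, _, _ in group])
```

The number of groups equals the largest number of samples sharing one forward barcode, so this produces the fewest metadata files that satisfy the constraint.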

Hi @LuSanto - thanks for following up!

Makes sense!

I did a bit of digging with @ebolyen on this today --- turns out cutadapt 2.3 (the version of cutadapt used by q2-cutadapt 2019.7) doesn't support CDI (combinatorial dual indexing) strategies, where the same barcode is shared by several samples. Newer versions of cutadapt support CDI strategies - we are hoping to support that in q2-cutadapt in the future. Thanks!

Just to clarify (for anyone who stumbles on this thread), q2-cutadapt 2019.10 / cutadapt 2.3 do support UDI strategies (where the forward and reverse barcodes are all unique).

Hi @LuSanto, I am trying to solve the exact same problem (demultiplexing CDI reads) and I would like to know more about your workaround.

The only way around was to split the metadata into several files, so each of them only includes a unique F barcode. Then I used cutadapt separately with each metadata file. Finally, I reimported all the .fastq files into a single .qza for downstream analysis.

Did you run the demux-paired command on the .qza file once for each metadata file with unique F barcodes?
If so, how did you end up with several fastq files? I thought the output of demux-paired is also a .qza.

Thank you!

Hi @MG_709,
I ran demux on the same .qza input several times, each time with a new metadata file (each containing unique F barcodes). Then I unzipped each .qza output (by manually changing the extension to .zip) and retrieved the demultiplexed .fastq files. Finally, I imported all the .fastq files into a new .qza, using a manifest.txt.
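Building the manifest by hand gets tedious with many samples, so here is a rough sketch of how it could be generated. It rests on two assumptions that are not confirmed in this thread: that the extracted files follow Casava-style names such as `<sample>_<n>_L001_R1_001.fastq.gz`, and that the tab-separated manifest layout with the columns `sample-id`, `forward-absolute-filepath`, and `reverse-absolute-filepath` is the one being imported. The function name and the example paths are hypothetical.

```python
import re
from pathlib import PurePosixPath

# Assumed Casava-style file name: <sample>_<n>_L<lane>_R<1|2>_001.fastq.gz
CASAVA = re.compile(
    r"^(?P<sample>.+)_[^_]+_L\d{3}_R(?P<read>[12])_001\.fastq\.gz$"
)


def build_manifest(paths):
    """Pair R1/R2 files by sample ID and return the manifest as text."""
    pairs = {}
    for path in paths:
        match = CASAVA.match(PurePosixPath(path).name)
        if match is None:
            continue  # skip files that do not follow the assumed naming
        reads = pairs.setdefault(match.group("sample"), {})
        reads[match.group("read")] = path
    lines = ["sample-id\tforward-absolute-filepath\treverse-absolute-filepath"]
    for sample in sorted(pairs):
        reads = pairs[sample]
        if "1" in reads and "2" in reads:  # keep complete pairs only
            lines.append(f"{sample}\t{reads['1']}\t{reads['2']}")
    return "\n".join(lines) + "\n"


# Hypothetical paths to files retrieved from two separate demux runs.
example_paths = [
    "/data/run1/D3.31.15_0_L001_R1_001.fastq.gz",
    "/data/run1/D3.31.15_0_L001_R2_001.fastq.gz",
    "/data/run2/D5.5.15_0_L001_R1_001.fastq.gz",
    "/data/run2/D5.5.15_0_L001_R2_001.fastq.gz",
]
manifest_text = build_manifest(example_paths)
print(manifest_text)
```

If the extracted files are named differently, or an older comma-separated manifest format is used, the regular expression and the header line would need to be adjusted accordingly.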

Hope this helps!
