Error Rate seems high in cutadapt plugin demultiplexing

Hi All,

I have a use case like the one described here, where both my R1 and R2 files contain a mixture of forward and reverse reads. My sequencing files look like this:

@M01533:2:000000000-A303Y:1:1101:16180:1467 1:N:0:1
ANTTGCCGGTGTGCCAGCCGCCGCGGTAATACGTATGGTGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGAGTGGCAAGTCTGATGTGAAAACCCGGGGCTCAACCCCGGGACTGCATTGGAAACTGTCAATCTAGGGTACCGGGGGGGTAAGCGGAATTCCTAGGGAAGCGGGGAAATGGGTGGTTTTTAGGGGGAAACCCGTGGGGGAAGGCGGCTTACTGNCGGGTAACGGCCGTTGGGGCCG
+
A#>AABBBBBBBGGGGGGGGGGGGGGGGGHHHHHGHHFHHHHGHHGGGGHHHGGGGGHHHHHGHHEEHHHHHGHGGGGGGGHGG/EEGEGHGGHGHHFHHHHHHGHHGGDGGGGFHHHHGGGGG?CFHHFGGHHGHCGHG::GB0;0CC0.//;FDG---;--;///;-B9EBF/B///:/.;.;99=AA./:/./.:...;9.;/.;-9A./;D-:.9----.;D.;--;-//9;#..;..;9/.---:;.:[email protected]

I have two paired end Illumina sequencing runs that I imported individually into QIIME2 as follows:

qiime tools import --type MultiplexedPairedEndBarcodeInSequence \
  --input-path ~/path/Fastq_set1 \
  --output-path ~/path/Fastq_set1/multiplexed-seqs-set1.qza

qiime tools import --type MultiplexedPairedEndBarcodeInSequence \
  --input-path ~/path/Fastq_set2 \
  --output-path ~/path/Fastq_set2/multiplexed-seqs-set2.qza

...and demultiplexed individually as follows:

qiime cutadapt demux-paired \
  --i-seqs ~/path/Fastq_set1/multiplexed-seqs-set1.qza \
  --m-forward-barcodes-file ~/path/Mapping.csv \
  --m-forward-barcodes-column BarcodeSequence \
  --p-error-rate 0 \
  --o-per-sample-sequences ~/path/Demux/demux.qza \
  --o-untrimmed-sequences ~/path/Demux/unmatched.qza \
  --verbose

qiime cutadapt demux-paired \
  --i-seqs ~/path/Fastq_set2/multiplexed-seqs-set2.qza \
  --m-forward-barcodes-file ~/path/Mapping_File.csv \
  --m-forward-barcodes-column BarcodeSequence \
  --p-error-rate 0 \
  --o-per-sample-sequences ~/path/Demux/demux2.qza \
  --o-untrimmed-sequences ~/path/Demux/unmatched2.qza \
  --verbose

I used one mapping file for both runs, although I know that only half of the samples are in each run. Even so, a significant number of sequences were assigned to every sample in both runs. I have attached the demux.qzv files for each set.

I believe this may indicate a very high error rate in the cutadapt demultiplexing step, and I'm wondering whether anyone else has come across an issue like this, or whether I'm missing something obvious in how I'm applying the plugin. Thanks in advance for any help you can give.

-Mary

demux2.qzv (295.1 KB)
demux.qzv (295.7 KB)

What does your BarcodeSequence column look like? I wonder if you should use the anchoring syntax for cutadapt to prevent any inadvertent matches...
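
For anyone following along: in cutadapt, a leading `^` on a 5' adapter anchors it, so the barcode has to sit at the very start of the read instead of matching somewhere inside it. Below is a minimal sketch of how one might add that prefix to every barcode in a mapping file. It assumes a tab-separated file with a `BarcodeSequence` column; the function name `anchor_barcodes` and the example rows are made up for illustration and are not part of QIIME 2 or cutadapt.

```python
import csv
import io


def anchor_barcodes(text, column="BarcodeSequence", delimiter="\t"):
    """Return mapping-file text with '^' prepended to each barcode.

    Hypothetical helper: assumes `text` is a delimited table whose
    header row contains `column`.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    id_column = reader.fieldnames[0]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames,
                            delimiter=delimiter, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Leave comment/directive rows (e.g. "#q2:types") untouched.
        if row[id_column].startswith("#"):
            writer.writerow(row)
            continue
        barcode = row[column].strip()
        # Skip empty cells and barcodes that are already anchored.
        if barcode and not barcode.startswith("^"):
            row[column] = "^" + barcode
        writer.writerow(row)
    return out.getvalue()


# Made-up example rows, not taken from this thread.
example = (
    "sample-id\tBarcodeSequence\n"
    "#q2:types\tcategorical\n"
    "S1\tACGTACGT\n"
    "S2\t^TTGGCCAA\n"
)
anchored = anchor_barcodes(example)
```

The returned text could be written to a new file and passed to `--m-forward-barcodes-file`. If the mapping file is comma-separated, pass `delimiter=","` instead.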

I think that to diagnose this we will need either the data necessary to re-run this locally or, failing that, the full log from a run with --verbose, which will show what cutadapt is doing in a bit more detail. Feel free to DM data my way if you want to go that route. Thanks!

Thank you @thermokarst! I’m sorry for the delay in my response. The anchoring syntax helped a lot and I’m now investigating whether some remaining “error” could be due to some confusion with the preparation of the mapping file as I myself was not the one to submit the samples for sequencing.
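
In case it is useful to others checking a mapping file they did not prepare themselves, here is a rough sketch of one sanity check: looking for barcodes that are shared by several samples, or that are a prefix of another barcode, since either could send reads to the wrong sample. The function name `find_barcode_conflicts` and the example barcodes are hypothetical, and this is only one possible check, not something the plugin does for you.

```python
from collections import defaultdict


def find_barcode_conflicts(barcodes):
    """Find duplicated barcodes and barcodes that prefix another one.

    `barcodes` maps sample ID -> barcode string. A leading '^'
    (anchoring marker) and letter case are ignored when comparing.
    """
    by_seq = defaultdict(list)
    for sample, barcode in barcodes.items():
        by_seq[barcode.lstrip("^").upper()].append(sample)

    # Barcodes assigned to more than one sample.
    duplicates = {seq: sorted(samples)
                  for seq, samples in by_seq.items() if len(samples) > 1}

    # Pairs (short, long) where the shorter barcode starts the longer one.
    seqs = sorted(by_seq)
    prefix_pairs = [(a, b) for a in seqs for b in seqs
                    if a != b and b.startswith(a)]
    return duplicates, prefix_pairs


# Made-up barcodes for illustration only.
duplicates, prefix_pairs = find_barcode_conflicts(
    {"S1": "ACGT", "S2": "acgt", "S3": "ACGTTT", "S4": "GGCC"}
)
```

An empty result for both values would suggest the barcodes themselves are distinct, which points back to sample labelling or run assignment as the place to look.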
