Duplicate barcode error, but barcodes listed are not the same.

I am running qiime2 version 2024.5-amplicon that was loaded with conda.

When running the command…

qiime cutadapt demux-paired \
  --i-seqs multiplexed-seqs.qza \
  --m-forward-barcodes-file Osburn10BSS8v3.All.tsv \
  --m-forward-barcodes-column barcode \
  --p-anchor-forward-barcode TRUE \
  --p-mixed-orientation TRUE \
  --o-per-sample-sequences demux.qza \
  --o-untrimmed-sequences untrimmed.qza \
  --verbose

I get the error message…

Traceback (most recent call last):
  File "/software/qiime2/2024.5-amplicon/amplicon-env/lib/python3.9/site-packages/q2cli/commands.py", line 520, in __call__
    results = self._execute_action(
  File "/software/qiime2/2024.5-amplicon/amplicon-env/lib/python3.9/site-packages/q2cli/commands.py", line 581, in _execute_action
    results = action(**arguments)
  File "<decorator-gen-43>", line 2, in demux_paired
  File "/software/qiime2/2024.5-amplicon/amplicon-env/lib/python3.9/site-packages/qiime2/sdk/action.py", line 342, in bound_callable
    outputs = self._callable_executor_(
  File "/software/qiime2/2024.5-amplicon/amplicon-env/lib/python3.9/site-packages/qiime2/sdk/action.py", line 576, in _callable_executor_
    output_views = self._callable(**view_args)
  File "/software/qiime2/2024.5-amplicon/amplicon-env/lib/python3.9/site-packages/q2_cutadapt/_demux.py", line 317, in demux_paired
    _check_barcodes_uniqueness(
  File "/software/qiime2/2024.5-amplicon/amplicon-env/lib/python3.9/site-packages/q2_cutadapt/_demux.py", line 162, in _check_barcodes_uniqueness
    raise ValueError('The following samples have duplicate barcode: %s'

ValueError: The following samples have duplicate barcode: EP.LC.03.24.1, PCR.NEG

Plugin error from cutadapt:

  The following samples have duplicate barcode: EP.LC.03.24.1, PCR.NEG

See above for debug info.

However, the barcodes listed for samples EP.LC.03.24.1 and PCR.NEG are not the same in my metadata file (Osburn10BSAll.fxd.tsv). The barcode for EP.LC.03.24.1 is TCACCTCCTTGT, and PCR.NEG is TGTAACGCCGAT.

Oddly, I previously ran this step with an earlier version of qiime2 (2022.2, also loaded via conda) with no errors. I have attached the metadata file below.

Thanks for any insight!

Osburn10BSS8v3.All.tsv (10.1 KB)

Hello Bradley,

Welcome to the forums! :qiime2:

Thank you for posting your version, command, error message, and metadata file!

I checked the metadata file you posted, and indeed I see different barcodes for these two samples. Very strange!

However, I found some other things too. These two samples have the same barcode:

  • PCR.NEG Mia.Tuccillo TGTAACGCCGAT
  • MT.1E Mia.Tuccillo TGTAACGCCGAT BC1 H8 92

The metadata file you mention is named two different things:

  • Osburn10BSAll.fxd.tsv
  • Osburn10BSS8v3.All.tsv

No judgment! I have to clean up metadata on literally every project that I'm on, even when I make it myself! (Especially when I make it myself!)

I suspect an older metadata file is being used, or file names are getting swapped somewhere. Maybe the v3 file actually has v2 barcodes?

Jokes aside, to track down problems like this I often run commands in a Jupyter Notebook or add them into a bash.sh script. This way I have a record of exactly what was run and which files I used.
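
As a rough illustration of that habit, here is a minimal sketch of a script that records each command before running it and stops at the first failure. The `run` helper and the `commands.log` file name are made up for this example and are not part of QIIME 2; the `echo` lines are placeholders for real `qiime` calls.

```shell
#!/bin/bash
# Hypothetical wrapper, not part of QIIME 2: stop at the first failing step
# and keep a timestamped record of every command that was run.
set -euo pipefail

LOG_FILE="${LOG_FILE:-commands.log}"   # assumed log location

run() {
  # Record the exact command line, then execute it.
  printf '[%s] %s\n' "$(date '+%F %T')" "$*" >> "$LOG_FILE"
  "$@"
}

# Placeholder steps; swap in the real qiime commands.
run echo "step 1 finished"
run echo "step 2 finished"
```

Without `set -e`, a bash script keeps going after a failed step, so the later steps fail too and their messages can bury the first error.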

Let us know what you find!

Oy, as you can imagine, I have run this dataset every which way trying to figure out what is going on. Let me start from scratch (after checking the one set of duplicate barcodes you actually found) and log every step in a bash.sh script. I will let you know what I find.

Thanks Colin!

Ok, this just got weirder…

There were indeed duplicate barcodes (the ones you pointed out). That was my error, and I fixed it. I then ran the commands in the bash script q2-Osburn10BSS8.All-cutadapt-demux.sh, shown below.

#!/bin/bash
#SBATCH -A p31523                                     
#SBATCH -p normal                                  
#SBATCH -t 48:00:00
#SBATCH --mem=48G                                     
#SBATCH -n 4
#SBATCH -N 1
#SBATCH --mail-user=bradley.stevenson@northwestern.edu # change to your email
#SBATCH --mail-type=END                              
#SBATCH --job-name="import_demux_cutadapt"
#SBATCH --output=%j-%x.out     

module purge all
module load python-miniconda3
source activate /software/qiime2/2024.5-amplicon/amplicon-env/

cd /projects/p31523/Osburn10BSAll/ # change to your data directory
OUT_DR=`pwd`/qiime2-8.15.25-out
mkdir -p $OUT_DR

echo "[`date`] Importing data into qiime2 ..."

qiime --version

# import seqs as qza
qiime tools import \
 --type MultiplexedPairedEndBarcodeInSequence \
 --input-path muxed-pe-barcode-in-seq \
 --output-path ${OUT_DR}/multiplexed-seqs.qza

echo "[`date`] Demultiplexing paired-end reads ..."

# demultiplex
qiime cutadapt demux-paired \
  --i-seqs ${OUT_DR}/multiplexed-seqs.qza \
  --m-forward-barcodes-file Osburn10BSS8.All.tsv \
  --m-forward-barcodes-column barcode \
  --p-anchor-forward-barcode TRUE \
  --p-anchor-reverse-barcode TRUE \
  --p-mixed-orientation TRUE \
  --o-per-sample-sequences ${OUT_DR}/demux.qza \
  --o-untrimmed-sequences ${OUT_DR}/untrimmed.qza \
  --verbose

qiime demux summarize \
  --i-data ${OUT_DR}/demux.qza \
  --o-visualization ${OUT_DR}/demux.qzv

echo "[`date`] Trimming primer sequences from demultiplexed paired-end reads ..."

# trim forward and reverse primers (515FY-M13/926R Parada primers)
qiime cutadapt trim-paired \
  --i-demultiplexed-sequences ${OUT_DR}/demux.qza \
  --p-anywhere-f CCGTAAAACGACGGCCAGCCGTGYCAGCMGCCGCGGTAA \
  --p-anywhere-r CCGYCAATTYMTTTRAGTTT \
  --p-match-read-wildcards \
  --p-cores $SLURM_NTASKS \
  --o-untrimmed-sequences ${OUT_DR}/demux-untrimmed.qza \
  --o-trimmed-sequences ${OUT_DR}/demux-trimmed-no-untrimmed.qza

qiime demux summarize \
  --i-data ${OUT_DR}/demux-trimmed-no-untrimmed.qza \
  --o-visualization ${OUT_DR}/demux-trimmed-no-untrimmed.qzv

I got the same duplicate-barcode error, but this time it lists only a single library (EP.LC.03.24.1). Here is the error log…

[Fri Aug 15 14:28:52 CDT 2025] Importing data into qiime2 ...
q2cli version 2024.5.0
Run `qiime info` for more version details.
Imported muxed-pe-barcode-in-seq as MultiplexedPairedEndBarcodeInSequenceDirFmt to /projects/p31523/Osburn10BSAll/qiime2-8.15.25-out/multiplexed-seqs.qza
[Fri Aug 15 14:31:19 CDT 2025] Demultiplexing paired-end reads ...
Traceback (most recent call last):
  File "/software/qiime2/2024.5-amplicon/amplicon-env/lib/python3.9/site-packages/q2cli/commands.py", line 520, in __call__
    results = self._execute_action(
  File "/software/qiime2/2024.5-amplicon/amplicon-env/lib/python3.9/site-packages/q2cli/commands.py", line 581, in _execute_action
    results = action(**arguments)
  File "<decorator-gen-43>", line 2, in demux_paired
  File "/software/qiime2/2024.5-amplicon/amplicon-env/lib/python3.9/site-packages/qiime2/sdk/action.py", line 342, in bound_callable
    outputs = self._callable_executor_(
  File "/software/qiime2/2024.5-amplicon/amplicon-env/lib/python3.9/site-packages/qiime2/sdk/action.py", line 576, in _callable_executor_
    output_views = self._callable(**view_args)
  File "/software/qiime2/2024.5-amplicon/amplicon-env/lib/python3.9/site-packages/q2_cutadapt/_demux.py", line 317, in demux_paired
    _check_barcodes_uniqueness(
  File "/software/qiime2/2024.5-amplicon/amplicon-env/lib/python3.9/site-packages/q2_cutadapt/_demux.py", line 162, in _check_barcodes_uniqueness
    raise ValueError('The following samples have duplicate barcode: %s'
ValueError: The following samples have duplicate barcode: EP.LC.03.24.1

Plugin error from cutadapt:

  The following samples have duplicate barcode: EP.LC.03.24.1

See above for debug info.
Usage: qiime demux summarize [OPTIONS]

  Summarize counts per sample for all samples, and generate interactive
  positional quality plots based on `n` randomly selected sequences.

Inputs:
  --i-data ARTIFACT SampleData[SequencesWithQuality |
    PairedEndSequencesWithQuality | JoinedSequencesWithQuality]
                       The demultiplexed sequences to be summarized.
                                                                    [required]
Parameters:
  --p-n INTEGER        The number of sequences that should be selected at
                       random for quality score plots. The quality plots will
                       present the average positional qualities across all of
                       the sequences selected. If input sequences are paired
                       end, plots will be generated for both forward and
                       reverse reads for the same `n` sequences.
                                                              [default: 10000]
Outputs:
  --o-visualization VISUALIZATION
                                                                    [required]
Miscellaneous:
  --output-dir PATH    Output unspecified results to a directory
  --verbose / --quiet  Display verbose output to stdout and/or stderr during
                       execution of this action. Or silence output if
                       execution is successful (silence is golden).
  --example-data PATH  Write example data and exit.
  --citations          Show citations and exit.
  --help               Show this message and exit.

Examples:
  # ### example: demux
  qiime demux summarize \
    --i-data demux.qza \
    --o-visualization visualization.qzv
  

                    There was a problem with the command:                     
 (1/1) Invalid value for '--i-data':
  /projects/p31523/Osburn10BSAll/qiime2-8.15.25-out/demux.qza does not exist.
[Fri Aug 15 14:33:28 CDT 2025] Trimming primer sequences from demultiplexed paired-end reads ...
Usage: qiime cutadapt trim-paired [OPTIONS]

  Search demultiplexed paired-end sequences for adapters and remove them. The
  parameter descriptions in this method are adapted from the official cutadapt
  docs - please see those docs at https://cutadapt.readthedocs.io for complete
  details.

Inputs:
  --i-demultiplexed-sequences ARTIFACT 
    SampleData[PairedEndSequencesWithQuality]
                          The paired-end sequences to be trimmed.   [required]
Parameters:
  --p-cores NTHREADS      Number of CPU cores to use.             [default: 1]
  --p-adapter-f TEXT...   Sequence of an adapter ligated to the 3' end. The
    List[Str]             adapter and any subsequent bases are trimmed. If a
                          `$` is appended, the adapter is only found if it is
                          at the end of the read. Search in forward read. If
                          your sequence of interest is "framed" by a 5' and a
                          3' adapter, use this parameter to define a "linked"
                          primer - see https://cutadapt.readthedocs.io for
                          complete details.                         [optional]
  --p-front-f TEXT...     Sequence of an adapter ligated to the 5' end. The
    List[Str]             adapter and any preceding bases are trimmed. Partial
                          matches at the 5' end are allowed. If a `^`
                          character is prepended, the adapter is only found if
                          it is at the beginning of the read. Search in
                          forward read.                             [optional]
  --p-anywhere-f TEXT...  Sequence of an adapter that may be ligated to the
    List[Str]             5' or 3' end. Both types of matches as described
                          under `adapter` and `front` are allowed. If the
                          first base of the read is part of the match, the
                          behavior is as with `front`, otherwise as with
                          `adapter`. This option is mostly for rescuing failed
                          library preparations - do not use if you know which
                          end your adapter was ligated to. Search in forward
                          read.                                     [optional]
  --p-adapter-r TEXT...   Sequence of an adapter ligated to the 3' end. The
    List[Str]             adapter and any subsequent bases are trimmed. If a
                          `$` is appended, the adapter is only found if it is
                          at the end of the read. Search in reverse read. If
                          your sequence of interest is "framed" by a 5' and a
                          3' adapter, use this parameter to define a "linked"
                          primer - see https://cutadapt.readthedocs.io for
                          complete details.                         [optional]
  --p-front-r TEXT...     Sequence of an adapter ligated to the 5' end. The
    List[Str]             adapter and any preceding bases are trimmed. Partial
                          matches at the 5' end are allowed. If a `^`
                          character is prepended, the adapter is only found if
                          it is at the beginning of the read. Search in
                          reverse read.                             [optional]
  --p-anywhere-r TEXT...  Sequence of an adapter that may be ligated to the
    List[Str]             5' or 3' end. Both types of matches as described
                          under `adapter` and `front` are allowed. If the
                          first base of the read is part of the match, the
                          behavior is as with `front`, otherwise as with
                          `adapter`. This option is mostly for rescuing failed
                          library preparations - do not use if you know which
                          end your adapter was ligated to. Search in reverse
                          read.                                     [optional]
  --p-error-rate PROPORTION Range(0, 1, inclusive_end=True)
                          Maximum allowed error rate.           [default: 0.1]
  --p-indels / --p-no-indels
                          Allow insertions or deletions of bases when
                          matching adapters.                   [default: True]
  --p-times INTEGER       Remove multiple occurrences of an adapter if it is
    Range(1, None)        repeated, up to `times` times.          [default: 1]
  --p-overlap INTEGER     Require at least `overlap` bases of overlap between
    Range(1, None)        read and adapter for an adapter to be found.
                                                                  [default: 3]
  --p-match-read-wildcards / --p-no-match-read-wildcards
                          Interpret IUPAC wildcards (e.g., N) in reads.
                                                              [default: False]
  --p-match-adapter-wildcards / --p-no-match-adapter-wildcards
                          Interpret IUPAC wildcards (e.g., N) in adapters.
                                                               [default: True]
  --p-minimum-length INTEGER
    Range(1, None)        Discard reads shorter than specified value. Note,
                          the cutadapt default of 0 has been overridden,
                          because that value produces empty sequence records.
                                                                  [default: 1]
  --p-discard-untrimmed / --p-no-discard-untrimmed
                          Discard reads in which no adapter was found.
                                                              [default: False]
  --p-max-expected-errors NUMBER
    Range(0, None)        Discard reads that exceed maximum expected
                          erroneous nucleotides.                    [optional]
  --p-max-n NUMBER        Discard reads with more than COUNT N bases. If
    Range(0, None)        COUNT_or_FRACTION is a number between 0 and 1, it is
                          interpreted as a fraction of the read length.
                                                                    [optional]
  --p-quality-cutoff-5end INTEGER
    Range(0, None)        Trim nucleotides with Phred score quality lower
                          than threshold from 5 prime end.        [default: 0]
  --p-quality-cutoff-3end INTEGER
    Range(0, None)        Trim nucleotides with Phred score quality lower
                          than threshold from 3 prime end.        [default: 0]
  --p-quality-base INTEGER
    Range(0, None)        How the Phred score is encoded (33 or 64).
                                                                 [default: 33]
Outputs:
  --o-trimmed-sequences ARTIFACT SampleData[PairedEndSequencesWithQuality]
                          The resulting trimmed sequences.          [required]
Miscellaneous:
  --output-dir PATH       Output unspecified results to a directory
  --verbose / --quiet     Display verbose output to stdout and/or stderr
                          during execution of this action. Or silence output
                          if execution is successful (silence is golden).
  --example-data PATH     Write example data and exit.
  --citations             Show citations and exit.
  --help                  Show this message and exit.

                    There was a problem with the command:                     
 (1/1?) No such option: --o-untrimmed-sequences Did you mean --o-trimmed-sequences?
Usage: qiime demux summarize [OPTIONS]

                    There was a problem with the command:                     
 (1/1) Invalid value for '--i-data':
  /projects/p31523/Osburn10BSAll/qiime2-8.15.25-out/demux-trimmed-no-untrimmed.qza does not exist.

I have uploaded the metadata file (Osburn10BSS8.All.tsv). I am still stumped.

Thanks for any insights!

Osburn10BSS8.All.tsv (14.2 KB)

Brad

Hello!

It looks like you now have a duplicated sequence in the metadata file.

Best,

Oh wow, another one! Ok, I will fix this and also look a lot closer at the metadata file before re-running and reaching back out. Thank you!

That was the last of the duplicate barcodes. The command ran without a hitch. Essentially, the error was correct, but the readout was misleading. Thank you everyone for the assist!
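
For anyone who lands here with the same error, below is a minimal sketch for listing every repeated barcode in a metadata file before running demux, since the samples named in the error message may not be the ones that actually collide. It assumes a tab-separated file with a header row, sample IDs in the first column, and a column named `barcode` (as in the command above). The function name `find_dup_barcodes` and the tiny example file are made up for illustration.

```shell
#!/bin/bash
# Sketch only: list barcodes that occur more than once in a TSV metadata file.
# Assumes column 1 holds the sample ID and the header row names the barcode column.
find_dup_barcodes() {
  local file="$1" column="${2:-barcode}"
  awk -F'\t' -v col="$column" '
    { sub(/\r$/, "") }              # tolerate Windows line endings
    NR == 1 {                       # locate the barcode column by header name
      for (i = 1; i <= NF; i++) if ($i == col) c = i
      if (!c) { print "column not found: " col > "/dev/stderr"; exit 2 }
      next
    }
    /^#/ || $c == "" { next }       # skip comment/directive rows and blanks
    {
      count[$c]++
      ids[$c] = (ids[$c] == "" ? $1 : ids[$c] ", " $1)
    }
    END {
      for (b in count) if (count[b] > 1) print b "\t" ids[b]
    }
  ' "$file"
}

# Example with a tiny made-up file (sample IDs and barcodes are illustrative).
printf 'sample-id\tbarcode\nA\tAAAA\nB\tCCCC\nC\tAAAA\n' > example-metadata.tsv
find_dup_barcodes example-metadata.tsv
```

On the example file this prints the barcode `AAAA` followed by the two sample IDs that share it (`A, C`). An empty result means no barcode in that column is repeated.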
