Need to remove barcodes not in my mapping file

Hello,

I have imported my raw data (forward sequences and barcodes) using the single-end EMP format. I am attempting to demultiplex using a mapping file whose barcodes are not aligning correctly to the barcodes in the emp-single-end-sequences.qza file.

I used the cat command to view all of the barcodes in my barcodes.fastq file. It seems that there are some barcodes in the fastq file that are not in my mapping file.

Is there a way to remove those specific barcodes that are not in my mapping file?

There was a strange thing that happened with the barcodes at some point, where an extra ‘A’ was added to all barcodes. Therefore, I know which barcodes do not belong, because they do not have an ‘A’ at the end. Is there a way to indicate that barcodes not ending in ‘A’ should be removed?

This extra ‘A’ makes each barcode 13 nt long, instead of the 12 nt expected for Golay barcodes, although I think I got around this by specifying --p-no-golay-error-correction.

Here is the code I’ve used up until the demultiplex step (I included a sequences.fastq.gz file with the forward reads and a barcodes.fastq.gz file with barcodes in the emp-single-end-sequences folder):

qiime tools import \
  --type EMPSingleEndSequences \
  --input-path /proj/carrlab/Daria/BEGIN/Controls/Kleiman/emp-single-end-sequences \
  --output-path emp-single-end-sequences.qza

sbatch -p general -N 1 -t 05-00:00:00 --mem=25g -n 2
qiime demux emp-single \
  --i-seqs emp-single-end-sequences.qza \
  --m-barcodes-file Mapping_file_newbarcodes.txt \
  --m-barcodes-column BarcodeSequence \
  --o-per-sample-sequences demux.qza \
  --o-error-correction-details error.qza \
  --p-no-golay-error-correction \
  --verbose

And here is the error message I’m getting:

Traceback (most recent call last):
  File "/nas/longleaf/apps/qiime2/2019.4/miniconda3/envs/qiime2-2019.4/lib/python3.6/site-packages/q2cli/commands.py", line 311, in __call__
    results = action(**arguments)
  File "</nas/longleaf/apps/qiime2/2019.4/miniconda3/envs/qiime2-2019.4/lib/python3.6/site-packages/decorator.py:decorator-gen-429>", line 2, in emp_single
  File "/nas/longleaf/apps/qiime2/2019.4/miniconda3/envs/qiime2-2019.4/lib/python3.6/site-packages/qiime2/sdk/action.py", line 231, in bound_callable
    output_types, provenance)
  File "/nas/longleaf/apps/qiime2/2019.4/miniconda3/envs/qiime2-2019.4/lib/python3.6/site-packages/qiime2/sdk/action.py", line 365, in callable_executor
    output_views = self._callable(**view_args)
  File "/nas/longleaf/apps/qiime2/2019.4/miniconda3/envs/qiime2-2019.4/lib/python3.6/site-packages/q2_demux/_demux.py", line 270, in emp_single
    for i, (barcode_record, sequence_record) in enumerate(seqs, start=1):
  File "/nas/longleaf/apps/qiime2/2019.4/miniconda3/envs/qiime2-2019.4/lib/python3.6/site-packages/q2_demux/_demux.py", line 127, in __iter__
    _trim_id(sequence_header.id)))
ValueError: Mismatched sequence ids: ACTAAGACGGACTACTAGGGTATCTAATCCTGTTTGCTCCCCACGCTTTCGTGCCTCAGCGTCAGTTTCAGTCCAGAAAGCCGCCTTCGCCACCGGTGTTCCTCCTAATATCTACGCATTTCACCGCTACACTAGGAATTCCGCTTCCCTCTCCTGTACTCTAGCTATCCAGTTTTGAATGCACCCCCCAGGTTAAGCCCGGGTATTTCACATCCAACTTAAATTGCCGCCTACGCACCCTTTACGCCC and HWI-M01825:194:000000000-AJDP3:1:1104:22078:24515

Plugin error from demux:

Mismatched sequence ids: ACTAAGACGGACTACTAGGGTATCTAATCCTGTTTGCTCCCCACGCTTTCGTGCCTCAGCGTCAGTTTCAGTCCAGAAAGCCGCCTTCGCCACCGGTGTTCCTCCTAATATCTACGCATTTCACCGCTACACTAGGAATTCCGCTTCCCTCTCCTGTACTCTAGCTATCCAGTTTTGAATGCACCCCCCAGGTTAAGCCCGGGTATTTCACATCCAACTTAAATTGCCGCCTACGCACCCTTTACGCCC and HWI-M01825:194:000000000-AJDP3:1:1104:22078:24515

See above for debug info.

I am not finding the sequences above in my mapping file, so I assume the above string refers to all of the barcodes that are not present in my mapping file?

Any help is greatly appreciated!

Hi @dadaria - sorry for the slow reply!

The demux process will automatically filter out any reads that don't match any barcodes in the mapping file, so this should already be happening for you.
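
If you would like to see how many reads carry a barcode that is missing from the mapping file (rather than scrolling through cat output), here is a minimal Python sketch. It is not part of QIIME 2; the function names are made up for illustration, and it assumes a tab-separated mapping file with a BarcodeSequence column and a barcodes file with strict 4-line FASTQ records.

```python
import csv
import gzip
from collections import Counter


def barcode_counts(barcodes_path):
    """Count each barcode sequence (line 2 of every 4-line FASTQ record)."""
    opener = gzip.open if str(barcodes_path).endswith(".gz") else open
    counts = Counter()
    with opener(barcodes_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:  # the sequence line of each record
                counts[line.strip()] += 1
    return counts


def mapping_barcodes(mapping_path, column="BarcodeSequence"):
    """Collect the barcodes listed in a tab-separated mapping file."""
    barcodes = set()
    with open(mapping_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        for row in reader:
            first_field = next(iter(row.values()), None)
            # Skip comment/directive rows such as "#q2:types".
            if first_field and first_field.startswith("#"):
                continue
            value = row.get(column)
            if value:
                barcodes.add(value.strip())
    return barcodes


def summarize(barcodes_path, mapping_path):
    """Return (reads with a known barcode, Counter of unknown barcodes)."""
    counts = barcode_counts(barcodes_path)
    expected = mapping_barcodes(mapping_path)
    matched = sum(n for bc, n in counts.items() if bc in expected)
    unmatched = Counter(
        {bc: n for bc, n in counts.items() if bc not in expected}
    )
    return matched, unmatched
```

For example, `matched, unmatched = summarize("barcodes.fastq.gz", "Mapping_file_newbarcodes.txt")` followed by `unmatched.most_common(10)` would list the ten most frequent barcodes that are absent from the mapping file.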

An extra ‘A’ on every barcode is strange! Maybe you should talk to your sequencing center?

This error is unrelated to the concerns you have brought up above. This error is telling you that your barcode sequences are not in the same order as your forward sequences. The way this demux step works is by iterating through each barcode record, one at a time, and checking to see if the barcode sequence matches any samples in your sample metadata file. If it does, it copies the corresponding record out of the forward sequences file into a new file for that particular sample. A sequence record is made up of 4 lines: the sequence id, the sequence, a delimiter line, and the sequence quality scores. The error message you have shared here looks like something might be wrong with your barcode sequences, because for some reason you have sequence data in a line that should have a sequence id in it.
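
To find where the two files stop lining up, something like the following Python sketch may help. It is not the q2-demux code, just a rough stand-in for the same idea: walk both files four lines at a time and stop at the first record whose ids disagree or whose layout is broken. The function names are hypothetical, and the id trimming (first whitespace-delimited token, minus a trailing /1 or /2) is my assumption about what counts as "the same id".

```python
import gzip
from itertools import zip_longest


def read_fastq(path):
    """Yield (header, sequence, separator, quality) for each 4-line record."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            header = handle.readline()
            if header == "":  # end of file
                return
            rest = [handle.readline() for _ in range(3)]
            yield tuple(line.rstrip("\n") for line in [header] + rest)


def trim_id(header):
    """Assumed id rule: first token after '@', without a trailing /1 or /2."""
    parts = header[1:].split()
    token = parts[0] if parts else ""
    if token.endswith(("/1", "/2")):
        token = token[:-2]
    return token


def first_mismatch(barcodes_path, sequences_path):
    """Return (record_number, problem) for the first problem, else None."""
    pairs = zip_longest(read_fastq(barcodes_path), read_fastq(sequences_path))
    for n, (bc, seq) in enumerate(pairs, start=1):
        if bc is None or seq is None:
            return n, "files contain different numbers of records"
        for name, rec in (("barcode", bc), ("sequence", seq)):
            # A header must start with '@' and the delimiter line with '+'.
            if not rec[0].startswith("@") or not rec[2].startswith("+"):
                return n, f"malformed {name} record: {rec[0][:40]!r}"
        if trim_id(bc[0]) != trim_id(seq[0]):
            return n, f"ids differ: {trim_id(bc[0])} vs {trim_id(seq[0])}"
    return None
```

Calling `first_mismatch("barcodes.fastq.gz", "sequences.fastq.gz")` returns `None` when every record pairs up, or the 1-based record number and a short description of the first problem. A "malformed" result near a particular record would be consistent with sequence data sitting where a sequence id should be.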

Taking a step back, it sounds like there are a few issues with the data here that are going to cause you a lot of problems downstream. It would be worth checking in with the sequencing provider to make sure that there weren't errors in producing the data, or an issue when transferring the files.

Keep us posted!
