How to convert .txt mapping file to barcode.fastq.gz


Can I convert a .txt mapping file to a fastq.gz for my barcodes? My samples are multiplexed and I am trying to use the EMPPairedEndSequences pipeline in the Atacama tutorial to demultiplex them. I have sequences in fastq.gz format.

Headers in current mapping file are:

#SampleID	BarcodeSequence	LinkerPrimerSequence	ForwardPrimer	ForwardPrimerRevComp	ReversePrimer	ReversePrimerRevComp	Run	Amplicon	Description


Hi @cdevera! We aren’t exactly sure what you are trying to do here. For EMPPairedEndSequences, you should have three fastq.gz files: one with forward reads, one with reverse reads, and one with barcode reads. Is this what you have available to you? If not, can you please describe what data you do have on hand? Thanks!

Hi @thermokarst,

I only have multiplexed fastq.gz files for the forward and reverse reads of an Illumina sequencing run. I also have an index file in fastq format, but it does not have associated sample IDs. My main question is: how can I convert a mapping .txt file (provided by the vendor) that has the barcodes and sample IDs to fastq.gz format, to be used as the barcode file in the EMPPairedEndSequences pipeline? As a follow-up, can I use the index file as the barcode file (an initial run of EMPPairedEndSequences seemed to work) and tag sequences with a sample ID by using the mapping file as a metadata file? That would be after converting it from .txt to .tsv, of course.

Perfect! The index file is (or should be) your barcodes! :tada:

Here are some steps to hopefully get you moving in the right direction:

  1. Rename your forward reads to forward.fastq.gz and your reverse reads to reverse.fastq.gz
  2. Compress your barcodes with gzip: $ gzip whatever-your-index-file-is-named.fastq
  3. Step 2 will create a new file, whatever-your-index-file-is-named.fastq.gz (note the “gz” at the end!)
  4. Rename that new file to barcodes.fastq.gz
  5. Follow along with the EMP Paired End Import Tutorial (skip downloading the test data).
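
Steps 1 through 4 can be sketched as a small shell function. This is only an illustration: the function name stage_emp_files and the input file names are hypothetical, and it copies the files rather than renaming them so the originals stay untouched. The output directory name in the usage comment follows the tutorial's layout, but any directory should work.

```shell
# Sketch: stage raw reads under the file names the EMP paired-end import expects.
# The function name and all input paths are placeholders, not part of QIIME 2.
stage_emp_files() {
    local fwd="$1" rev="$2" index="$3" outdir="$4"
    mkdir -p "$outdir" || return 1
    # Step 1: copy the already-gzipped reads to the expected names.
    cp -- "$fwd" "$outdir/forward.fastq.gz" || return 1
    cp -- "$rev" "$outdir/reverse.fastq.gz" || return 1
    # Steps 2-4: compress the uncompressed index reads into barcodes.fastq.gz.
    # gzip -c writes to stdout, so the original index file is left in place.
    gzip -c -- "$index" > "$outdir/barcodes.fastq.gz" || return 1
}

# Example call (file names are assumptions; substitute your own):
# stage_emp_files my_R1.fastq.gz my_R2.fastq.gz my_index.fastq emp-paired-end-sequences
```

The resulting directory holds exactly the three files the import step looks for.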

You don’t need to do anything with your mapping file in order to import these data. You should only need the three files referenced above (forward.fastq.gz, reverse.fastq.gz, barcodes.fastq.gz). The sample metadata file comes into play when demultiplexing, but you shouldn’t need to do anything special with that file; it should work for you as-is.
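
Since the demultiplexing step has to be told which metadata column holds the barcodes, it can help to look at the sample ID and barcode pairs in the mapping file first. The following is a rough sketch, assuming a tab-separated file whose header names the barcode column BarcodeSequence (as in the headers quoted above) and whose first column is the sample ID. The function name list_barcodes is made up for this example.

```shell
# Sketch: print "sample ID <TAB> barcode" pairs from a tab-separated mapping file.
# Assumes a header row containing a column named exactly "BarcodeSequence".
list_barcodes() {
    awk -F'\t' '
        # Header row: remember which column is the barcode column, then skip it.
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "BarcodeSequence") col = i; next }
        # Data rows: print the first column and the barcode column.
        col && NF { print $1 "\t" $col }
    ' "$1"
}

# Example call (file name is an assumption):
# list_barcodes mapping.txt
```

If nothing is printed, the header probably uses a different column name or the file is not tab-separated.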

Please let us know if you get stuck! Good luck!!


Thank you, @thermokarst! That seemed to get everything working. I was confused because the data that was initially given to me did not include the index file, so I was essentially trying to get a metadata file to work as a barcode.fastq.gz.
