Rename sample IDs within fastq.gz files

Hi there, I'm trying to rename sequencing files and the sample IDs within the files that we received from a vendor. The files are paired-end fastq.gz, and the vendor returned them to us with the sample IDs really messed up. I know there's a way to change all the file names/sample IDs within the files, but I can't seem to find it.

Hi @jvoelschow,

To rename sample IDs within FASTQ files, you can use the sed command in a shell script. For example:

zcat old_sample.fastq.gz | sed 's/old_sample/new_sample/g' | gzip > new_sample.fastq.gz

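This command decompresses the file, replaces the old sample ID with the new one, and recompresses it. Note that the substitution is applied to every line, not only the read headers, so make sure the old ID cannot also match a sequence or quality line. For more details, refer to the official GNU sed documentation.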
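
If you want to be sure only the read headers are touched, here is a minimal sketch of the same idea restricted to header lines. It assumes standard 4-line FASTQ records (sequences not wrapped over several lines), so the header is every 4th line starting at line 1. The function name rename_fastq_headers is just something I made up for illustration, and it replaces only the first literal occurrence of the old ID in each header.

```shell
# Hypothetical helper (not part of any tool):
#   rename_fastq_headers OLD_ID NEW_ID IN.fastq.gz OUT.fastq.gz
# Assumes 4-line FASTQ records, so headers are lines 1, 5, 9, ...
rename_fastq_headers() {
    # $1 = old sample ID, $2 = new sample ID, $3 = input file, $4 = output file
    gzip -dc "$3" \
        | awk -v old="$1" -v new="$2" '
            NR % 4 == 1 {
                # Literal (non-regex) match of the old ID in the header line
                i = index($0, old)
                if (i > 0)
                    $0 = substr($0, 1, i - 1) new substr($0, i + length(old))
            }
            { print }' \
        | gzip > "$4"
}
```

You would call it once per file of a pair, for example rename_fastq_headers old_sample new_sample old_sample_R1.fastq.gz new_sample_R1.fastq.gz (these file names are placeholders). Sequence and quality lines pass through unchanged.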



Hello Julie,

When fixing raw files, I like using BBTools, which you can install using conda. The renaming itself is done with its rename.sh script:

conda install -c bioconda bbmap
rename.sh in=<file> in2=<file2> out=<outfile> out2=<outfile2> prefix=<>

Docs here:


Hi @jvoelschow.
A third way, valid if you will run the analysis in QIIME 2, is to import the sequences using a manifest file. In the manifest you associate the correct sample name, in the 'sample-id' column, with the fastq files that carry the wrong names. So no need to change them at all!
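
In case it helps, here is a rough sketch of how such a manifest could be generated. It assumes you have a two-column, tab-separated mapping file (vendor file prefix, then correct sample ID) and that the files are named <prefix>_R1.fastq.gz and <prefix>_R2.fastq.gz. The mapping file, the naming pattern and the make_manifest function are all hypothetical, so adjust them to your data. The header row is the one used by the paired-end manifest format (PairedEndFastqManifestPhred33V2).

```shell
# Hypothetical helper: make_manifest MAP.tsv /absolute/path/to/fastq_dir
# MAP.tsv (assumed layout): <vendor file prefix><TAB><correct sample ID>
# Assumed file names: <prefix>_R1.fastq.gz and <prefix>_R2.fastq.gz
make_manifest() {
    # Header row for a paired-end manifest
    printf 'sample-id\tforward-absolute-filepath\treverse-absolute-filepath\n'
    # The "|| [ -n ... ]" part keeps a last line that lacks a trailing newline
    while IFS="$(printf '\t')" read -r prefix sample_id || [ -n "$prefix" ]; do
        [ -n "$prefix" ] || continue   # skip empty lines
        printf '%s\t%s/%s_R1.fastq.gz\t%s/%s_R2.fastq.gz\n' \
            "$sample_id" "$2" "$prefix" "$2" "$prefix"
    done < "$1"
}

# Example usage (paths are placeholders):
#   make_manifest id_map.tsv /abs/path/to/fastq > manifest.tsv
#   qiime tools import \
#     --type 'SampleData[PairedEndSequencesWithQuality]' \
#     --input-path manifest.tsv \
#     --output-path demux.qza \
#     --input-format PairedEndFastqManifestPhred33V2
```

The fastq files themselves stay exactly as the vendor delivered them; only the manifest carries the corrected sample IDs.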