Rename sample IDs within fastq.gz files

Hi there, I'm trying to rename sequencing files and the sample IDs within the files that we received from a vendor. The files are paired-end fastq.gz, and the vendor returned them to us with the sample IDs really messed up. I know there's a way to change all the file names and the sample IDs within the files, but I can't seem to find it.

Hi @jvoelschow,

To rename sample IDs within FASTQ files, you can use the sed command in a shell script. For example:

zcat old_sample.fastq.gz | sed 's/old_sample/new_sample/g' | gzip > new_sample.fastq.gz

This command decompresses the file, replaces every occurrence of the old sample ID with the new one (on any line, not only the read headers), and recompresses it. For more details, refer to the official GNU sed documentation.
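If you would rather touch only the read header lines, here is a minimal sketch of one way to do it. It assumes standard 4-line FASTQ records, so headers sit on lines 1, 5, 9, and so on. The toy file, the `workdir` variable, and the `old_sample`/`new_sample` names are placeholders for illustration, not anything from your data.

```shell
# Toy demo: build a two-record FASTQ whose headers carry the wrong ID.
workdir=$(mktemp -d)
printf '%s\n' \
  '@old_sample.1 1:N:0' 'ACGT' '+' 'IIII' \
  '@old_sample.2 1:N:0' 'TTGA' '+' 'IIII' \
  | gzip > "$workdir/old_sample.fastq.gz"

# Rewrite only header lines (1, 5, 9, ...), assuming 4-line records.
gzip -dc "$workdir/old_sample.fastq.gz" \
  | awk 'NR % 4 == 1 { gsub(/old_sample/, "new_sample") } { print }' \
  | gzip > "$workdir/new_sample.fastq.gz"

# Print the renamed headers to check the result.
gzip -dc "$workdir/new_sample.fastq.gz" | awk 'NR % 4 == 1'
```

For paired-end data you would run the same pipeline on the R1 and R2 files so the headers stay in sync.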

Thanks


Hello Julie,

When fixing raw files, I like using BBTools, which you can install using conda.

conda install -c conda-forge -c bioconda bbmap

rename.sh in=<file> in2=<file2> out=<outfile> out2=<outfile2> prefix=<>
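With many samples, it may help to generate one rename.sh call per pair from a mapping table. The following is only a dry-run sketch: the mapping file, the `vendor_001`-style stems, and the `_R1`/`_R2` file naming are assumptions for illustration, and it prints the commands rather than running them. It uses only the parameters shown above.

```shell
# Hypothetical mapping: vendor file stem, then the correct sample ID.
map=$(mktemp)
printf '%s\t%s\n' 'vendor_001' 'sampleA' 'vendor_002' 'sampleB' > "$map"

# Dry run: build one rename.sh command per pair without executing it.
# Assumes files are named <stem>_R1.fastq.gz and <stem>_R2.fastq.gz.
cmds=$(while read -r stem id; do
  echo "rename.sh in=${stem}_R1.fastq.gz in2=${stem}_R2.fastq.gz out=${id}_R1.fastq.gz out2=${id}_R2.fastq.gz prefix=${id}"
done < "$map")

# Review the commands before running any of them.
echo "$cmds"
```

Once the printed commands look right, they can be pasted into the terminal or saved to a script. Try it on one pair first to confirm the read names come out the way you expect.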

Docs here:


Hi @jvoelschow.
A third way, valid if you will perform the analysis within QIIME 2, is to import the sequences using a manifest file. In it you associate the correct sample name, in the 'sample-id' column, with the FASTQ files that carry the wrong names. So no need to change them at all!
Cheers
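For illustration, a paired-end manifest could be written like this. It is a sketch under assumptions: the sample names and `/data/run1/...` paths are made up, and it uses the tab-separated V2 manifest layout with absolute file paths. The import command is left as a comment because it needs a QIIME 2 environment and real files.

```shell
# Hypothetical sample names and paths; the manifest needs absolute paths.
manifest=$(mktemp)
printf 'sample-id\tforward-absolute-filepath\treverse-absolute-filepath\n' > "$manifest"
printf '%s\t%s\t%s\n' \
  'sampleA' '/data/run1/wrong_name_01_R1.fastq.gz' '/data/run1/wrong_name_01_R2.fastq.gz' \
  'sampleB' '/data/run1/wrong_name_02_R1.fastq.gz' '/data/run1/wrong_name_02_R2.fastq.gz' \
  >> "$manifest"
cat "$manifest"

# Import step, shown as a comment (requires QIIME 2 and existing files):
# qiime tools import \
#   --type 'SampleData[PairedEndSequencesWithQuality]' \
#   --input-path "$manifest" \
#   --input-format PairedEndFastqManifestPhred33V2 \
#   --output-path demux.qza
```

The names in the 'sample-id' column are what the rest of the analysis will see, whatever the files themselves are called.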
