Hi there, I'm trying to rename sequencing files and the sample IDs within the files that we received from a vendor. The files are paired-end fastq.gz, and the vendor returned them to us with the sample IDs really messed up. I know there's a way to change all the file names and the sample IDs within the files, but I can't seem to find it.
Hi @jvoelschow,
To rename sample IDs within FASTQ files, you can use the sed command in a shell script. For example:
zcat old_sample.fastq.gz | sed 's/old_sample/new_sample/g' | gzip > new_sample.fastq.gz
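This command decompresses the file, replaces every occurrence of the old sample ID with the new one, and recompresses it. Note that the substitution is applied to every line of the file, not only to the read headers. For more details, refer to the official GNU sed documentation.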
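If the old ID could also match text outside the read headers, a slightly more careful variant is to touch only the header line of each record. The sketch below is one way to do that, not the only one. It assumes standard four-line FASTQ records (no wrapped sequence lines), treats the ID as a literal string rather than a regular expression, and replaces only its first occurrence in each header. The function name rename_fastq_ids and the file names are made up for illustration.

```shell
# Hypothetical helper (name and arguments are illustrative):
#   rename_fastq_ids OLD_ID NEW_ID INPUT.fastq.gz OUTPUT.fastq.gz
# Assumes four-line FASTQ records, so headers are lines 1, 5, 9, ...
rename_fastq_ids() {
    old_id=$1
    new_id=$2
    input=$3
    output=$4
    gzip -dc "$input" |
        awk -v old="$old_id" -v new="$new_id" '
            NR % 4 == 1 {
                # Literal replacement of the first occurrence in the header
                i = index($0, old)
                if (i > 0) {
                    $0 = substr($0, 1, i - 1) new substr($0, i + length(old))
                }
            }
            { print }   # sequence, "+" and quality lines pass through unchanged
        ' |
        gzip > "$output"
}

# Example call (file names are placeholders):
#   rename_fastq_ids old_sample new_sample old_sample_R1.fastq.gz new_sample_R1.fastq.gz
```

For paired-end data you would run it once on the forward file and once on the reverse file, so that the headers of both mates stay in sync.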
Thanks
Hello Julie,
When fixing raw files, I like using BBTools, which you can install using conda (the bbmap package is on the bioconda channel).
conda install -c bioconda bbmap
rename.sh in=<file> in2=<file2> out=<outfile> out2=<outfile2> prefix=<>
Docs here:
Hi @jvoelschow.
A third way, valid if you will perform the analysis within QIIME 2, is to import the sequences using a manifest file. There you can associate the correct sample name, in the 'sample-id' column, with the fastq files that have the wrong names. So no need to change them at all!
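As a rough sketch of how such a manifest could be put together for paired-end data: the helper below reads a small mapping file and prints a tab-separated manifest with the 'sample-id', 'forward-absolute-filepath' and 'reverse-absolute-filepath' columns. The mapping file layout (correct sample ID, forward file name, reverse file name, separated by tabs), the function name make_manifest and all file names are assumptions for illustration, and it assumes the file names contain no tabs or trailing spaces.

```shell
# Hypothetical helper (name and mapping layout are illustrative):
#   make_manifest MAPPING.tsv FASTQ_DIR > manifest.tsv
# MAPPING.tsv has one line per sample:
#   correct-sample-id <TAB> forward-file-name <TAB> reverse-file-name
make_manifest() {
    mapping=$1
    # The manifest needs absolute paths, so resolve the directory first
    fastq_dir=$(cd "$2" && pwd) || return 1
    printf 'sample-id\tforward-absolute-filepath\treverse-absolute-filepath\n'
    while IFS="$(printf '\t')" read -r sample_id fwd rev; do
        [ -n "$sample_id" ] || continue   # skip blank lines
        printf '%s\t%s\t%s\n' "$sample_id" "$fastq_dir/$fwd" "$fastq_dir/$rev"
    done < "$mapping"
}

# Example (paths are placeholders):
#   make_manifest mapping.tsv raw_fastq > manifest.tsv
#   qiime tools import \
#     --type 'SampleData[PairedEndSequencesWithQuality]' \
#     --input-path manifest.tsv \
#     --output-path demux.qza \
#     --input-format PairedEndFastqManifestPhred33V2
```

The mapping file then doubles as a record of which vendor file corresponds to which sample, and the original fastq files stay untouched.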
Cheers
I would favor this third approach, because it keeps the sample data traceable throughout your analytical pipeline, especially when you have many samples and might need to review everything in the future. IMHO it is much better to be explicit at every step, so you don't waste time later trying to recollect old, fleeting memories.