Splitting fastq files

Hi everyone,

I’m trying to import Casava 1.8 paired-end reads into QIIME 2. However, when these samples were submitted for sequencing, there were 2 errors (duplicate sample IDs) in the mapping file that resulted in 4 samples being combined into 2 fastq files (i.e., there are 2 fastq files when there should be 4, with 2 samples per fastq). Within each of these 2 fastq files, the samples still have their original, unique barcodes. Here’s an example of one of the fastq files:


@M00161:110:000000000-CGG2R:1:1101:11873:1171 1:N:0:TACGCTGC+GTAAGGAG



@M00161:110:000000000-CGG2R:1:1101:15918:1176 1:N:0:CGGAGCCT+GTAAGGCG


Does anyone know how I can separate these 2 fastq files in a way that allows me to analyze them with the rest of my sequencing library? Maybe by barcode? If I remove these samples, I am able to import the rest of the library.


Welcome to the QIIME 2 forum, @paaigehansen!

I have re-classified this as “other bioinformatics tools” because this is a technical question about something outside of QIIME 2.

This calls for a bit of custom code to separate those files. Fortunately, I think some simple grep will do the job here, since the barcode info in the header lines provides unique information.

This should do it, but no guarantees: I am cooking this up from memory and have not tested it. Run it once per barcode on each file, changing the barcode and the output path each time. Just pop this into your terminal:

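grep -F -A 3 --no-group-separator 'N:0:TACGCTGC+GTAAGGAG' put-path-to-original-file-here.fastq > put-path-to-new-file-for-barcode-TACGCTGC+GTAAGGAG-here.fastq

This grabs every line that contains the barcode you put in quotes, plus the 3 lines that follow it, so each match yields one complete 4-line fastq record (header, sequence, "+", quality). The -F flag makes grep treat the barcode as a literal string, and --no-group-separator (a GNU grep option) stops grep from writing "--" lines between non-adjacent matches, which would otherwise corrupt the output fastq.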
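If you would rather not retype the command for every barcode and file, here is a rough sketch that wraps the same idea in two small bash functions. Treat it as untested on real data: count_barcodes and split_by_barcode are made-up helper names, the file names in the comments are placeholders, and it assumes GNU grep plus uncompressed fastq files with exactly 4 lines per record.

```shell
#!/usr/bin/env bash
# Sketch only. Assumes GNU grep and uncompressed fastq with 4 lines per record.
# count_barcodes and split_by_barcode are hypothetical helper names.

# Tally the barcode pairs found in the header lines of a fastq file.
count_barcodes() {
    # Headers are lines 1, 5, 9, ...; the barcode pair is the text after the last ":".
    awk 'NR % 4 == 1 { n = split($0, parts, ":"); print parts[n] }' "$1" | sort | uniq -c
}

# Write one fastq per barcode pair from a combined fastq file.
split_by_barcode() {
    local input="$1"    # combined fastq file
    local outdir="$2"   # directory for the per-barcode files
    shift 2             # remaining arguments are barcode pairs
    mkdir -p "$outdir"
    local barcode
    for barcode in "$@"; do
        # -F: literal match; -A 3: keep the 3 lines after each matching header;
        # --no-group-separator: no "--" lines between non-adjacent matches.
        grep -F -A 3 --no-group-separator "N:0:${barcode}" "$input" \
            > "${outdir}/$(basename "${input%.fastq}")_${barcode}.fastq"
    done
}

# Example calls (placeholder file names), once for the forward and once for the reverse reads:
# count_barcodes combined_R1.fastq
# split_by_barcode combined_R1.fastq split TACGCTGC+GTAAGGAG CGGAGCCT+GTAAGGCG
# split_by_barcode combined_R2.fastq split TACGCTGC+GTAAGGAG CGGAGCCT+GTAAGGCG
```

Running count_barcodes first shows which barcode pairs are actually present and how many reads each has, which is a handy check that the per-barcode files add up to the original afterwards.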

Good luck!


I think this did the trick! Thanks for the suggestion!