Trouble compressing fastq files to fastq.gz and importing to QIIME2

Hello,

I am new to QIIME2 and fairly unfamiliar with Linux. I received a compressed file of multiplexed single-end sequences with the barcodes in the sequences, and I am trying to follow the QIIME2 importing tutorial on an Ubuntu virtual machine.

The first thing I did was unzip the original .tar.bz2 file in Ubuntu and move the unzipped fastq file to my working directory. I then used the gzip command in the terminal to create the fastq.gz file:

$ gzip sequences.fastq
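A quick sanity check right after compressing can catch a damaged file before the import step. The sketch below uses a hypothetical helper, `check_fastq_gz`, and assumes the FASTQ uses the usual unwrapped layout of four lines per record. `gzip -t` tests the compressed stream, and the total line count of a well-formed file should be a multiple of 4:

```shell
# Hypothetical helper: sanity-check a gzipped FASTQ before importing it.
# Assumes unwrapped 4-line records (@header, sequence, +separator, quality).
check_fastq_gz() {
  local f="$1"
  # gzip -t exits non-zero if the compressed stream is damaged or truncated
  if ! gzip -t "$f"; then
    echo "gzip integrity test failed for $f" >&2
    return 1
  fi
  # A well-formed 4-line FASTQ has a total line count divisible by 4
  local n
  n=$(gzip -dc "$f" | wc -l)
  if [ $((n % 4)) -ne 0 ]; then
    echo "$f has $n lines, which is not a multiple of 4" >&2
    return 1
  fi
  echo "$f looks OK: $((n / 4)) records"
}

# Usage: check_fastq_gz sequences.fastq.gz
```

Passing this check does not prove every record is valid; it only rules out the most common kinds of truncation.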

Next, I created my metadata file (mymetadata.txt) and used the following command to create a .qzv visualization from it:

$ qiime metadata tabulate \
  --m-input-file mymetadata.txt \
  --o-visualization tabulated-sample-metadata.qzv

However, when I try to import the fastq.gz file by executing the following command, I get an error:

$ qiime tools import \
  --type MultiplexedSingleEndBarcodeInSequence \
  --input-path sequences.fastq.gz \
  --output-path multiplexed-seqs.qza
There was a problem importing sequences.fastq.gz:

  sequences.fastq.gz is not a(n) FastqGzFormat file:

  Missing separator for record beginning on line 57741453

I have deleted the fastq.gz file and started over by recompressing the original fastq file several times, and I always get the same error saying that the fastq.gz file is not a FastqGzFormat file, although the detail after the colon has named other problems in previous attempts ("missing separator for record beginning on line ....").

I went through the import tutorial and was able to import the example file (muxed-se-barcode-in-seq.fastq.gz) just fine. I get the impression that the problem is with the original fastq file I was given, or with something that happens to it when I unzip it. The sequence data came from an Ion Torrent sequencer, and the original file that was sent to me had a .tar.bz2 extension. I'm really not sure what the issue might be, and the folks at the lab that sequenced my data are not able to help me either. They are familiar with QIIME but have not used QIIME2 to process their NGS data recently.
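If the original download is suspect, `bzip2 -t` and `tar -t` can test the archive without extracting anything. A minimal sketch, where `check_tar_bz2` is a hypothetical helper name and `filename.tar.bz2` is a placeholder:

```shell
# Hypothetical helper: check that a .tar.bz2 archive is intact before extracting it.
check_tar_bz2() {
  local archive="$1"
  # bzip2 -t decompresses in memory and reports a damaged or truncated stream
  if ! bzip2 -t "$archive"; then
    echo "bzip2 integrity test failed for $archive" >&2
    return 1
  fi
  # tar -t only lists the members (v adds sizes); a damaged tar layer makes it fail
  tar -tjvf "$archive"
}

# Usage: check_tar_bz2 filename.tar.bz2
```

The listing also shows the exact path of the fastq file inside the archive, which is useful when moving it to the working directory afterwards.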

Any help you can offer would be greatly appreciated!

Andrew

Hello Andrew,

Welcome to the forums! :qiime2:

You are making good progress and are on the right track. The core error
Missing separator for record beginning on line 57741453
tells me that the file is corrupted or was not fully downloaded. This is why recompressing the original fastq does not fix the problem.

Try downloading and extracting that .tar.bz2 file again and see if it helps.

You could also take a look at the lines causing the error:

gzip -dc sequences.fastq.gz | sed -n '57741450,57741455p'
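To go beyond inspecting a handful of lines, a short awk scan can report the first record that breaks the four-line FASTQ layout. This is a rough sketch: `find_bad_record` is a hypothetical name, it assumes unwrapped records, and it only approximates the checks QIIME 2 performs, so its messages will not match QIIME 2's wording:

```shell
# Hypothetical helper: report the first FASTQ record that breaks the 4-line layout.
# Assumes unwrapped records: @header, sequence, +separator, quality.
find_bad_record() {
  gzip -dc "$1" | awk '
    NR % 4 == 1 && substr($0, 1, 1) != "@" { print "line " NR ": header does not start with @"; bad = 1; exit }
    NR % 4 == 3 && substr($0, 1, 1) != "+" { print "line " NR ": missing + separator"; bad = 1; exit }
    NR % 4 == 0 && length($0) != seqlen { print "line " NR ": quality length differs from sequence length"; bad = 1; exit }
    NR % 4 == 2 { seqlen = length($0) }   # remember the sequence length for the quality check
    END {
      # A file that stops partway through a record is also reported
      if (!bad && NR % 4 != 0) { print "file ends mid-record after line " NR; bad = 1 }
      if (!bad) print "no structural problems found in " NR / 4 " records"
      exit bad
    }'
}
```

For example, `find_bad_record sequences.fastq.gz` prints the offending line number and returns a nonzero exit status when it finds a problem; that line number can then be fed to the `sed -n` command above.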

Let us know what you try next!


I did some research into extracting a tar.bz2 compressed file and I figured out how to extract it properly in Ubuntu. The process I used is as follows:

  1. Download the compressed sequencing file and extract the fastq file to the QIIME2 working directory.

#The compressed sequencing file should have a .tar.bz2 file extension.

#Download it and save it to the shared folder of the QIIME2 virtual computer.

#From the QIIME2 virtual computer, move the compressed sequencing file to the working directory (the Documents folder, in my case).

#Open the terminal and extract the fastq file using the following command:

$ tar -xvjf filename.tar.bz2
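These steps, together with the recompression and import commands from the original question, can be combined into one helper. This is a sketch assuming the archive holds a single uncompressed FASTQ; `prepare_fastq` is a hypothetical name and the file names are placeholders. If the archive stores the file inside a folder, pass that relative path as the second argument:

```shell
# Hypothetical helper: verify, extract and recompress one FASTQ from a .tar.bz2 archive.
prepare_fastq() {
  local archive="$1" fastq="$2"
  bzip2 -t "$archive" || return 1    # stop early if the download is damaged
  tar -xvjf "$archive" || return 1   # x = extract, v = verbose, j = bzip2, f = archive file
  gzip -f "$fastq" || return 1       # writes $fastq.gz and removes the uncompressed file
  gzip -t "$fastq.gz"                # confirm the new .gz file is intact
}

# Usage (file names are placeholders):
#   prepare_fastq filename.tar.bz2 sequences.fastq
#   qiime tools import \
#     --type MultiplexedSingleEndBarcodeInSequence \
#     --input-path sequences.fastq.gz \
#     --output-path multiplexed-seqs.qza
```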

I found the command help on the following website after googling "extracting tar.bz2 files":

How to Extract | Unzip Tar Bz2 (.tar.bz2 | .tbz2) Files on Ubuntu 18.04 - Website for Students

