Importing Casava 1.8 paired-end demultiplexed fastq

Greetings!

I have a problem importing Casava 1.8 paired-end demultiplexed fastq files.

I ran the following command in qiime2-2019.7 and got the error shown after it:
qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path casava-18-paired-end-demultiplexed \
  --input-format CasavaOneEightSingleLanePerSampleDirFmt \
  --output-path demux-paired-end.qza

An unexpected error has occurred:

Error -3 while decompressing data: invalid block type

See above for debug info.

Because these files are not artifact or visualization files, I couldn't run the qiime tools validate command.
I have also attached some files for your reference.

RN_M01_S89_L001_R1_001.fastq (3.9 KB) RN_M2E_S81_L001_R1_001.fastq.gz (4.9 MB) RN_M2E_S81_L001_R2_001.fastq.gz (7.4 MB)

Please help me in this regard

Hi @srini,

It looks like there's some issue with the file compression. My first recommendation would be to look at consistency. I notice that one of the files you uploaded is uncompressed (RN_M01_S89_L001_R1_001.fastq), whereas the other two are gzipped. I'm not sure if that's the only issue, but as an initial guess, it might be complicating things.
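A quick way to check consistency is to look at the first two bytes of each file, since every gzip stream starts with the bytes 0x1f 0x8b. This is only a minimal sketch: the helper names `looks_gzipped` and `report_compression` are made up for illustration, and a correct magic number does not prove that the rest of the file is intact.

```python
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def looks_gzipped(path):
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC


def report_compression(directory):
    """Map each FASTQ-like file name in `directory` to True (gzip) or False."""
    return {
        p.name: looks_gzipped(p)
        for p in sorted(Path(directory).iterdir())
        if p.is_file() and ".fastq" in p.name
    }
```

Calling `report_compression("casava-18-paired-end-demultiplexed")` would flag any file whose name or content does not match the others, for example a plain .fastq sitting next to .fastq.gz files.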

Best,
Justine

Thank you.

But that is not the problem. I just uploaded the files as they were. However, I have since tried to unzip all the files using unzip * and then compress them again using gzip *

And yes, the problem might also be in the compression, since I received the following error message when trying to unzip the files:

Archive: RN_M010_S80_L001_R1_001.fastq.gz
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of RN_M010_S80_L001_R1_001.fastq.gz or
RN_M010_S80_L001_R1_001.fastq.gz.zip, and cannot find RN_M010_S80_L001_R1_001.fastq.gz.ZIP, period.

And when I compressed them using gzip *

gzip: RN_M010_S80_L001_R1_001.fastq.gz already has .gz suffix -- unchanged
gzip: RN_M010_S80_L001_R2_001.fastq.gz already has .gz suffix -- unchanged
gzip: RN_M01_S89_L001_R1_001.fastq.gz already has .gz suffix -- unchanged
gzip: RN_M01_S89_L001_R2_001.fastq.gz already has .gz suffix -- unchanged
gzip: RN_M05cii_S85_L001_R1_001.fastq.gz already has .gz suffix -- unchanged
gzip: RN_M05cii_S85_L001_R2_001.fastq.gz already has .gz suffix -- unchanged
gzip: RN_M1E_S84_L001_R1_001.fastq.gz already has .gz suffix -- unchanged
gzip: RN_M1E_S84_L001_R2_001.fastq.gz already has .gz suffix -- unchanged
gzip: RN_M2E_S81_L001_R1_001.fastq.gz already has .gz suffix -- unchanged
gzip: RN_M2E_S81_L001_R2_001.fastq.gz already has .gz suffix -- unchanged
gzip: RN_M4BE_S83_L001_R1_001.fastq.gz already has .gz suffix -- unchanged
gzip: RN_M4BE_S83_L001_R2_001.fastq.gz already has .gz suffix -- unchanged
gzip: RN_M7E_S82_L001_R1_001.fastq.gz already has .gz suffix -- unchanged
gzip: RN_M7E_S82_L001_R2_001.fastq.gz already has .gz suffix -- unchanged
gzip: RN_Mo3ii_S86_L001_R1_001.fastq.gz already has .gz suffix -- unchanged
gzip: RN_Mo3ii_S86_L001_R2_001.fastq.gz already has .gz suffix -- unchanged
gzip: RN_MO4A_S88_L001_R1_001.fastq.gz already has .gz suffix -- unchanged
gzip: RN_MO4A_S88_L001_R2_001.fastq.gz already has .gz suffix -- unchanged
gzip: RN_MO7_S87_L001_R1_001.fastq.gz already has .gz suffix -- unchanged
gzip: RN_MO7_S87_L001_R2_001.fastq.gz already has .gz suffix -- unchanged

Can you please suggest a solution?

Hi @srini,

The .gz suffix means they’ve been gzipped, so you need to run gunzip, not unzip.
I would double check the zipping, and then see if you can re-run the command.

If that doesn't work, I would try a subset of the files to see if you can pinpoint which file is having compression issues.
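If testing subsets by hand gets tedious, one option is to decompress every file completely in Python and collect the ones that fail, which is roughly what `gzip -t` does on the command line. This is a sketch under assumptions: `gzip_error` and `find_corrupt` are hypothetical helper names, and the files are assumed to sit in one directory with a .gz suffix.

```python
import gzip
import zlib
from pathlib import Path


def gzip_error(path, chunk_size=1 << 20):
    """Decompress `path` fully; return None if it is intact, else the error text."""
    try:
        with gzip.open(path, "rb") as fh:
            # Read to the end so that damage anywhere in the stream is detected.
            while fh.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error) as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


def find_corrupt(directory):
    """Return {file name: error text} for every *.gz file that fails to decompress."""
    failures = {}
    for p in sorted(Path(directory).glob("*.gz")):
        err = gzip_error(p)
        if err is not None:
            failures[p.name] = err
    return failures
```

An empty result from `find_corrupt("casava-18-paired-end-demultiplexed")` would mean every file decompresses cleanly; it says nothing about whether the FASTQ content inside is well formed.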

Best,
Justine

Hi,

Running gunzip * gives this result:

gzip: RN_M010_S80_L001_R2_001.fastq.gz: invalid compressed data--format violated

I gzipped again with gzip *; the result is:

gzip: RN_M010_S80_L001_R2_001.fastq.gz already has .gz suffix -- unchanged
gzip: RN_M01_S89_L001_R1_001.fastq.gz already has .gz suffix -- unchanged
gzip: RN_M01_S89_L001_R2_001.fastq.gz already has .gz suffix -- unchanged
gzip: RN_M05cii_S85_L001_R1_001.fastq.gz already has .gz suffix -- unchanged
gzip: RN_M05cii_S85_L001_R2_001.fastq.gz already has .gz suffix -- unchanged
gzip: RN_M1E_S84_L001_R1_001.fastq.gz already has .gz suffix -- unchanged
gzip: RN_M1E_S84_L001_R2_001.fastq.gz already has .gz suffix -- unchanged
gzip: RN_M2E_S81_L001_R1_001.fastq.gz already has .gz suffix -- unchanged
gzip: RN_M2E_S81_L001_R2_001.fastq.gz already has .gz suffix -- unchanged
gzip: RN_M4BE_S83_L001_R1_001.fastq.gz already has .gz suffix -- unchanged
gzip: RN_M4BE_S83_L001_R2_001.fastq.gz already has .gz suffix -- unchanged
gzip: RN_M7E_S82_L001_R1_001.fastq.gz already has .gz suffix -- unchanged
gzip: RN_M7E_S82_L001_R2_001.fastq.gz already has .gz suffix -- unchanged
gzip: RN_Mo3ii_S86_L001_R1_001.fastq.gz already has .gz suffix -- unchanged
gzip: RN_Mo3ii_S86_L001_R2_001.fastq.gz already has .gz suffix -- unchanged
gzip: RN_MO4A_S88_L001_R1_001.fastq.gz already has .gz suffix -- unchanged
gzip: RN_MO4A_S88_L001_R2_001.fastq.gz already has .gz suffix -- unchanged
gzip: RN_MO7_S87_L001_R1_001.fastq.gz already has .gz suffix -- unchanged
gzip: RN_MO7_S87_L001_R2_001.fastq.gz already has .gz suffix -- unchanged

When importing with the Casava paired-end command in qiime2-2019.7, I get:

Traceback (most recent call last):
  File "/apps/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/q2cli/builtin/tools.py", line 154, in import_data
    view_type=input_format)
  File "/apps/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/qiime2/sdk/result.py", line 241, in import_data
    validate_level='max')
  File "/apps/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/qiime2/sdk/result.py", line 267, in _from_view
    result = transformation(view, validate_level)
  File "/apps/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/qiime2/core/transform.py", line 68, in transformation
    self.validate(view, validate_level)
  File "/apps/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/qiime2/core/transform.py", line 143, in validate
    view.validate(level)
  File "/apps/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/qiime2/plugin/model/directory_format.py", line 171, in validate
    getattr(self, field)._validate_members(collected_paths, level)
  File "/apps/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/qiime2/plugin/model/directory_format.py", line 101, in _validate_members
    self.format(path, mode='r').validate(level)
  File "/apps/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/qiime2/plugin/model/file_format.py", line 24, in validate
    self._validate_(level)
  File "/apps/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/q2_types/per_sample_sequences/_format.py", line 279, in _validate_
    self._check_n_records(record_count_map[level])
  File "/apps/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/q2_types/per_sample_sequences/_format.py", line 239, in _check_n_records
    for i, record in file_:
  File "/apps/miniconda3/envs/qiime2-2019.7/lib/python3.6/gzip.py", line 289, in read1
    return self._buffer.read1(size)
  File "/apps/miniconda3/envs/qiime2-2019.7/lib/python3.6/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/apps/miniconda3/envs/qiime2-2019.7/lib/python3.6/gzip.py", line 471, in read
    uncompress = self._decompressor.decompress(buf, size)
zlib.error: Error -3 while decompressing data: invalid block type

An unexpected error has occurred:

Error -3 while decompressing data: invalid block type

See above for debug info.

The problem must be in the compression, but what is the solution? Please advise. The data is 3 years old and the sequencing provider doesn't have it.

Hi @srini,

I think you’ve got your answer here:

It looks like something about the compression in this file is wrong. Try excluding this file and see if that solves your problem. (Sorry it's so iterative. AFAIK, there isn't an easy way to troubleshoot this immediately.)

Best,
Justine

Thanks again

Yes, I deleted that file. I also unzipped the files with 7-Zip on Windows, imported only the FASTQ files, and then compressed them with the gzip command. That compression succeeded after several attempts.
But even after that, the following error message appears while importing.

There was a problem importing casava-18-paired-end-demultiplexed:

casava-18-paired-end-demultiplexed/RN_MO4A_S88_L001_R2_001.fastq.gz is not a(n) FastqGzFormat file:

Invalid separator on line 16535

I deleted this file as well and tried to import the rest. But on each attempt it shows the same error message with a different file.

There was a problem importing casava-18-paired-end-demultiplexed:

casava-18-paired-end-demultiplexed/RN_MO4A_S88_L001_R2_001.fastq.gz is not a(n) FastqGzFormat file:

Invalid separator on line 16535

So all the files have a problem!
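For anyone who wants to see what sits on a reported line such as 16535, a rough check of the four-line FASTQ layout can locate the first malformed record. This is a simplified sketch and not the validator QIIME 2 uses: `first_fastq_problem` is a hypothetical helper, and it assumes the file decompresses without error and has no wrapped sequence lines.

```python
import gzip


def first_fastq_problem(path):
    """Return (line_number, message) for the first malformed line, or None."""
    with gzip.open(path, "rt", encoding="ascii", errors="replace") as fh:
        seq_len = None
        line_no = 0
        for line_no, line in enumerate(fh, start=1):
            line = line.rstrip("\n")
            position = (line_no - 1) % 4  # 0=header, 1=sequence, 2=separator, 3=quality
            if position == 0 and not line.startswith("@"):
                return line_no, "header does not start with '@'"
            if position == 1:
                seq_len = len(line)
            if position == 2 and not line.startswith("+"):
                return line_no, "separator does not start with '+'"
            if position == 3 and len(line) != seq_len:
                return line_no, "quality length differs from sequence length"
        if line_no % 4 != 0:
            return line_no, "file ends in the middle of a record"
    return None
```

If many files fail at different places, that pattern points toward damaged data rather than a formatting quirk in a single file.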

Hi @srini,

In that case, I would recommend that you contact the person who supplied the files (i.e. your sequencing center) to make sure that you have the correct, uncorrupted version. That’s probably the first (best) bet. It’s the most likely path to get you the data you need in the correct format.
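When you receive a fresh copy, comparing checksums is a simple way to confirm the transfer did not damage anything. The sketch below assumes the sequencing center can give you an MD5 value for each file, which is common but not guaranteed; `md5sum` and `mismatches` are hypothetical helper names.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatches(expected):
    """Given {path: expected MD5 hex}, return the paths whose digest differs."""
    return [path for path, want in expected.items() if md5sum(path) != want.lower()]
```

An empty list from `mismatches` means every local file matches the checksum it was compared against.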

Best,
Justine

Hi Jwdebelius,

Thank you very much for troubleshooting the data. You are correct. The metagenome data we had was corrupted for some reason, maybe a download error or a compression error.

Fortunately, the sequencing provider still held the original demultiplexed data and sent it to us again. This time we could import it into qiime2-2019.7 as Casava paired-end sequences.

Thank you, I greatly appreciate your time, effort and knowledge.
