Demux summarize issue: Not a gzipped file

Hi everyone, I am running qiime 2017.7 and I got the error below after running demux summarize:

Plugin error from demux:

Not a gzipped file (b'[+')

Debug info has been saved to /workspace/hrrtph/Leafdisc/tmp/qiime2-q2cli-err-5a8jqr85.log.

The command I used:

qiime demux summarize \
  --i-data $IMPORT/Paired_end_demux.qza \
  --o-visualization $IMPORT/Paired_end_demux.qzv

It worked for me before, but now with a new set of data, it does not work. Please help me!

Many thanks,

Toan

Hey @Toan

You are getting this issue because your previous import command spuriously succeeded. You are running a version of QIIME 2 from before we fixed that bug. I would recommend installing the latest version and trying to import your data again. Ideally you should get an informative error (or if you are using a FASTQ Manifest file, it'll gzip for you!).
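
For what it's worth, the message itself appears to come from Python's gzip module, which reports the first two bytes it read when they are not the gzip magic number (1f 8b). So b'[+' would be the first two bytes of one of the imported files. A minimal sketch reproducing the message, where the payload and the helper name gzip_error_for are made up for illustration:

```python
import gzip
import io


def gzip_error_for(data):
    """Return the error message gzip raises for `data`, or None if it reads cleanly."""
    try:
        with gzip.GzipFile(fileobj=io.BytesIO(data)) as fh:
            fh.read()
    except OSError as exc:  # BadGzipFile subclasses OSError on Python 3.8+
        return str(exc)
    return None


# Hypothetical payload: plain text that was never compressed.
print(gzip_error_for(b"[+ this is not gzip data"))
```

This prints Not a gzipped file (b'[+'), the same wording as the plugin error above.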

Let me know how that goes (or if you need help figuring out the import)!

Hi @ebolyen,
Thank you very much for your reply. I tried the latest version, qiime 2017.9, to generate "Paired_end_demux.qza" and I got another error:

Traceback (most recent call last):
  File "/software/bioinformatics/qiime2-2017.9/lib/python3.5/site-packages/q2cli/commands.py", line 218, in call
    results = action(**arguments)
  File "", line 2, in summarize
  File "/software/bioinformatics/qiime2-2017.9/lib/python3.5/site-packages/qiime2/sdk/action.py", line 201, in callable_wrapper
    output_types, provenance)
  File "/software/bioinformatics/qiime2-2017.9/lib/python3.5/site-packages/qiime2/sdk/action.py", line 393, in callable_executor
    ret_val = callable(output_dir=temp_dir, **view_args)
  File "/software/bioinformatics/qiime2-2017.9/lib/python3.5/site-packages/q2_demux/_summarize/_visualizer.py", line 142, in summarize
    quality_scores, min_seq_len = _subsample_paired(sample_map)
  File "/software/bioinformatics/qiime2-2017.9/lib/python3.5/site-packages/q2_demux/_summarize/_visualizer.py", line 61, in _subsample_paired
    for i, (fseq, rseq) in enumerate(file_pair):
  File "/software/bioinformatics/qiime2-2017.9/lib/python3.5/site-packages/q2_demux/_demux.py", line 34, in _read_fastq_seqs
    for seq_header, seq, qual_header, qual in itertools.zip_longest(*[fh] * 4):
  File "/software/bioinformatics/qiime2-2017.9/lib/python3.5/gzip.py", line 287, in read1
    return self._buffer.read1(size)
  File "/software/bioinformatics/qiime2-2017.9/lib/python3.5/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/software/bioinformatics/qiime2-2017.9/lib/python3.5/gzip.py", line 469, in read
    uncompress = self._decompressor.decompress(buf, size)
zlib.error: Error -3 while decompressing data: invalid block type

Do you have any ideas about this?

Thank you,

Toan

P.S. I have no idea why this command worked well for another set of samples but not this one. I wonder what could differ between the 2 sets of samples, since they should be the same kind of fastq files! Please help...

Hi @Toan!

Based on that traceback, it looks like your data is gzipped, but some of the data is corrupted. Could you share the original import command you used? Also, in the directory that holds your sequence files, does the following command report anything?

gunzip -t *.fastq.gz

(that will run gunzip in test mode to see if any of the files in that directory are corrupted)
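
If gunzip is not handy, roughly the same check can be sketched in Python. This is only an illustration: find_corrupt_gzip is a hypothetical helper, not part of QIIME 2, and it assumes the files sit in a single directory and match *.fastq.gz.

```python
import gzip
import zlib
from pathlib import Path


def find_corrupt_gzip(directory, pattern="*.fastq.gz"):
    """Return (file name, error message) for each file that fails a full read."""
    failures = []
    for path in sorted(Path(directory).glob(pattern)):
        try:
            with gzip.open(path, "rb") as fh:
                # Decompress the whole file in 1 MiB chunks, discarding the output.
                while fh.read(1 << 20):
                    pass
        except (OSError, EOFError, zlib.error) as exc:
            failures.append((path.name, str(exc)))
    return failures


if __name__ == "__main__":
    # Check the current directory, like `gunzip -t *.fastq.gz` would.
    for name, message in find_corrupt_gzip("."):
        print(name, message)
```

Like gunzip -t, it has to read every file to the end, so it can take a while on large runs. An empty result means every file decompressed cleanly.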

Let me know if you need more specific instructions (with exact filepaths)!

Hi @ebolyen,
Here is the command:

qiime tools import \
  --type SampleData[PairedEndSequencesWithQuality] \
  --input-path $WORKING/fastq_manifest.csv \
  --output-path $IMPORT/Paired_end_demux.qza \
  --source-format PairedEndFastqManifestPhred33

It always works for my other data. But this new data set has many more reads, and its files are very large (>10KB). Does that affect the pipeline?

Thankssss

Thanks for the command @Toan!

Could you provide a sample of the fastq_manifest.csv file?

Also, are the fastq files that your fastq_manifest.csv references already gzipped? If so, could you run the gunzip -t command from above on them?

Thanks!

Hi @ebolyen,

  1. Here is a sample of the fastq_manifest.csv file:

sample-id absolute-filepath direction
RUA225 /workspace/hrrtph/Run_Leafdisc_G3/000.raw/RUA225_S1_L001_R1_001.fastq.gz forward
RUA226 /workspace/hrrtph/Run_Leafdisc_G3/000.raw/RUA226_S2_L001_R1_001.fastq.gz forward
RUA227 /workspace/hrrtph/Run_Leafdisc_G3/000.raw/RUA227_S3_L001_R1_001.fastq.gz forward
RUA228 /workspace/hrrtph/Run_Leafdisc_G3/000.raw/RUA228_S4_L001_R1_001.fastq.gz forward
RUA229 /workspace/hrrtph/Run_Leafdisc_G3/000.raw/RUA229_S5_L001_R1_001.fastq.gz forward
RUA230 /workspace/hrrtph/Run_Leafdisc_G3/000.raw/RUA230_S6_L001_R1_001.fastq.gz forward
RUA231 /workspace/hrrtph/Run_Leafdisc_G3/000.raw/RUA231_S7_L001_R1_001.fastq.gz forward
RUA232 /workspace/hrrtph/Run_Leafdisc_G3/000.raw/RUA232_S8_L001_R1_001.fastq.gz forward
RUA233 /workspace/hrrtph/Run_Leafdisc_G3/000.raw/RUA233_S9_L001_R1_001.fastq.gz forward
RUA234 /workspace/hrrtph/Run_Leafdisc_G3/000.raw/RUA234_S10_L001_R1_001.fastq.gz forward
RUA235 /workspace/hrrtph/Run_Leafdisc_G3/000.raw/RUA235_S11_L001_R1_001.fastq.gz forward
RUA236 /workspace/hrrtph/Run_Leafdisc_G3/000.raw/RUA236_S12_L001_R1_001.fastq.gz forward
RUA237 /workspace/hrrtph/Run_Leafdisc_G3/000.raw/RUA237_S13_L001_R1_001.fastq.gz forward
RUA225 /workspace/hrrtph/Run_Leafdisc_G3/000.raw/RUA225_S1_L001_R2_001.fastq.gz reverse
RUA226 /workspace/hrrtph/Run_Leafdisc_G3/000.raw/RUA226_S2_L001_R2_001.fastq.gz reverse
RUA227 /workspace/hrrtph/Run_Leafdisc_G3/000.raw/RUA227_S3_L001_R2_001.fastq.gz reverse
RUA228 /workspace/hrrtph/Run_Leafdisc_G3/000.raw/RUA228_S4_L001_R2_001.fastq.gz reverse
RUA229 /workspace/hrrtph/Run_Leafdisc_G3/000.raw/RUA229_S5_L001_R2_001.fastq.gz reverse
RUA230 /workspace/hrrtph/Run_Leafdisc_G3/000.raw/RUA230_S6_L001_R2_001.fastq.gz reverse
RUA231 /workspace/hrrtph/Run_Leafdisc_G3/000.raw/RUA231_S7_L001_R2_001.fastq.gz reverse
RUA232 /workspace/hrrtph/Run_Leafdisc_G3/000.raw/RUA232_S8_L001_R2_001.fastq.gz reverse
RUA233 /workspace/hrrtph/Run_Leafdisc_G3/000.raw/RUA233_S9_L001_R2_001.fastq.gz reverse
RUA234 /workspace/hrrtph/Run_Leafdisc_G3/000.raw/RUA234_S10_L001_R2_001.fastq.gz reverse
RUA235 /workspace/hrrtph/Run_Leafdisc_G3/000.raw/RUA235_S11_L001_R2_001.fastq.gz reverse
RUA236 /workspace/hrrtph/Run_Leafdisc_G3/000.raw/RUA236_S12_L001_R2_001.fastq.gz reverse
RUA237 /workspace/hrrtph/Run_Leafdisc_G3/000.raw/RUA237_S13_L001_R2_001.fastq.gz reverse

It was automatically generated by qiime based on my raw data.

  2. My raw data are gzipped files.
    When I run the command: gunzip -t $INPUT/*.fastq.gz
    I get this error: gzip: /workspace/hrrtph/Raw_Leafdisc_G3//RUA230_S6_L001_R2_001.fastq.gz: invalid compressed data--format violated

  3. Yesterday I tried running a set of only 4 samples and it worked. Then I tried the same workflow again with more samples, and it failed with the same error:
    Plugin error from demux:

Error -3 while decompressing data: invalid block type

Debug info has been saved to /tmp/qiime2-q2cli-err-jdot4xj2.log.

It looks like the data are corrupted. Even though Paired_end_demux.qza could be generated, is it still incomplete?

I have also tried skipping this step and going ahead with the filtering step, but that failed too. So it seems Paired_end_demux.qza is not usable.

I am looking forward to hearing from you,

Thank you

Thanks for the info @Toan!

It looks like the reverse read for RUA230 is corrupted then. Can you get a new copy from your sequencing center?

That is correct. As far as QIIME 2 can tell, all of your files are compressed and seem to have everything in order, but we don't check that exhaustively because it would make every command much slower. Instead, we only peek at the first few kilobytes, since you can usually detect the wrong format with only a little bit of data. Your problem is probably near the end of the file, so demux summarize is the first command you run that actually reads the file in its entirety.
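
To make that concrete, here is a rough sketch of why a cheap check at the start of a file can pass while a full read fails. It is not the actual QIIME 2 check: the two helper names are hypothetical, the "peek" is reduced to the two-byte gzip magic number, and the damage is simulated by flipping one byte of the checksum at the end of a made-up FASTQ-like payload.

```python
import gzip
import io
import zlib

GZIP_MAGIC = b"\x1f\x8b"


def header_looks_gzipped(data):
    """Cheap check: only inspects the first two bytes."""
    return data[:2] == GZIP_MAGIC


def reads_to_end(data):
    """Expensive check: decompress the entire stream and verify its checksum."""
    try:
        with gzip.GzipFile(fileobj=io.BytesIO(data)) as fh:
            while fh.read(1 << 16):
                pass
    except (OSError, EOFError, zlib.error):
        return False
    return True


# Made-up FASTQ-like record, repeated to form a small payload.
record = b"@read\nACGT\n+\nIIII\n"
good = gzip.compress(record * 1000)

# Simulate damage near the end: flip one byte of the CRC32 in the 8-byte trailer.
damaged = bytearray(good)
damaged[-8] ^= 0xFF
damaged = bytes(damaged)

print(header_looks_gzipped(good), reads_to_end(good))        # True True
print(header_looks_gzipped(damaged), reads_to_end(damaged))  # True False
```

Both payloads pass the quick header check, but only the intact one survives a full read, which is the same pattern as an import that succeeds followed by a demux summarize that fails.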

We do have a validate command that we're hoping we can use for these kinds of situations in the future. It should be available later this month in the 2017.10 release.

Hopefully you can get an uncorrupted copy of RUA230_S6_L001_R2_001.fastq.gz, let me know how importing goes afterwards!

An off-topic reply has been split into a new topic: What are the numbers in taxa barplot CSV file?

Please keep replies on-topic in the future.

Marking @ebolyen's post above as the solution. @Toan followed up to confirm that the issue was due to a corrupted file. I split off that post into a new topic because there was a follow-up question that is better suited for its own forum topic. Thanks!

QIIME 2 2017.10 was just released. It adds a new command qiime tools validate which should detect and give a better error for the corrupted fastq file!
