Deblur and HiSeq

Hello,

I am trying to process HiSeq files and am running into issues with Deblur.

The command is:

qiime deblur denoise-16S \
  --i-demultiplexed-seqs  \
  --p-trim-length 100 \
  --o-representative-sequences \
  --o-table \
  --p-sample-stats \
  --p-min-reads 1 \
  --p-min-size 1 \
  --o-stats

And the error has been:

Command '['deblur', 'workflow', '--seqs-fp', '/tmp/qiime2-archive-fk_we11v/d5f60d63-1a8b-44bc-bfcf-7a45e36acac7/data', '--output-dir', '/tmp/tmp5jf06m5k', '--mean-error', '0.005', '--indel-prob', '0.01', '--indel-max', '3', '--trim-length', '100', '--min-reads', '1', '--min-size', '1', '--jobs-to-start', '1', '-w', '--keep-tmp-files']' **returned non-zero exit status 1**.

Regards,
Tyler

Hi @Tyler_Carrier,

We need to see the full error traceback to understand where and how it failed. Please share your log file, or rerun the command with --verbose and post the output.

Best,
Justine


Hi Justine

Here is the full error traceback.


(base) -bash-4.2$ cat SpongeLarvae_Deblur-2051728.out
Traceback (most recent call last):
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/bin/deblur", line 684, in <module>
    deblur_cmds()
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/bin/deblur", line 632, in workflow
    threads_per_sample=threads_per_sample)
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/deblur/workflow.py", line 833, in launch_workflow
    left_trim_len=left_trim_length):
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/deblur/workflow.py", line 130, in trim_seqs
    for label, seq in input_seqs:
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/deblur/workflow.py", line 99, in sequence_generator
    for record in skbio.read(input_fp, format=format, **kw):
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/skbio/io/registry.py", line 506, in <genexpr>
    return (x for x in itertools.chain([next(gen)], gen))
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/skbio/io/registry.py", line 531, in _read_gen
    yield from reader(file, **kwargs)
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/skbio/io/registry.py", line 1008, in wrapped_reader
    yield from reader_function(fhs[-1], **kwargs)
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/skbio/io/format/fastq.py", line 344, in _fastq_to_generator
    seq, qual_header = _parse_sequence_data(fh, seq_header)
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/skbio/io/format/fastq.py", line 481, in _parse_sequence_data
    _blank_error("before '+'")
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/skbio/io/format/fastq.py", line 473, in _blank_error
    raise FASTQFormatError(error_string)
skbio.io._exception.FASTQFormatError: Found blank or whitespace-only line before '+' in FASTQ file
Traceback (most recent call last):
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/q2cli/commands.py", line 274, in __call__
    results = action(**arguments)
  File "</apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/decorator.py:decorator-gen-432>", line 2, in denoise_16S
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/qiime2/sdk/action.py", line 231, in bound_callable
    output_types, provenance)
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/qiime2/sdk/action.py", line 365, in _callable_executor_
    output_views = self._callable(**view_args)
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/q2_deblur/_denoise.py", line 96, in denoise_16S
    hashed_feature_ids=hashed_feature_ids)
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/q2_deblur/_denoise.py", line 163, in _denoise_helper
    subprocess.run(cmd, check=True)
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/subprocess.py", line 418, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['deblur', 'workflow', '--seqs-fp', '/tmp/qiime2-archive-ntiq_i0w/d5f60d63-1a8b-44bc-bfcf-7a45e36acac7/data', '--output-dir', '/tmp/tmpnpsnjdgb', '--mean-error', '0.005', '--indel-prob', '0.01', '--indel-max', '3', '--trim-length', '100', '--min-reads', '1', '--min-size', '1', '--jobs-to-start', '1', '-w', '--keep-tmp-files']' returned non-zero exit status 1.

Plugin error from deblur:

  Command '['deblur', 'workflow', '--seqs-fp', '/tmp/qiime2-archive-ntiq_i0w/d5f60d63-1a8b-44bc-bfcf-7a45e36acac7/data', '--output-dir', '/tmp/tmpnpsnjdgb', '--mean-error', '0.005', '--indel-prob', '0.01', '--indel-max', '3', '--trim-length', '100', '--min-reads', '1', '--min-size', '1', '--jobs-to-start', '1', '-w', '--keep-tmp-files']' returned non-zero exit status 1.

This all comes after 12+ hours of the code running successfully.

Regards,
Tyler

Hi @Tyler_Carrier, it looks like there is either a malformed per-sample FASTQ file or (more likely) an empty FASTQ file. Can you verify that all of the samples have data? @fedarko noted (in an internal discussion) the following clue in the traceback:

  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/skbio/io/format/fastq.py", line 473, in _blank_error
    raise FASTQFormatError(error_string)
skbio.io._exception.FASTQFormatError: Found blank or whitespace-only line before '+' in FASTQ file
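
One way to check for empty files is to count the records in each one. The following is a minimal sketch, assuming every record occupies exactly four lines (the usual Illumina layout) and that the files are gzip-compressed. The function names `count_fastq_records` and `report_read_counts` are illustrative and not part of QIIME 2 or Deblur.

```python
import gzip
from pathlib import Path


def count_fastq_records(lines):
    """Count FASTQ records, assuming four lines per record (an assumption)."""
    n_lines = sum(1 for _ in lines)
    return n_lines // 4


def report_read_counts(directory):
    """Print the record count of every *.fastq.gz file in `directory`."""
    for path in sorted(Path(directory).glob("*.fastq.gz")):
        # gzip.open in text mode decompresses on the fly
        with gzip.open(path, "rt") as handle:
            n_records = count_fastq_records(handle)
        flag = "  <-- EMPTY" if n_records == 0 else ""
        print(f"{path.name}\t{n_records}{flag}")
```

Calling `report_read_counts("path/to/files")` lists each file with its record count and marks any file that has none.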

Best,
Daniel

1 Like

Hi Daniel,

Yes - there are data in the files… some more than others. I think it may be a formatting issue… this is for a meta-analysis and the files were downloaded from multiple sources.

Hi, again, Daniel,

Given that I suspect this is a formatting issue, what approach should I take to fix these files?

Tyler

Hi @Tyler_Carrier,

First thing would be to determine which file appears to be a malformed fastq. Something like the following might work:

$ find path/to/files/*.fastq.gz -exec zgrep -H -E -m 1 -n "^\s*$" {} \;

That command will, for each fastq.gz file, print the name of any file with an empty line or line composed entirely of whitespace (and the corresponding line number). I think that will flag the file given the message in the traceback.

In case you’re not familiar with the commands, find is a Swiss Army knife for selecting files, and it has a mode that applies a command to each matching file (via -exec). For the command itself, zgrep lets you search for a pattern using regular expressions. It is the same as grep except that it automatically handles gzip'd files. The arguments to zgrep, in order, mean: print the name of a matching file (-H), use extended regular expressions (-E, same as egrep), stop after matching a single line (-m 1), and print the line number (-n). The regular expression ("^\s*$") can be read as: match from the start of the line, zero or more whitespace characters, to the end of the line. Finally, the {} \; piece is find syntax: {} is replaced by the current file name, and \; marks the end of the command given to -exec.
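
The same search can be sketched in Python for anyone without zgrep available. This is a rough equivalent rather than an exact one: it treats a line as blank when `str.strip()` leaves nothing behind. The names `first_blank_line` and `scan_directory` are made up for this example.

```python
import gzip
from pathlib import Path


def first_blank_line(lines):
    """Return the 1-based number of the first empty or whitespace-only line, or None."""
    for number, line in enumerate(lines, start=1):
        if line.strip() == "":
            return number
    return None


def scan_directory(directory):
    """Print `path:line_number` for every *.fastq.gz file that contains a blank line."""
    for path in sorted(Path(directory).glob("*.fastq.gz")):
        with gzip.open(path, "rt") as handle:
            hit = first_blank_line(handle)
        if hit is not None:
            print(f"{path}:{hit}")
```

As with the find command, `scan_directory("path/to/files")` prints nothing when no file contains a blank line.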

Best,
Daniel


Hi Daniel,

Thank you for the code and the explanation. Unfortunately or fortunately, no samples were flagged.

Tyler

Ah, so it’s a “fun” problem :) Let’s use the underlying parsing logic directly to see if we can tease out the problematic file or files.

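$ for f in path/to/files/*.fastq.gz
do
    echo "*** START: $f ***"
    # skbio.read returns a lazy generator; draining it with deque forces every record to be parsed
    python -c "import collections, skbio; collections.deque(skbio.read('$f', format='fastq', variant='illumina1.8'), maxlen=0)"
    echo "*** END: $f ***"
done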

The above should yield a traceback similar to the earlier one, ending in the FASTQFormatError (pasted below), but it will also show which file is problematic.

  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/skbio/io/format/fastq.py", line 473, in _blank_error
    raise FASTQFormatError(error_string)
skbio.io._exception.FASTQFormatError: Found blank or whitespace-only line before '+' in FASTQ file

What we’re doing in the code above is using a bash for loop to iterate over all of the fastq.gz files in a directory. We first print each filename to standard out. Next, we run Python inline (-c) with a small program that uses the scikit-bio FASTQ parser to read the file. This is the same parser used by Deblur. We’re telling scikit-bio to interpret the data as the illumina1.8 variant, which should be fine, although there is a chance we’ll need to try illumina1.3 as well – this only affects how the PHRED scores are interpreted, and getting it wrong will produce a different parsing error from the one already encountered. Finally, we print the filename again to standard out. If all works well, we should get something like:

*** START: foo.fastq.gz ***
*** END: foo.fastq.gz ***
*** START: bar.fastq.gz ***
Traceback (most recent call last):
... a lot of stuff ...
*** END: bar.fastq.gz ***
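
On the variant question: scikit-bio's FASTQ documentation describes illumina1.8 as using an ASCII offset of 33 and illumina1.3 an offset of 64, both with an allowed score range of 0 to 62. The sketch below shows why the wrong variant fails on the quality line rather than on a blank line. `decode_quality` and `guess_offset` are hypothetical helpers written for this illustration, and the guess is only a heuristic.

```python
def decode_quality(quality, offset):
    """Convert a FASTQ quality string to integer Phred scores for a given ASCII offset."""
    return [ord(char) - offset for char in quality]


def guess_offset(quality_strings):
    """Heuristic: guess the ASCII offset from the smallest quality character seen."""
    lowest = min(ord(char) for quality in quality_strings for char in quality)
    # Characters below '@' (ASCII 64) would decode to negative scores under offset 64.
    return 33 if lowest < 64 else 64


# 'I' is ASCII 73: a score of 40 under offset 33, but 9 under offset 64.
print(decode_quality("I", 33), decode_quality("I", 64))
# '5' is ASCII 53: valid under offset 33, negative (out of range) under offset 64.
print(decode_quality("5", 33), decode_quality("5", 64))
```

If a file's quality lines contain characters such as digits or punctuation below '@', it almost certainly uses offset 33, and reading it as illumina1.3 would be expected to raise an out-of-range error.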

Want to give that a try and report back?

Best,
Daniel


Hi Daniel,

There were no errors when using “illumina1.8”, but every file raised an error when using “illumina1.3”:

*** START:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/skbio/io/registry.py", line 1161, in read
    **kwargs)
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/skbio/io/registry.py", line 506, in read
    return (x for x in itertools.chain([next(gen)], gen))
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/skbio/io/registry.py", line 531, in _read_gen
    yield from reader(file, **kwargs)
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/skbio/io/registry.py", line 1008, in wrapped_reader
    yield from reader_function(fhs[-1], **kwargs)
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/skbio/io/format/fastq.py", line 354, in _fastq_to_generator
    qual_header)
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/skbio/io/format/fastq.py", line 523, in _parse_quality_scores
    phred_offset=phred_offset))
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/skbio/io/format/_base.py", line 35, in _decode_qual_to_phred
    % (phred_range[0], phred_range[1]))
ValueError: Decoded Phred score is out of range [0, 62].

*** END:

Regards,
Tyler