Deblur and HiSeq

Hello,

I am trying to process HiSeq files and am running into issues with Deblur.

The command is:

qiime deblur denoise-16S \
  --i-demultiplexed-seqs  \
  --p-trim-length 100 \
  --o-representative-sequences \
  --o-table \
  --p-sample-stats \
  --p-min-reads 1 \
  --p-min-size 1 \
  --o-stats

And the error has been:

Command '['deblur', 'workflow', '--seqs-fp', '/tmp/qiime2-archive-fk_we11v/d5f60d63-1a8b-44bc-bfcf-7a45e36acac7/data', '--output-dir', '/tmp/tmp5jf06m5k', '--mean-error', '0.005', '--indel-prob', '0.01', '--indel-max', '3', '--trim-length', '100', '--min-reads', '1', '--min-size', '1', '--jobs-to-start', '1', '-w', '--keep-tmp-files']' **returned non-zero exit status 1**.

Regards,
Tyler

Hi @Tyler_Carrier,

We need to see the full error traceback to understand where and how it failed. Please share your log file, or run the command with --verbose and post the output.

Best,
Justine

Hi Justine

Here is the full error traceback.


(base) -bash-4.2$ cat SpongeLarvae_Deblur-2051728.out
Traceback (most recent call last):
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/bin/deblur", line 684, in <module>
    deblur_cmds()
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/bin/deblur", line 632, in workflow
    threads_per_sample=threads_per_sample)
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/deblur/workflow.py", line 833, in launch_workflow
    left_trim_len=left_trim_length):
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/deblur/workflow.py", line 130, in trim_seqs
    for label, seq in input_seqs:
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/deblur/workflow.py", line 99, in sequence_generator
    for record in skbio.read(input_fp, format=format, **kw):
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/skbio/io/registry.py", line 506, in <genexpr>
    return (x for x in itertools.chain([next(gen)], gen))
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/skbio/io/registry.py", line 531, in _read_gen
    yield from reader(file, **kwargs)
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/skbio/io/registry.py", line 1008, in wrapped_reader
    yield from reader_function(fhs[-1], **kwargs)
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/skbio/io/format/fastq.py", line 344, in _fastq_to_generator
    seq, qual_header = _parse_sequence_data(fh, seq_header)
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/skbio/io/format/fastq.py", line 481, in _parse_sequence_data
    _blank_error("before '+'")
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/skbio/io/format/fastq.py", line 473, in _blank_error
    raise FASTQFormatError(error_string)
skbio.io._exception.FASTQFormatError: Found blank or whitespace-only line before '+' in FASTQ file
Traceback (most recent call last):
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/q2cli/commands.py", line 274, in __call__
    results = action(**arguments)
  File "</apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/decorator.py:decorator-gen-432>", line 2, in denoise_16S
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/qiime2/sdk/action.py", line 231, in bound_callable
    output_types, provenance)
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/qiime2/sdk/action.py", line 365, in _callable_executor_
    output_views = self._callable(**view_args)
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/q2_deblur/_denoise.py", line 96, in denoise_16S
    hashed_feature_ids=hashed_feature_ids)
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/q2_deblur/_denoise.py", line 163, in _denoise_helper
    subprocess.run(cmd, check=True)
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/subprocess.py", line 418, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['deblur', 'workflow', '--seqs-fp', '/tmp/qiime2-archive-ntiq_i0w/d5f60d63-1a8b-44bc-bfcf-7a45e36acac7/data', '--output-dir', '/tmp/tmpnpsnjdgb', '--mean-error', '0.005', '--indel-prob', '0.01', '--indel-max', '3', '--trim-length', '100', '--min-reads', '1', '--min-size', '1', '--jobs-to-start', '1', '-w', '--keep-tmp-files']' returned non-zero exit status 1.

Plugin error from deblur:

  Command '['deblur', 'workflow', '--seqs-fp', '/tmp/qiime2-archive-ntiq_i0w/d5f60d63-1a8b-44bc-bfcf-7a45e36acac7/data', '--output-dir', '/tmp/tmpnpsnjdgb', '--mean-error', '0.005', '--indel-prob', '0.01', '--indel-max', '3', '--trim-length', '100', '--min-reads', '1', '--min-size', '1', '--jobs-to-start', '1', '-w', '--keep-tmp-files']' returned non-zero exit status 1.

This all comes after 12+ hours of the code running successfully.

Regards,
Tyler

Hi @Tyler_Carrier, it looks like there is either a malformed per-sample FASTQ file or (more likely) an empty FASTQ file. Can you verify that all of the samples have data? @fedarko noted (in an internal discussion) the following clue in the traceback:

  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/skbio/io/format/fastq.py", line 473, in _blank_error
    raise FASTQFormatError(error_string)
skbio.io._exception.FASTQFormatError: Found blank or whitespace-only line before '+' in FASTQ file

Best,
Daniel

Hi Daniel,

Yes - there are data in the files… some more than others. I think it may be a formatting issue… this is for a meta-analysis and the files were downloaded from multiple sources.

Hi, again, Daniel,

Given that I suspect this is a formatting issue, what approach should be taken to reformat these files?

Tyler

Hi @Tyler_Carrier,

The first step is to determine which file is the malformed FASTQ. Something like the following might work:

$ find path/to/files/*.fastq.gz -exec zgrep -H -E -m 1 -n "^\s*$" {} \;

That command will, for each fastq.gz file, print the name of any file with an empty line or line composed entirely of whitespace (and the corresponding line number). I think that will flag the file given the message in the traceback.

In case you’re not familiar with the commands: find is a Swiss Army knife for selecting files, and its -exec mode applies a command to each matching file. For the command itself, zgrep searches for a pattern using regular expressions; it is the same as grep except that it handles gzip'd files automatically. The arguments to zgrep, in order, mean: print the name of a matching file (-H), use extended regular expressions (-E, same as egrep), stop after the first matching line (-m 1), and print the line number (-n). The regular expression ("^\s*$") can be read as: match from the start of the line, zero or more whitespace characters, to the end of the line. Finally, the {} \; piece is find syntax: {} is replaced by the current file name, and \; marks the end of the -exec command.
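
For anyone who would rather run the same check from Python, here is a minimal sketch using only the standard library. The function names `first_blank_line` and `report_blank_lines` are made up for illustration, and the sketch assumes the files are gzip-compressed text, as in the find/zgrep command above.

```python
import gzip
from pathlib import Path


def first_blank_line(path):
    """Return the 1-based number of the first blank or whitespace-only line, or None."""
    # gzip.open in text mode decompresses on the fly, much like zgrep does
    with gzip.open(path, "rt") as handle:
        for number, line in enumerate(handle, start=1):
            # strip() removes the newline and any other whitespace
            if line.strip() == "":
                return number
    return None


def report_blank_lines(directory, pattern="*.fastq.gz"):
    """Map each matching file that contains a blank line to that line's number."""
    hits = {}
    for path in sorted(Path(directory).glob(pattern)):
        number = first_blank_line(path)
        if number is not None:
            hits[path.name] = number
    return hits
```

Calling `report_blank_lines("path/to/files")` returns a dictionary of flagged file names; an empty dictionary means no file was flagged.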

Best,
Daniel

Hi Daniel,

Thank you for the code and the explanation. Unfortunately or fortunately, no samples were flagged.

Tyler

Ah, so it’s a “fun” problem :) Let’s use the underlying parsing logic directly to see if we can tease out the problematic file or files.

$ for f in path/to/files/*.fastq.gz
do
    echo "*** START: $f ***"
    # skbio.read returns a lazy generator; sum() consumes it so every record is parsed
    python -c "import skbio; sum(1 for _ in skbio.read('$f', format='fastq', variant='illumina1.8'))"
    echo "*** END: $f ***"
done

The above should yield a traceback similar to the earlier one with respect to the FASTQFormatError (pasted below), but it should also show which file is problematic.

  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/skbio/io/format/fastq.py", line 473, in _blank_error
    raise FASTQFormatError(error_string)
skbio.io._exception.FASTQFormatError: Found blank or whitespace-only line before '+' in FASTQ file

What we’re doing in the code above is using a bash for loop to iterate over all of the fastq.gz files in a directory. We first print each filename to standard output. Next, we use Python inline (-c) to run a small program that reads the file with the scikit-bio FASTQ parser, which is the same parser used by Deblur. We’re telling scikit-bio to interpret the data as the illumina1.8 variant, which should be fine, although there is a chance we’ll need to try illumina1.3 as well. The variant only affects how the PHRED scores are interpreted, and getting it wrong will produce a parsing error distinct from the one already encountered. Finally, we print the filename again to standard output. If all works well, we should get something like:

*** START: foo.fastq.gz ***
*** END: foo.fastq.gz ***
*** START: bar.fastq.gz ***
Traceback (most recent call last):
... a lot of stuff ...
*** END: bar.fastq.gz ***
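
If scikit-bio is not available in the current environment, a rougher structural check can be sketched with the standard library alone. This is not the parser Deblur uses; it assumes plain four-line records (header, sequence, '+' line, quality) with no wrapped sequences, and the function name `first_bad_record` is hypothetical.

```python
import gzip
from itertools import zip_longest


def first_bad_record(path):
    """Return (record_number, problem) for the first malformed record, else None."""
    with gzip.open(path, "rt") as handle:
        lines = (line.rstrip("\n") for line in handle)
        # Reusing one iterator four times groups the stream into chunks of four lines
        records = zip_longest(*[lines] * 4, fillvalue=None)
        for number, (header, seq, plus, qual) in enumerate(records, start=1):
            if None in (seq, plus, qual):
                return number, "truncated record"
            if not header.startswith("@"):
                return number, "header does not start with '@'"
            if not seq.strip():
                return number, "blank sequence line"
            if not plus.startswith("+"):
                return number, "third line does not start with '+'"
            if len(seq) != len(qual):
                return number, "sequence and quality lengths differ"
    return None
```

Unlike a plain search for blank lines, this reports the record number, which makes it easier to inspect the surrounding lines in the offending file.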

Want to give that a try and report back?

Best,
Daniel

Hi Daniel,

There were no errors when using “illumina1.8”, but every file produced an error when using “illumina1.3”:

*** START:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/skbio/io/registry.py", line 1161, in read
    **kwargs)
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/skbio/io/registry.py", line 506, in read
    return (x for x in itertools.chain([next(gen)], gen))
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/skbio/io/registry.py", line 531, in _read_gen
    yield from reader(file, **kwargs)
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/skbio/io/registry.py", line 1008, in wrapped_reader
    yield from reader_function(fhs[-1], **kwargs)
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/skbio/io/format/fastq.py", line 354, in _fastq_to_generator
    qual_header)
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/skbio/io/format/fastq.py", line 523, in _parse_quality_scores
    phred_offset=phred_offset))
  File "/apps/pkg/qiime2/2019.1/rhel7_u5/gnu/lib/python3.6/site-packages/skbio/io/format/_base.py", line 35, in _decode_qual_to_phred
    % (phred_range[0], phred_range[1]))
ValueError: Decoded Phred score is out of range [0, 62].

*** END:
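
This ValueError is what one would expect when Phred+33 quality strings are decoded with the illumina1.3 offset of 64: any quality character below '@' (ASCII 64) decodes to a negative score. Here is a minimal sketch of that arithmetic, where the [0, 62] range is taken from the traceback and the helper names `decode_quality` and `plausible_offsets` are made up for illustration:

```python
def decode_quality(quality, offset):
    """Convert a FASTQ quality string to Phred scores for a given ASCII offset."""
    return [ord(char) - offset for char in quality]


def plausible_offsets(quality, max_score=62):
    """Return the offsets (33 and/or 64) that keep every score within [0, max_score]."""
    return [
        offset
        for offset in (33, 64)
        if all(0 <= score <= max_score for score in decode_quality(quality, offset))
    ]
```

A quality string containing '#' (ASCII 35), for example, is only consistent with offset 33, which matches files that parse as illumina1.8 and fail as illumina1.3.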

Regards,
Tyler

Hi @Tyler_Carrier,

Sorry for the delayed reply – we needed to have a little internal chat about this. There was a pretty large amount of change to the validation on import of artifacts in recent versions of QIIME 2, and @thermokarst noted you’re using what looks like a reasonably old version. Any chance you might be able to take a stab at importing with QIIME 2 2020.8? Alternatively, it may be necessary to either (a) run deblur directly so the temporary and log files can be captured for examination or (b) split the set of input files into small batches to identify the problematic files. Happy to provide advice in either direction!
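
For option (b), the batching itself is simple to sketch. The helper name `batches` and the batch size are arbitrary choices for illustration; each batch would then be imported and denoised separately to see which one fails.

```python
def batches(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

Once a failing batch is found, it can be split again with a smaller size until a single file remains.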

Best,
Daniel

Hi Daniel,

Thanks for getting back to me. I have figured out the issue(s) in the meantime. The samples came from several studies – in order to perform the meta-analysis – and the files had to be reformatted a bit from the NCBI style to something that was Deblur compatible. I did the following:

CHANGE DIRECTORY AND UNZIP

cd IMPORTED-DATA_TRIMMED_QC25
gunzip *

THE FOLLOWING FOR EACH DATA FILE

REPLACE ALL WHITE SPACES

tr ' ' '_' <DATA-FILE.fastq >DATA-FILE_MODIFIED.fastq

RENAME EACH HEADER LINE WITHIN FILES (1ST OF EVERY 4 NON-BLANK LINES; A QUALITY LINE CAN ALSO START WITH @)

awk 'NF && ++n % 4 == 1 {print "@SPECIES-NAME_" ++i; next} {print}' DATA-FILE_MODIFIED.fastq > DATA-FILE_MODIFIED2.fastq

REMOVE EXCESS TEXT FROM THE + SEPARATOR LINES (3RD OF EVERY 4 NON-BLANK LINES; A QUALITY LINE CAN ALSO START WITH +)

awk 'NF && ++n % 4 == 3 {print "+"; next} {print}' DATA-FILE_MODIFIED2.fastq > DATA-FILE_MODIFIED3.fastq

REMOVE NON-FINAL FILES

rm DATA-FILE.fastq
rm DATA-FILE_MODIFIED.fastq
rm DATA-FILE_MODIFIED2.fastq

RENAME FINAL FILE BACK TO ORIGINAL

mv DATA-FILE_MODIFIED3.fastq DATA-FILE.fastq

CHECK FILES FOR ANY BLANK LINE

grep -c "^$" DATA-FILE.fastq

IF NEEDED: REMOVE BLANK LINES BY HAND

nano DATA-FILE.fastq

ZIP BACK UP FILES

gzip *
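
The per-file steps above could also be folded into a single pass, sketched here in Python with the standard library only. The function name `normalize_fastq` and the `prefix` argument are hypothetical. The sketch assumes four-line records and treats blank lines as stray separators to drop, not as empty sequence lines, so it raises an error if the remaining lines do not line up.

```python
import gzip
from itertools import zip_longest


def normalize_fastq(src, dst, prefix):
    """Rewrite a gzipped FASTQ with sequential headers and bare '+' lines.

    Blank lines are dropped. Returns the number of records written.
    """
    count = 0
    with gzip.open(src, "rt") as reader, gzip.open(dst, "wt", newline="\n") as writer:
        # Skip blank lines so the remaining lines fall into groups of four
        kept = (line.rstrip("\n") for line in reader if line.strip())
        for header, seq, plus, qual in zip_longest(*[kept] * 4):
            if qual is None:
                raise ValueError(f"{src}: truncated record after {count} records")
            if not header.startswith("@") or not plus.startswith("+"):
                raise ValueError(f"{src}: record {count + 1} is misaligned")
            count += 1
            # The new header replaces the original one, spaces included
            writer.write(f"@{prefix}_{count}\n{seq}\n+\n{qual}\n")
    return count
```

Because records are located by position and not by their first character, a quality line that happens to begin with '@' or '+' is left untouched. If an error is raised, the partially written output file should be discarded.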

It has been smooth sailing since then.

Regards,
Tyler
