UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

Hello, I saw that someone else created a similar topic (Manifest File: Invalid text encoding), but it doesn't seem to have been resolved.

In my case, I am using a demultiplexed fastq file that I compressed with gzip. I then tried to import it with the following command, but it failed:

qiime tools import --type 'SampleData[SequencesWithQuality]' --input-path ATCC/PGM052_QIIME2.fastq.gz --output-path demux.qza --source-format SingleEndFastqManifestPhred33

Traceback (most recent call last):
  File "/Users/rohanpatil/miniconda2/envs/qiime2-2018.2/lib/python3.5/site-packages/q2cli/tools.py", line 116, in import_data
    view_type=source_format)
  File "/Users/rohanpatil/miniconda2/envs/qiime2-2018.2/lib/python3.5/site-packages/qiime2/sdk/result.py", line 214, in import_data
    return cls.from_view(type, view, view_type, provenance_capture)
  File "/Users/rohanpatil/miniconda2/envs/qiime2-2018.2/lib/python3.5/site-packages/qiime2/sdk/result.py", line 239, in _from_view
    result = transformation(view)
  File "/Users/rohanpatil/miniconda2/envs/qiime2-2018.2/lib/python3.5/site-packages/qiime2/core/transform.py", line 57, in transformation
    self.validate(view)
  File "/Users/rohanpatil/miniconda2/envs/qiime2-2018.2/lib/python3.5/site-packages/qiime2/core/transform.py", line 131, in validate
    view.validate('min')
  File "/Users/rohanpatil/miniconda2/envs/qiime2-2018.2/lib/python3.5/site-packages/qiime2/plugin/model/file_format.py", line 32, in validate
    if not self.sniff():
  File "/Users/rohanpatil/miniconda2/envs/qiime2-2018.2/lib/python3.5/site-packages/q2_types/per_sample_sequences/_format.py", line 35, in sniff
    line = fh.readline()
  File "/Users/rohanpatil/miniconda2/envs/qiime2-2018.2/lib/python3.5/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

An unexpected error has occurred:

'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

See above for debug info.

Also, since you asked for this in the previous unresolved topic, I ran the following commands and got the outputs below.

echo $LC_ALL

echo $LANG
en_US.UTF-8

Please help. Thanks in advance.

I'm getting a similar error, but mine is not related to a metadata file. I'm trying to import a demultiplexed fastq file into QIIME 2 using the following command. Please help me resolve this issue.

qiime tools import --type 'SampleData[SequencesWithQuality]' --input-path PGM052_ATCC_Std.fastq.gz --output-path demux.qza --source-format SingleEndFastqManifestPhred33

Hey @Patilism,

Based on that, I think you need to also set LC_ALL:

export LC_ALL=en_US.utf8

Does that fix the issue?

If so, you can add it to your $HOME/.bashrc file so you don't have to set it every time.

Hi Evan,
Thanks for your response, but it didn't fix the issue. I got the following error:

Traceback (most recent call last):
  File "/Users/rohanpatil/miniconda2/envs/qiime2-2018.2/bin/qiime", line 11, in <module>
    sys.exit(qiime())
  File "/Users/rohanpatil/miniconda2/envs/qiime2-2018.2/lib/python3.5/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/Users/rohanpatil/miniconda2/envs/qiime2-2018.2/lib/python3.5/site-packages/click/core.py", line 676, in main
    _verify_python3_env()
  File "/Users/rohanpatil/miniconda2/envs/qiime2-2018.2/lib/python3.5/site-packages/click/_unicodefun.py", line 118, in _verify_python3_env
    'for mitigation steps.' + extra)
RuntimeError: Click will abort further execution because Python 3 was configured to use ASCII as encoding for the environment. Consult http://click.pocoo.org/python3/ for mitigation steps.

This system lists a couple of UTF-8 supporting locales that
you can pick from. The following suitable locales where
discovered: af_ZA.UTF-8, am_ET.UTF-8, be_BY.UTF-8, bg_BG.UTF-8, ca_ES.UTF-8, cs_CZ.UTF-8, da_DK.UTF-8, de_AT.UTF-8, de_CH.UTF-8, de_DE.UTF-8, el_GR.UTF-8, en_AU.UTF-8, en_CA.UTF-8, en_GB.UTF-8, en_IE.UTF-8, en_NZ.UTF-8, en_US.UTF-8, es_ES.UTF-8, et_EE.UTF-8, eu_ES.UTF-8, fi_FI.UTF-8, fr_BE.UTF-8, fr_CA.UTF-8, fr_CH.UTF-8, fr_FR.UTF-8, he_IL.UTF-8, hr_HR.UTF-8, hu_HU.UTF-8, hy_AM.UTF-8, is_IS.UTF-8, it_CH.UTF-8, it_IT.UTF-8, ja_JP.UTF-8, kk_KZ.UTF-8, ko_KR.UTF-8, lt_LT.UTF-8, nl_BE.UTF-8, nl_NL.UTF-8, no_NO.UTF-8, pl_PL.UTF-8, pt_BR.UTF-8, pt_PT.UTF-8, ro_RO.UTF-8, ru_RU.UTF-8, sk_SK.UTF-8, sl_SI.UTF-8, sr_YU.UTF-8, sv_SE.UTF-8, tr_TR.UTF-8, uk_UA.UTF-8, zh_CN.UTF-8, zh_HK.UTF-8, zh_TW.UTF-8

Click discovered that you exported a UTF-8 locale
but the locale system could not pick up from it because
it does not exist. The exported locale is "en_US.utf8" but it
is not supported

Hey @Patilism,

My bad, I think your system spells it en_US.UTF-8, so if you use that for LC_ALL instead it should work:

export LC_ALL=en_US.UTF-8
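To see which spelling a given machine accepts before exporting it, you can probe it from Python. This is a minimal sketch, assuming Python 3 and the standard-library `locale` module; `locale_is_available` is a hypothetical helper name. It asks the C library via `setlocale`, which is not necessarily the same check Click performs, so results can differ between macOS and Linux.

```python
import locale


def locale_is_available(name):
    """Return True if the C library accepts `name` as an LC_ALL locale."""
    # Calling setlocale without a value only queries the current setting.
    previous = locale.setlocale(locale.LC_ALL)
    try:
        locale.setlocale(locale.LC_ALL, name)
        return True
    except locale.Error:
        # Raised ("unsupported locale setting") for names the system lacks.
        return False
    finally:
        # Put back whatever was active before the probe.
        locale.setlocale(locale.LC_ALL, previous)


# Results depend on the machine this runs on.
for candidate in ("en_US.utf8", "en_US.UTF-8"):
    print(candidate, locale_is_available(candidate))
```

Whichever candidate reports `True` is the spelling to put in the `export LC_ALL=...` line.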

Hi Evan,
Thanks again for your prompt reply. Setting LC_ALL=en_US.UTF-8 resolved the error above, but it didn't fix the original (very first) error; I'm still getting the same error message.
I also double-checked that my demultiplexed fastq file is in Unix/UTF-8 format. Please feel free to have a look at the test file (https://exchangelabsgmu-my.sharepoint.com/:u:/g/personal/rpatil8_masonlive_gmu_edu/EZ_JVw07_NdFj9_yED7oMXcBwYEy0murf-TxhzF-O7GCPw?e=wXvSPy) and let me know.
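A FASTQ file can be valid UTF-8 once decompressed and still be unreadable as text in its `.gz` form. A minimal Python sketch that tells the two apart by the gzip magic number (the bytes `1f 8b` defined in RFC 1952) rather than by the file extension, and then reads the first FASTQ line either way; `is_gzipped` and `first_fastq_line` are hypothetical helper names, not part of QIIME 2.

```python
import gzip

# Every gzip stream starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def is_gzipped(path):
    """Decide by file content, not by the .gz extension."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC


def first_fastq_line(path):
    """Return the first line of a FASTQ file, decompressing it first if needed."""
    opener = gzip.open if is_gzipped(path) else open
    # Text mode ("rt") decodes the (decompressed) bytes as UTF-8.
    with opener(path, "rt", encoding="utf-8") as fh:
        return fh.readline().rstrip("\n")
```

For a well-formed FASTQ, `first_fastq_line("ATCC/PGM052_QIIME2.fastq.gz")` should return a header beginning with `@`, while `is_gzipped` on the same path returns `True`.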
Thanks and Regards,
Rohan


Thanks for the file @Patilism,

Looking back at your original post, I was too distracted by the decode error to notice that your file is the wrong kind of input for that --source-format.

I am able to reproduce your error, and I can confirm that it is a confusing one in this context. Ideally, the error would have explained that the file was compressed and of the wrong type. Instead, QIIME 2 tried to read the gzip file as if it were plain text, which caused the codec error (0x8b is the second byte of the gzip magic number, hence "position 1"). I created an issue to improve that.
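The same failure can be reproduced without QIIME 2. A minimal Python sketch: gzip-compress a tiny made-up FASTQ record, then decode the raw compressed bytes as UTF-8, which is roughly what a text-mode reader does with a `.gz` file. The first byte (`0x1f`) is valid ASCII, but the second (`0x8b`) cannot start a UTF-8 character.

```python
import gzip

# A tiny, made-up FASTQ record, compressed as `gzip` would compress a .fastq file.
compressed = gzip.compress(b"@read1\nACGT\n+\nIIII\n")
print(compressed[:2].hex())  # prints 1f8b, the gzip magic number

try:
    # Treat the compressed bytes as UTF-8 text, as a text-mode reader would.
    compressed.decode("utf-8")
    message = "decoded without error"
except UnicodeDecodeError as err:
    message = str(err)

print(message)
# prints: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
```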

To fix your problem, you just need a different kind of format. To help with that: how many samples do you have? Is it just the one, or are there more?

Thanks again Evan.
Please let me know what kind of format is needed. For now I have just this one sample for testing purposes, but later I'll have multiple samples with much bigger datasets to deal with.

Hey @Patilism,

Since you are using SingleEndFastqManifestPhred33 as your --source-format, let's go with that and create a manifest that lists your ATCC/PGM052_QIIME2.fastq.gz file. Since you will have multiple samples, I would recommend starting with at least 2. One sample should work in principle, but I'm not 100% certain: some visualizations try to show (or summarize) a distribution, and a distribution of 1 doesn't always behave the way we expect.

In any case, you want to create a file as described in this section, so you'll end up with something that looks roughly like:

sample-id,absolute-filepath,direction
PGM052,/some/filepath/ATCC/PGM052_QIIME2.fastq.gz,forward

You can create this file in Excel, since it's really just a spreadsheet, but you need to export it as a CSV for QIIME 2 to understand it.
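If you'd rather generate the manifest than build it in a spreadsheet, Python's standard `csv` module can write the same three columns. A minimal sketch, assuming single-end reads (so every row gets `forward`); `write_manifest` and the `SAMPLES` list are hypothetical, illustrative names.

```python
import csv
import os

# Hypothetical (sample-id, fastq path) pairs; replace with your own samples.
SAMPLES = [
    ("PGM052", "ATCC/PGM052_QIIME2.fastq.gz"),
]


def write_manifest(samples, out_path="manifest.csv"):
    """Write a single-end manifest with sample-id, absolute-filepath, direction."""
    with open(out_path, "w", newline="") as fh:
        writer = csv.writer(fh, lineterminator="\n")
        writer.writerow(["sample-id", "absolute-filepath", "direction"])
        for sample_id, fastq in samples:
            # The column is "absolute-filepath", so resolve relative paths first.
            writer.writerow([sample_id, os.path.abspath(fastq), "forward"])
    return out_path
```

Calling `write_manifest(SAMPLES)` from the directory that contains `ATCC/` would write `manifest.csv` there, with the relative path expanded to an absolute one.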

This is the file you want to provide to --input-path instead of your fastq file, because it tells QIIME 2 which samples exist, where to find their reads, and which direction the reads are.
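Because QIIME 2 relies on the manifest to locate each file, a quick sanity check before importing can catch typos early. This is a hypothetical pre-check sketch, not QIIME 2's own validation. It only tests what the example manifest implies: the header line, absolute file paths that exist, and a direction of `forward` or `reverse`. `check_manifest` is an illustrative name.

```python
import csv
import os

EXPECTED_HEADER = ["sample-id", "absolute-filepath", "direction"]
VALID_DIRECTIONS = {"forward", "reverse"}


def check_manifest(path, require_files=True):
    """Return a list of problems found; an empty list means none were detected."""
    # utf-8-sig strips a byte-order mark if the spreadsheet export added one.
    with open(path, newline="", encoding="utf-8-sig") as fh:
        rows = list(csv.reader(fh))
    if not rows or rows[0] != EXPECTED_HEADER:
        return ["header should be: " + ",".join(EXPECTED_HEADER)]

    problems = []
    for lineno, row in enumerate(rows[1:], start=2):
        if not row:
            continue  # skip blank lines
        if len(row) != 3:
            problems.append("line %d: expected 3 columns, got %d" % (lineno, len(row)))
            continue
        sample_id, fastq, direction = row
        if not os.path.isabs(fastq):
            problems.append("line %d (%s): %s is not an absolute path" % (lineno, sample_id, fastq))
        elif require_files and not os.path.exists(fastq):
            problems.append("line %d (%s): %s does not exist" % (lineno, sample_id, fastq))
        if direction not in VALID_DIRECTIONS:
            problems.append("line %d (%s): unexpected direction %r" % (lineno, sample_id, direction))
    return problems
```

Printing `check_manifest("manifest.csv")` before running `qiime tools import` lists anything that looks off; an empty list only means these particular checks passed.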

Hope that makes sense!
