UnicodeDecodeError: ‘ascii’ codec can’t decode byte 0x8b in position 513

ariel · August 22, 2018, 7:26am

Hi! Excitingly, my manifest file now works; however, I'm getting this error on import:

qiime tools import --type 'SampleData[PairedEndSequencesWithQuality]' --input-path manifest.txt --output-path ../paired-end-demux.qza --source-format PairedEndFastqManifestPhred33
Traceback (most recent call last):
  File "/home/abganz/.conda/envs/qiime2/lib/python3.5/site-packages/q2cli/tools.py", line 116, in import_data
    view_type=source_format)
  File "/home/abganz/.conda/envs/qiime2/lib/python3.5/site-packages/qiime2/sdk/result.py", line 219, in import_data
    return cls._from_view(type, view, view_type, provenance_capture)
  File "/home/abganz/.conda/envs/qiime2/lib/python3.5/site-packages/qiime2/sdk/result.py", line 244, in _from_view
    result = transformation(view)
  File "/home/abganz/.conda/envs/qiime2/lib/python3.5/site-packages/qiime2/core/transform.py", line 73, in transformation
    other.validate(new_view)
  File "/home/abganz/.conda/envs/qiime2/lib/python3.5/site-packages/qiime2/core/transform.py", line 143, in validate
    view.validate('min')
  File "/home/abganz/.conda/envs/qiime2/lib/python3.5/site-packages/qiime2/plugin/model/directory_format.py", line 171, in validate
    getattr(self, field)._validate_members(collected_paths, level)
  File "/home/abganz/.conda/envs/qiime2/lib/python3.5/site-packages/qiime2/plugin/model/directory_format.py", line 101, in _validate_members
    self.format(path, mode='r').validate(level)
  File "/home/abganz/.conda/envs/qiime2/lib/python3.5/site-packages/qiime2/plugin/model/file_format.py", line 24, in validate
    self._validate_(level)
  File "/home/abganz/.conda/envs/qiime2/lib/python3.5/site-packages/q2_types/per_sample_sequences/_format.py", line 159, in _validate_
    self._check_n_records(record_count_map[level])
  File "/home/abganz/.conda/envs/qiime2/lib/python3.5/site-packages/q2_types/per_sample_sequences/_format.py", line 119, in _check_n_records
    for i, record in file:
  File "/home/abganz/.conda/envs/qiime2/lib/python3.5/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0x8b in position 513: ordinal not in range(128)

An unexpected error has occurred:

'ascii' codec can't decode byte 0x8b in position 513: ordinal not in range(128)

See above for debug info.
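For background: 0x8b is greater than 0x7f, so it can never be valid ASCII, and it is also the second byte of the gzip magic number 1f 8b that begins every gzip stream (RFC 1952). Finding it in data that is supposed to be plain-text FASTQ therefore hints, but does not prove, that gzip-compressed bytes ended up where text was expected, for example in a file that was compressed twice. A minimal bash sketch, assuming standard gzip, head, and od, that shows those header bytes; first_two_bytes_hex is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Print the first two bytes of a file as lowercase hex, e.g. "1f8b".
first_two_bytes_hex() {
  head -c 2 "$1" | od -An -tx1 | tr -d ' \n'
}

# Compress a tiny FASTQ record and inspect the start of the result.
tmp=$(mktemp)
printf '@read1\nACGT\n+\nIIII\n' | gzip -c > "$tmp"
echo "gzip header: $(first_two_bytes_hex "$tmp")"   # gzip header: 1f8b
rm -f "$tmp"
```

The same helper can be pointed at a file that has already been decompressed: if it still prints 1f8b, the content is gzip data rather than FASTQ text.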

After reading some other help topics, I tried this:

$ echo $LC_ALL
en_US.utf8
$ echo $LANG
(no output: LANG was empty)
$ export LANG=en_US.utf8

and got the same error.

Nicholas_Bokulich · August 22, 2018, 2:23pm

Sounds like your raw data might be corrupted. Check out this thread and this post and let us know what you see!

ariel · August 22, 2018, 5:01pm

Hi Nicholas,

Thank you so much. In parallel, I'm working on re-downloading the files.

My results from trying the suggestions in that thread are in my first post above: I found that LANG was empty, so I tried export LANG=en_US.utf8, then tried importing again and got the same error.

Here is what I got from the second link's suggestions:

$ file H_R1_001.fastq.gz

H_R1_001.fastq.gz: gzip compressed data, max speed

$ gunzip H_R1_001.fastq.gz

$ file H_R1_001.fastq

H_R1_001.fastq: ASCII text, with very long lines

Notably, these are thousands of files downloaded from a couple of different sources, so it's possible that only some of them have a problem.
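With thousands of files from several sources, checking each one by hand does not scale. A minimal bash sketch, assuming GNU gzip and coreutils, that tests every .fastq.gz file's integrity and flags files whose decompressed content still starts with the gzip magic bytes (i.e. compressed twice). check_gz and check_all are hypothetical helper names, and these two checks are only examples of problems that could produce this kind of error, not a confirmed diagnosis. Using gzip -dc also leaves the original .gz files in place, whereas gunzip replaces each one with its decompressed version.

```shell
#!/usr/bin/env bash
# Check one .fastq.gz file; prints a status line and returns 1 on a problem.
check_gz() {
  local f=$1 magic
  # gzip -t tests the compressed file's integrity without writing any output.
  if ! gzip -t "$f" 2>/dev/null; then
    echo "CORRUPT   $f"
    return 1
  fi
  # If the decompressed stream itself begins with 1f 8b, it was gzipped twice.
  magic=$(gzip -dc "$f" 2>/dev/null | head -c 2 | od -An -tx1 | tr -d ' \n')
  if [ "$magic" = "1f8b" ]; then
    echo "DOUBLE-GZ $f"
    return 1
  fi
  echo "OK        $f"
}

# Check every file given as an argument; returns 1 if any file had a problem.
check_all() {
  local f status=0
  for f in "$@"; do
    check_gz "$f" || status=1
  done
  return "$status"
}

# Usage (directory name is hypothetical), showing only problem files:
#   check_all untarred/*.fastq.gz | grep -v '^OK'
```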

ariel · August 22, 2018, 11:37pm

@Nicholas_Bokulich As an update, I re-downloaded the files and set aside any that might be problematic, and I was extremely thorough (the amazing person who maintains the Stanford cluster I'm using helped me do the best job possible)... and I still get the same error.

This time, when I ran

$ echo $LC_ALL
(no output)
$ echo $LANG
en_US.UTF-8

LC_ALL was empty, so I set it with export LC_ALL=en_US.utf8, tried again, and still got the same error.

Do you have any suggestions?

Update 2--
for f in untarred/*.gz; do grep $'x8b' $f; done
Several matched, so I'm going to take them out and try again.
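A note on the grep approach: in bash, $'x8b' without a backslash is the literal three-character string x8b; the single byte 0x8b is written $'\x8b'. Even with the backslash, searching the compressed .gz files is not very informative, because every gzip file contains 0x8b in its two-byte header. Searching the decompressed content is presumably closer to what the validator in the traceback actually decodes. A minimal bash sketch, assuming GNU gzip and coreutils (count_non_ascii and report_non_ascii are hypothetical helper names); it counts any byte above 0x7f rather than only 0x8b, since any such byte is rejected by the 'ascii' codec named in the error.

```shell
#!/usr/bin/env bash
# Count bytes outside 7-bit ASCII (0x80-0xff) in the *decompressed* content
# of a gzip file. tr deletes every ASCII byte (octal 000-177), so whatever
# survives is non-ASCII, and wc -c counts it.
count_non_ascii() {
  gzip -dc "$1" | LC_ALL=C tr -d '\000-\177' | wc -c | tr -d ' '
}

# Report every .gz file in a directory whose decompressed content contains
# at least one non-ASCII byte.
report_non_ascii() {
  local f n
  for f in "$1"/*.gz; do
    [ -e "$f" ] || continue   # the glob matched nothing
    n=$(count_non_ascii "$f")
    if [ "$n" -gt 0 ]; then
      echo "$f: $n non-ASCII byte(s)"
    fi
  done
}

# Usage (directory name taken from the thread):
#   report_non_ascii untarred
```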

Update 3--
Got the same error again. I'm going to search again for the bad character, this time in the unzipped files, and will update you.

ariel · August 23, 2018, 7:22am

It worked!

What worked was searching all of the untarred, unzipped files for the character, and also running file on each file to find additional problems.

In case this helps someone later, I did that like this:
for f in folder-with-files/*; do
    file "$f"
done
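That loop can be extended so that only the suspicious files are printed, which helps when a folder holds thousands of them. A minimal bash sketch, assuming the files are already decompressed and that file(1) describes clean FASTQ as "ASCII text" (as it did for H_R1_001.fastq earlier in the thread); flag_non_ascii_text is a hypothetical helper name and folder-with-files is the placeholder from the post.

```shell
#!/usr/bin/env bash
# Print every regular file in a directory that file(1) does not describe
# as "ASCII text" (e.g. gzip data, extended-ASCII text, or binary "data").
flag_non_ascii_text() {
  local f desc
  for f in "$1"/*; do
    [ -f "$f" ] || continue
    desc=$(file -b "$f")          # -b: omit the file name from the output
    case $desc in
      "ASCII text"*) ;;           # expected for plain FASTQ, incl. "with very long lines"
      *) printf '%s: %s\n' "$f" "$desc" ;;
    esac
  done
}

# Usage:
#   flag_non_ascii_text folder-with-files
```

Matching on the "ASCII text" prefix rather than the full description keeps files that file(1) reports as "ASCII text, with very long lines" from being flagged.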

Thank you so, so much for your help!

system · September 23, 2018, 8:32pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.