'ascii' error code when importing

Hello,

I have looked at other posts mentioning the 'ascii' codec error that I am receiving while trying to import my data from a manifest file. Most of the solutions involved out-of-date versions or corrupt files. I have the latest version of QIIME2 (2024.2.0), and when I unzip and look at the text files, they look correct. I am just trying to get this to work for a subset of two samples before running larger data sets. Below is the error message I receive when attempting to import:

(qiime2-amplicon-2024.2) CT18127:qiime2-sp23E-test aabrams$ qiime tools import --type 'SampleData[PairedEndSequencesWithQuality]' --input-path Manifest.tsv --output-path paired-end-demux.qza --input-format PairedEndFastqManifestPhred33V2

Traceback (most recent call last):
File "/Users/aabrams/miniconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/q2cli/builtin/tools.py", line 852, in _import
artifact = qiime2.sdk.Artifact.import_data(
File "/Users/aabrams/miniconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/qiime2/sdk/result.py", line 332, in import_data
return cls.from_view(type, view, view_type, provenance_capture,
File "/Users/aabrams/miniconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/qiime2/sdk/result.py", line 360, in _from_view
result = transformation(view, validate_level)
File "/Users/aabrams/miniconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/qiime2/core/transform.py", line 73, in transformation
other.validate(new_view)
File "/Users/aabrams/miniconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/qiime2/core/transform.py", line 143, in validate
view.validate(level)
File "/Users/aabrams/miniconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/qiime2/plugin/model/directory_format.py", line 177, in validate
getattr(self, field)._validate_members(collected_paths, level)
File "/Users/aabrams/miniconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/qiime2/plugin/model/directory_format.py", line 107, in _validate_members
self.format(path, mode='r').validate(level)
File "/Users/aabrams/miniconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/qiime2/plugin/model/file_format.py", line 26, in validate
self.validate(level)
File "/Users/aabrams/miniconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/q2_types/per_sample_sequences/_format.py", line 289, in validate
self._check_n_records(record_count_map[level])
File "/Users/aabrams/miniconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/q2_types/per_sample_sequences/_format.py", line 249, in _check_n_records
for i, record in file_:
File "/Users/aabrams/miniconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 11: ordinal not in range(128)

An unexpected error has occurred:

'ascii' codec can't decode byte 0x80 in position 11: ordinal not in range(128)

See above for debug info.

I am not really sure what to do. From what I have seen online, this is not an issue with the manifest file itself but with the sequencing data files. I tried re-downloading them; all are zipped and in .gz format.

Any suggestions would be appreciated!
Thank you!

Hello @aabrams,

Yes this is almost certainly an issue in the sequence files themselves. The error is occurring here:

File "/Users/aabrams/miniconda3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/q2_types/per_sample_sequences/_format.py", line 289, in validate
self._check_n_records(record_count_map[level])

That is the point where we try to validate your sequences. Can you please DM me your sequences so I can attempt to recreate the issue?

Thank you.

@aabrams Looking at the error message again, it specifies that the offending byte is 0x80 in hexadecimal (base 16), which is 128 in base 10. If you look here, that byte in "Extended ASCII Codes" is Ç.

Does that character appear anywhere in your sequence files? More generally, do you know of any reason that a character outside standard English alphanumeric characters and punctuation would appear in your sequences?
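If searching in a text editor is awkward, a byte-level scan is another way to check. Below is a minimal Python sketch, assuming the file really is gzip-compressed. The helper names `first_non_ascii` and `scan_gz` are hypothetical (not part of QIIME 2), and the 1 MB read limit is an arbitrary choice.

```python
import gzip


def first_non_ascii(data: bytes):
    """Return (offset, byte value) of the first byte >= 0x80, or None if all bytes are ASCII."""
    for offset, value in enumerate(data):
        if value >= 0x80:
            return offset, value
    return None


def scan_gz(path, limit=1_000_000):
    """Decompress up to `limit` bytes of a gzip file and report the first non-ASCII byte."""
    with gzip.open(path, "rb") as handle:
        return first_non_ascii(handle.read(limit))
```

Calling `scan_gz` on one of the files listed in the manifest returns `None` when the decompressed content is plain ASCII; otherwise it returns the offset and value of the first offending byte, which can be compared with the position reported in the error.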

Thank you for the response and the link explaining the error. I opened all the files in a text editor and searched for that symbol, but it was not present in any of them. Not sure if this is helpful since it is only part of the file, but here is what the first few lines look like:

@VH01105:64:AACTYWLM5:1:1101:26904:1000 1:N:0:CGATGGATAT+AGGCGTGTTC
CCTACGGGGGGCAGCAGTGAGGAATATTGGTCAATGGACGTAAGTCTGAACCAGCCAAGTAGCGTGCAGGATGACGGCCCTATGGGTTGTAAACTGCTTTTATGCGGGGATAAAGTCGGCTACGCGTAGCCGTTTGTAGGTACCGCATGAATAAGGACCGGCTAATTCCGTGCCAGCAGCCGCGGTAATACGGAAGGTCCGGGCGTTATCCGGATTTATTGGGTTTAAAGGGAGCGCAGGCCGGAGGTCAAGCGTGACGTGAAATGTAGCCGCTCAACGGCTGAGTTGCGTCGCGAACTGG
+
5CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC*CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC5CCCCCCC*CCCCCCCCCCC5CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC*CCCC5CC5CCCC
@VH01105:64:AACTYWLM5:1:1101:38341:1000 1:N:0:CGATGGATAT+AGGCGTGTTC
CCTATGGGGTGCTGCAGTGAGGAATATTGGTCAATGGGCGTAAGCCTGAACCAGCCAAGTAGCGTGCAGGATGACGGCCCTATGGGTTGTAAACTGCTTTTATGCGGGGATAAAGTCACCTACGTGTAGGTGTTTGTAGGTACCGCATGAATAAGGACCGGCTAATTCCGTGCCAGCAGCCGCGGTAATACGGAAGGTCCGGGCGTTATCCGGATTTATTGGGTTTAAAGGGAGCGCAGGCGGAAGATTAAGCGTGACGTGAAATGTACCGGCTCAACCGGTGACGTGCGTCGCGAACTGG
+
*5CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC5CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC5CCCC
@VH01105:64:AACTYWLM5:1:1101:51217:1000 1:N:0:CGATGGATAT+AGGCGTGTTC
NGTACGGGTGGCTGCAGTGAGGAATATTGGTCAATGGGCGAGAGCCTGAACCAGCCAAGTAGCGTGCAGGAAGACGGCCCTATGGGTTGTAAACTGCTTTTATCAGGGGATAAAGTGCGCCACGTGTGGTGTTTTGTAGGTACCTGATGAATAAGGACCGGCTAATTCCGTGCCAGCAGCCGCGGTAATACGGAAGGTCCGGGCGTTATCCGGATTTATTGGGTTTAAAGGGAGCGCAGGCCGATGGTTAAGCGTGACGTGAAATGTAGGGGCTCAACCTTTGAATTGCGTCGCGAACTGG
+
#5CCCCCCCCC5CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC5CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC5CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC5CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC5CCCCCCCCCCCCCCCCCCCCCCCC*C
@VH01105:64:AACTYWLM5:1:1101:38284:1019 1:N:0:CGATGGATAT+AGGCGTGTTC
CCTACGGGTGGCTGCAGTAGGGAATATTGCACAATGGGGGGAACCCTGATGCAGCAACGCCACGTGTGGGAAGAAGCATTTCGGTGTGTAAACCACTGTCATGAGGGAATAAGGCCCGTTTTCGGACGGGATTGAATGTACCTTGAGAGGAAGCACCGGCAAACTTCGTGCCAGCAGCCGCGGTAATACGAGGGGTGCAAGCGTTGTTCGGAATTACTGGGCGTAAAGGGAGCGTAGGCGGAGATTCAAGCGGATTGTACAATCCCGGGGCCCAACCCCGGCTCTGCAGTCCGAACTGGAT
+

@aabrams Thank you for DMing me your data, I see what's going on. I downloaded your data and unzipped it.



I then opened that first fastq.gz and got this



Your .fastq.gz files each contain both that __MACOSX file and another nested .fastq.gz. You want to take out those nested .fastq.gz files and import those. Those nested ones only contain a single .fastq as they are supposed to.



I get the same error as you when trying to import your data, but it works if I un-nest those .fastq.gz files. Does that make sense?
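For anyone who wants to check for this kind of nesting without opening each file by hand, here is a rough Python sketch. It decompresses the first few bytes of a .gz file and compares them against well-known file signatures. The function names are hypothetical, and the assumption that a proper FASTQ payload starts with "@" is a simple heuristic rather than a full validation.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"   # signature at the start of every gzip stream
ZIP_MAGIC = b"PK\x03\x04"  # signature of a zip local file header


def classify_payload(head: bytes) -> str:
    """Guess what the decompressed content is from its first bytes."""
    if head.startswith(GZIP_MAGIC):
        return "nested gzip"
    if head.startswith(ZIP_MAGIC):
        return "zip archive"
    if head.startswith(b"@"):
        return "fastq text"
    return "unknown"


def inspect_gz(path: str) -> str:
    """Decompress the first 4 bytes of a gzip file and classify them."""
    with gzip.open(path, "rb") as handle:
        return classify_payload(handle.read(4))
```

A file that is ready to import should come back as "fastq text". Anything else suggests the real .fastq.gz is wrapped inside another archive and needs to be pulled out first.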


I just realized I never replied to this. Yes, this was a huge help and I greatly appreciate you figuring out what the issue was with my files! That solved the issue, THANK YOU!
