Problem importing demultiplexed data: 'utf-8' codec can't decode

Hello!

I’ve come across a problem when trying to import demultiplexed fastq via the manifest.csv format using the following code:

qiime tools import --type SampleData[PairedEndSequencesWithQuality] --input-path /Users/administrator/Desktop/skin\ microbiome/manifest.csv --output-path demux_import.qza --input-format PairedEndFastqManifestPhred33

Running this command, I receive this error:

" An unexpected error has occurred:

’utf-8’ codec can’t decode byte 0xca in position 227: invalid continuation byte"

I already searched this forum and came across some similar threads, where people manually looked for non-ASCII characters in their fastq-files… But I don’t know where to look; it’s like searching for a needle in a haystack. And what does “position 227” mean? The 227th character?

Thank you very much in advance!

Hey there @e_flat_minor!

Please see this post for more details: Better location information for "utf-8 can't decode codec..." error

I would try saving the file with UTF-8 encoding; Excel or another text editor may be able to help with that.

Hello @thermokarst,

Thank you for your reply. I have already looked through all the threads, and none of them is really resolved except for the one where @nick-youngblut found the “ƒ”.
I have opened my fastq-files as .txt files and checked in Excel at the given position (227 in my case). In that cell I can only see a “+”. Even after deleting this read completely from the fastq-file, I get the same error (’utf-8’ codec can’t decode byte 0xca in position 227: invalid continuation byte), even though that line should now be gone.
I also tried re-converting the files to UTF-8 with the OS text editor, but I still get the same error.

Thanks for your help!

Hmm, 227 refers to the byte position, not the Excel cell.

Can you please send me a download link to the file in a direct message?
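To make the byte-position point concrete, here is a minimal Python sketch that reports where a file stops being valid UTF-8. It is not part of QIIME 2; the function name `find_undecodable_bytes` is made up for illustration, and it assumes the file is small enough to read into memory (true for a manifest, less so for large FASTQ files). The offset it reports is 0-based, like the position in the error message.

```python
from pathlib import Path


def find_undecodable_bytes(path):
    """Return (byte_offset, line_number, byte_value) for each spot where UTF-8 decoding fails.

    byte_offset is 0-based, matching the "position N" in Python's error message.
    line_number is 1-based, so it can be looked up in a text editor.
    """
    data = Path(path).read_bytes()
    problems = []
    start = 0
    while start < len(data):
        try:
            data[start:].decode("utf-8")
            break  # the rest of the file decodes cleanly
        except UnicodeDecodeError as err:
            offset = start + err.start  # err.start is relative to the slice
            line_number = data.count(b"\n", 0, offset) + 1
            problems.append((offset, line_number, data[offset]))
            start += err.end  # resume after the invalid sequence
    return problems
```

Running it on the manifest as well as on the FASTQ files should show which file actually contains the offending byte, and on which line.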

I found the culprit: 0xca probably stands for the following non-UTF-8 character:

Decimal: 9577, UTF-8: 0xE2 0x95 0xA9 = 226 149 169

(source: http://www.gymel.com/charsets/CP850.html).

But I cannot seem to find this character in my fastq-file, whether I search for it in Word, Excel, or TextEdit. (Nor can I find the decimal code 9577 when I look through the FASTQ file via od -c.)

I don’t think it is in your FASTQ file — the error is most likely stemming from your manifest.

Yes, it was my manifest file that was corrupted. Somehow an “É” had slipped into it, although I could only see an empty space in my Excel .csv file. After deleting that empty space, everything was fine. (The file command helped me see that my text was not ASCII, so I deleted that space and used iconv to convert the file to UTF-8, and then everything worked. I’m just writing this down in case someone else runs into the same problem.)
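For anyone without file and iconv at hand, the same check and conversion can be sketched in Python. This is only a rough equivalent under assumptions: the helper names `is_valid_utf8` and `reencode_to_utf8` are invented here, and the source encoding has to be guessed by the user (the code cannot detect it). Note that re-encoding keeps the stray character in the file as valid UTF-8; it may still need to be deleted by hand, as I did.

```python
from pathlib import Path


def is_valid_utf8(path):
    """Return True if the whole file decodes as UTF-8."""
    try:
        Path(path).read_bytes().decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def reencode_to_utf8(src, dst, source_encoding):
    """Decode src using source_encoding (a guess supplied by the caller) and write it to dst as UTF-8.

    Works on raw bytes so that line endings are left untouched.
    Returns the decoded text so the caller can inspect it for stray characters.
    """
    text = Path(src).read_bytes().decode(source_encoding)
    Path(dst).write_bytes(text.encode("utf-8"))
    return text
```

A typical use would be to call `is_valid_utf8` on the manifest first and only convert if it returns False, writing to a new file so the original stays available for comparison.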

Thank you very much for your support @thermokarst!
