Problem importing FASTQ data into QIIME2

Doc.chen · October 11, 2019, 2:05pm

This is the first time we are uploading our data to QIIME2, but we have failed to upload it and create a .qza many times and cannot find the reason.
The error message is:
An unexpected error has occurred:

'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

We have already created our “manifest”, as shown in the picture.

If you know how to fix it, please help me! Thank you very much.
PS: my data is already demultiplexed and is paired-end.

ebolyen · October 11, 2019, 5:11pm

Hey @Doc.chen!

Welcome to the forum!

Would you be able to upload your manifest file to the forum as an attachment?

I strongly suspect that your file is UTF-16-le (instead of UTF-8) as 0xff is the start of a byte-order mark for UTF-16-le. When you saved the file, what editor did you use, and what option to save (screenshot would be great)?
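If you want to check this yourself, here is a minimal Python sketch that looks at the first few bytes of a file and reports which byte-order mark (if any) it starts with. The function name `sniff_bom` is just something I made up for illustration, and a file without a BOM simply returns `None`, which does not prove it is UTF-8.

```python
import codecs

# Known byte-order marks. UTF-32 comes first because its little-endian BOM
# (FF FE 00 00) begins with the UTF-16 little-endian BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(path):
    """Return the encoding implied by the file's BOM, or None if there is no BOM."""
    with open(path, "rb") as handle:
        head = handle.read(4)
    for bom, name in BOMS:
        if head.startswith(bom):
            return name
    return None
```

Calling `sniff_bom("se-33-manifest.txt")` on a file saved the way I suspect should return `"utf-16-le"`.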

Doc.chen · October 14, 2019, 7:27am

Thanks for your help! I used Excel to save this file, and saved it as txt.

se-33-manifest.txt (836 Bytes)

ebolyen · October 14, 2019, 4:56pm

Thanks @Doc.chen!

Just as I suspected, you are saving as UTF-16 Unicode (Little-endian). In that dropdown, is there an equivalent txt format for UTF-8 Unicode (.txt)?

(I'm guessing a little bit as to what the specific name is)

If there is a UTF-8 option, try saving it using that and see if that fixes the issue.
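If the dropdown turns out not to have a UTF-8 option, re-encoding the file outside of Excel is another route. Here is a minimal Python sketch, assuming the file really is UTF-16 with a byte-order mark at the start. The function name and the output file name are placeholders of mine, not anything QIIME 2 provides.

```python
def convert_utf16_to_utf8(src, dst):
    """Re-encode a UTF-16 text file (with BOM) as UTF-8 without a BOM."""
    # The "utf-16" codec reads the BOM to pick the byte order and strips it.
    # newline="" leaves the line endings exactly as they were in the source.
    with open(src, "r", encoding="utf-16", newline="") as infile:
        text = infile.read()
    with open(dst, "w", encoding="utf-8", newline="") as outfile:
        outfile.write(text)
```

For example, `convert_utf16_to_utf8("se-33-manifest.txt", "se-33-manifest-utf8.txt")` writes a new file and leaves the original untouched.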

Doc.chen · October 17, 2019, 7:27am

Thank you very much! After following the guidance you sent me, I uploaded my data successfully. But I found that the problem was not only about UTF-8 but also about other things: for example, the manifest table cannot use $PWD if the operating system is macOS, among other issues. If you follow the instructions on the official QIIME2 website exactly, you will run into errors. Again, thank you very much for your help!

PS: I have no idea how other people work, but when I have a lot of data to analyze, does that mean I need to spend a lot of time writing my “manifest”? For example, in this study I have 10 samples, and the error message only appears after the data has been run. If I had 200 samples, it would take a lot of time and the errors would be hard to correct.

ebolyen · October 17, 2019, 3:31pm

Hey @Doc.chen,

As always, the error message (and log) is helpful. The $PWD is kind of a shortcut and is calculated from your current position in your terminal (if you aren't sure what that is, that's ok, you can type pwd to find out!). This should work fine on OS X, but it's certainly finicky for other reasons.
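To make the `$PWD` behaviour concrete, here is a small Python sketch of the substitution. It only imitates what the placeholder means (the directory you are currently in); it is not QIIME 2's own code, and `expand_pwd` is a hypothetical helper name.

```python
import os


def expand_pwd(raw_path, working_dir=None):
    """Replace $PWD in raw_path with working_dir (default: the current directory)."""
    working_dir = os.getcwd() if working_dir is None else working_dir
    return raw_path.replace("$PWD", working_dir)
```

The same manifest entry resolves to a different file depending on where you run the command from, which is one reason it can be finicky: `expand_pwd("$PWD/data/s1.fastq.gz", "/home/user/project")` gives `/home/user/project/data/s1.fastq.gz`, while running from another directory points somewhere else entirely.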

I tend to use this format personally, because it doesn't take any effort, HOWEVER, you must have your files named a certain way (as described there). This is usually the default output of Illumina Sequencers, but every facility has its own scheme, so this one may not be yours.
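If your file names don't match that scheme and you still need a manifest for a few hundred samples, generating it with a script is less painful than typing it in Excel. Here is a minimal Python sketch under some loud assumptions: the paired files are named `<sample>_R1.fastq.gz` and `<sample>_R2.fastq.gz` (hypothetical suffixes, change them to your own), and the column headers are those of the tab-separated paired-end manifest as I understand it, so please check them against the importing documentation for your QIIME 2 version. `write_manifest` is a made-up name.

```python
import csv
from pathlib import Path


def write_manifest(fastq_dir, manifest_path,
                   fwd_suffix="_R1.fastq.gz", rev_suffix="_R2.fastq.gz"):
    """Write a tab-separated paired-end manifest and return the sample count."""
    fastq_dir = Path(fastq_dir).resolve()  # absolute paths, so no $PWD needed
    rows = []
    for fwd in sorted(fastq_dir.glob("*" + fwd_suffix)):
        sample_id = fwd.name[: -len(fwd_suffix)]
        rev = fwd.with_name(sample_id + rev_suffix)
        # Fail now rather than after a long import run.
        if not rev.exists():
            raise FileNotFoundError(
                f"no reverse read for sample {sample_id!r}: {rev}")
        rows.append((sample_id, str(fwd), str(rev)))
    # Written as plain UTF-8 without a BOM, avoiding the encoding problem above.
    with open(manifest_path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["sample-id",
                         "forward-absolute-filepath",
                         "reverse-absolute-filepath"])
        writer.writerows(rows)
    return len(rows)
```

Because the script checks that every forward file has a matching reverse file before writing anything, a missing or misnamed file shows up immediately instead of after the import has been running for a while.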

Doc.chen · October 18, 2019, 7:18am

Yes! Thank you very much!

system · November 18, 2019, 1:23pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.