metadata problem utf-8 codec can't decode byte invalid start byte

I have cleaned up my metadata based on my review in Keemei, where there were some issues with duplicated sample IDs. Those look to be resolved, and I have saved the metadata as UTF-8.

I am receiving an error message that Q2 is having an issue loading the file.
'utf-8' codec can't decode byte 0xa0 in position 6662: invalid start byte

MetadataMicrositesandToxicology.txt (8.7 KB)

Any guidance would be appreciated.

(qiime2-2020.8) Kyles-MBP:~ kyleharris$ qiime demux emp-paired \
  --m-barcodes-file /Users/kyleharris/Desktop/CFMB2020/MetadataMicrositesandToxicology.txt \
  --m-barcodes-column BarcodeSequence \
  --i-seqs /Users/kyleharris/Desktop/CFMB2020/emp-paired-end-sequences.qza \
  --o-per-sample-sequences demux.qza \
  --o-error-correction-details demux-details.qza

There was an issue with loading the file /Users/kyleharris/Desktop/CFMB2020/MetadataMicrositesandToxicology.txt as metadata:

Metadata file must be encoded as UTF-8 or ASCII. The following error occurred when decoding the file:

'utf-8' codec can't decode byte 0xa0 in position 6662: invalid start byte

There may be more errors present in the metadata file. To get a full report, sample/feature metadata files can be validated with Keemei: https://keemei.qiime2.org

Find details on QIIME 2 metadata requirements here: https://docs.qiime2.org/2020.8/tutorials/metadata/

It looks like I might have a duplicated barcode ID in my metadata file: 515rcbc147
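For a quick check outside Keemei, something like the following Python sketch could list values that repeat within a column. It is only an illustration: the function name `duplicated_values`, the `sample-id` header, and the three invented rows are assumptions, not taken from the attached file.

```python
import csv
import io
from collections import Counter


def duplicated_values(tsv_text, column):
    """Return the values that occur more than once in one column of a TSV."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    counts = Counter(row[column] for row in reader)
    return sorted(value for value, n in counts.items() if n > 1)


# Hypothetical three-sample metadata; the IDs and barcodes are invented.
example = (
    "sample-id\tBarcodeSequence\n"
    "S1\tAGCCTTCGTCGC\n"
    "S2\tTCCATACCGGAA\n"
    "S3\tAGCCTTCGTCGC\n"
)

print(duplicated_values(example, "sample-id"))        # no repeated IDs
print(duplicated_values(example, "BarcodeSequence"))  # one repeated barcode
```

Running the same function on each column of interest (sample IDs, barcodes) would show which entries need attention.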

I corrected the one sample ID that was duplicated in the metadata file, and I am still receiving the same error message:

(qiime2-2020.8) Kyles-MBP:~ kyleharris$ qiime demux emp-paired \
  --m-barcodes-file /Users/kyleharris/Desktop/CFMB2020/MetadataMicrositesandToxicology.txt \
  --m-barcodes-column BarcodeSequence \
  --i-seqs /Users/kyleharris/Desktop/CFMB2020/emp-paired-end-sequences.qza \
  --o-per-sample-sequences demux.qza \
  --o-error-correction-details demux-details.qza

There was an issue with loading the file /Users/kyleharris/Desktop/CFMB2020/MetadataMicrositesandToxicology.txt as metadata:

Metadata file must be encoded as UTF-8 or ASCII. The following error occurred when decoding the file:

'utf-8' codec can't decode byte 0xa0 in position 6662: invalid start byte

There may be more errors present in the metadata file. To get a full report, sample/feature metadata files can be validated with Keemei: https://keemei.qiime2.org

Find details on QIIME 2 metadata requirements here: https://docs.qiime2.org/2020.8/tutorials/metadata/

MetadataMicrositesandToxicology.txt (8.7 KB)

This is my updated metadata file. Any thoughts on why I am still getting the same error?

Hi @mudbugecology,

It appears that the error:

'utf-8' codec can't decode byte 0xa0 in position 6662: invalid start byte

is being caused by special characters (e.g. †) in your data file. When I open your file in Excel I see:
[Screenshot: excel_ss]

This is how the same file looks in my raw text editor, BBEdit. I am also showing invisible characters such as tabs (triangles) and spaces (dots):
[Screenshot: bbedit_ss]

I would highly recommend that you always use a raw text editor like Notepad, Atom, or BBEdit to view files prior to importing them. Tools like MS Word and Excel can sometimes hide these characters, which will cause problems later.
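If an editor does not make the offending characters obvious, a short script can report where they are. The following is a rough Python sketch, not part of QIIME 2 or Keemei: the helper name `find_invalid_utf8` and the sample bytes are made up for illustration. It also shows that the byte 0xa0 decodes to † under Mac OS Roman and to a non-breaking space under Windows-1252, which would be consistent with the † seen above, although how the file actually got that byte is a guess.

```python
def find_invalid_utf8(data: bytes):
    """Return (offset, line, column, byte) for each byte that breaks UTF-8 decoding."""
    problems = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # everything from pos onward is valid UTF-8
        except UnicodeDecodeError as err:
            offset = pos + err.start  # absolute position of the bad byte
            line = data.count(b"\n", 0, offset) + 1
            column = offset - (data.rfind(b"\n", 0, offset) + 1) + 1
            problems.append((offset, line, column, data[offset]))
            pos = offset + 1  # skip the bad byte and keep scanning
    return problems


# Invented sample: a stray 0xa0 byte at the end of the second line.
sample = b"sample-id\tBarcodeSequence\nS1\tACGT\xa0\nS2\tTTGA\n"

for offset, line, column, byte in find_invalid_utf8(sample):
    print(f"offset {offset}, line {line}, column {column}: byte 0x{byte:02x}")

# The same byte interpreted under two legacy encodings.
print(repr(b"\xa0".decode("mac_roman")))  # dagger character
print(repr(b"\xa0".decode("cp1252")))     # non-breaking space
```

To run it on a real file, the bytes could be read with `pathlib.Path("MetadataMicrositesandToxicology.txt").read_bytes()` and passed to the same function. The offsets are 0-based, like the position in the error message; lines and columns are 1-based.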

-Cheers!
-Mike

Thanks, Mike! I will clean this up and try again. Kyle

So I reviewed the file in Notepad, and the updated metadata file seems to have cleared those noted areas with extra characters. I am not sure what this unexpected extra argument means (some of the other replies in the forum mention extra spaces). Is this picking up something else from the metadata file?

(qiime2-2020.8) Kyles-MBP:~ kyleharris$ qiime demux emp-paired \ --m-barcodes-file /Users/kyleharris/Desktop/CFMB2020/MetadataMicrositesandToxicology.txt \ --m-barcodes-column BarcodeSequence \ --i-seqs /Users/kyleharris/Desktop/CFMB2020/emp-paired-end-sequences.qza \ --o-per-sample-sequences demux.qza \ --o-error-correction-details demux-details.qza
Usage: qiime demux emp-paired [OPTIONS]

  Demultiplex paired-end sequence data (i.e., map barcode reads to sample
  ids) for data generated with the Earth Microbiome Project (EMP) amplicon
  sequencing protocol. Details about this protocol can be found at
  Protocols and Standards : earthmicrobiome

Inputs:
  --i-seqs ARTIFACT EMPPairedEndSequences
      The paired-end sequences to be demultiplexed.
      [required]
Parameters:
  --m-barcodes-file METADATA
  --m-barcodes-column COLUMN MetadataColumn[Categorical]
      The sample metadata column containing the per-sample
      barcodes. [required]
  --p-golay-error-correction / --p-no-golay-error-correction
      Perform 12nt Golay error correction on the barcode
      reads. [default: True]
  --p-rev-comp-barcodes / --p-no-rev-comp-barcodes
      If provided, the barcode sequence reads will be
      reverse complemented prior to demultiplexing.
      [default: False]
  --p-rev-comp-mapping-barcodes / --p-no-rev-comp-mapping-barcodes
      If provided, the barcode sequences in the sample
      metadata will be reverse complemented prior to
      demultiplexing. [default: False]
Outputs:
  --o-per-sample-sequences ARTIFACT
    SampleData[PairedEndSequencesWithQuality]
      The resulting demultiplexed sequences. [required]
  --o-error-correction-details ARTIFACT ErrorCorrectionDetails
      Detail about the barcode error corrections. [required]
Miscellaneous:
  --output-dir PATH     Output unspecified results to a directory
  --verbose / --quiet   Display verbose output to stdout and/or stderr during
                        execution of this action. Or silence output if
                        execution is successful (silence is golden).
  --examples            Show usage examples and exit.
  --citations           Show citations and exit.
  --help                Show this message and exit.

There was a problem with the command:

(1/1) Got unexpected extra arguments ( )

MetadataMicrositesandToxicology.txt (8.7 KB)

You need to put everything either on one line, without the backslashes, or on multiple lines, with a backslash at the end of each line except the last:

qiime demux emp-paired --m-barcodes-file /Users/kyleharris/Desktop/CFMB2020/MetadataMicrositesandToxicology.txt --m-barcodes-column BarcodeSequence --i-seqs /Users/kyleharris/Desktop/CFMB2020/emp-paired-end-sequences.qza --o-per-sample-sequences demux.qza --o-error-correction-details demux-details.qza

or

qiime demux emp-paired \
  --m-barcodes-file /Users/kyleharris/Desktop/CFMB2020/MetadataMicrositesandToxicology.txt \
  --m-barcodes-column BarcodeSequence \
  --i-seqs /Users/kyleharris/Desktop/CFMB2020/emp-paired-end-sequences.qza \
  --o-per-sample-sequences demux.qza \
  --o-error-correction-details demux-details.qza

The backslashes tell your shell "hey wait, don't run this command yet, I intend to supply more details to the command, on a new line" (otherwise pressing the enter/return key would tell the shell to evaluate whatever you had typed to that point).
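To see why a backslash that is not the very last character on a line causes trouble, it can help to look at how the command gets split into arguments. In POSIX-style word splitting, a backslash escapes whatever character follows it: followed by a newline it continues the line, but followed by a space it turns that space into literal argument text. The sketch below uses Python's `shlex.split`, which only approximates shell word splitting (it does not model line continuation), on shortened versions of the command. That this is exactly what produced `Got unexpected extra arguments ( )` is an assumption on my part.

```python
import shlex

# A backslash followed by a space: the space becomes an argument of its own.
trailing_space = "qiime demux emp-paired \\ "
print(shlex.split(trailing_space))

# A backslash-space in the middle of a line: the space is glued to the next word,
# so the option name is no longer recognized as an option.
mid_line = "qiime demux emp-paired \\ --m-barcodes-column BarcodeSequence"
print(shlex.split(mid_line))

# No backslashes on a single line: every word is a clean, separate argument.
clean = "qiime demux emp-paired --m-barcodes-column BarcodeSequence"
print(shlex.split(clean))
```

In the first two cases the program receives an argument that is, or starts with, a literal space, which it has no option to match against.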

To be clear, this is unrelated to the metadata issue you reported above.

Keep us posted! :qiime2:

Thanks, Matt! I will make this correction.

So that did fix the one error. Would the following be related to the emp-paired-end-sequences.qza?

(qiime2-2020.8) Kyles-MBP:~ kyleharris$ qiime demux emp-paired \
  --m-barcodes-file /Users/kyleharris/Desktop/CFMB2020/MetadataMicrositesandToxicology.txt \
  --m-barcodes-column BarcodeSequence \
  --i-seqs /Users/kyleharris/Desktop/CFMB2020/emp-paired-end-sequences.qza \
  --o-per-sample-sequences demux.qza \
  --o-error-correction-details demux-details.qza

Plugin error from demux:

No sequences were mapped to samples. Check that your barcodes are in the correct orientation (see the rev_comp_barcodes and/or rev_comp_mapping_barcodes options). If barcodes are NOT Golay format set golay_error_correction to False.

Debug info has been saved to /var/folders/6w/6zjxtzz12y5dkn40sxjg9mbm0000gn/T/qiime2-q2cli-err-butgz6vy.log

Congrats on making it past that part, @mudbugecology! This new error is a third, unrelated issue. Please open a new topic to discuss it. Thanks!

Will do! Thank you for the assistance!
