Metadata error: utf-8 codec can't decode

I have finished running DADA2 and am now trying to summarize and visualize my feature table. To do this, I have to provide my metadata file (the mapping file, in QIIME 1 terms). However, when I do so, I get an error saying that my metadata file must be encoded as UTF-8 or ASCII.

I checked the file type using the command below:
(qiime2) u12923-2k3:2018_SkinSampling_rerun u12923$ file -I mapping_skinABiL.tsv
mapping_skinABiL.tsv: text/plain; charset=iso-8859-1

As you can see, the character encoding is not correct. My computer is a Mac, so when I create this metadata file, I start in Excel and save the final product as a .tsv file. I have attached the metadata file. I did validate it in Keemei, and I get warnings because my sample IDs have "_" instead of "-"; however, I cannot change this because when I imported my sequences, they included the sample IDs (ex: "G139_2_R1.fastq") as part of the file name. Can you help me figure out why I continue to get this error? I've even tried rerunning with the sample IDs changed (ex: G139-2) in the metadata file, but I get the same error. I then saved the metadata file as a UTF-8 CSV file, changed the extension to .tsv, and used that as the metadata input in the following command, which gives the following error:

(qiime2) u12923-2k3:2018_SkinSampling_rerun u12923$ qiime feature-table summarize --i-table first_analysis/02_DADA2/dada2_table.qza --o-visualization first_analysis/02_DADA2/dada2_table.qzv --m-sample-metadata-file mapping_skinABiL_utf.tsv

There was an issue with loading the file mapping_skinABiL_utf.tsv as metadata:

Found unrecognized ID column name '\ufeff#SampleID,BarcodeSequence,LinkerPrimerSequence,BarcodeName,ProjectName,Sample_ID,Subject_ID,age_years,Sampling_method,organism,organism_part,race,sex,ethnic_group,material_entity,assay_type,library_source,library_selection,library_strategy,library_layout,assay_platform,Vendor_name,DNA_extraction_date,Library_preparation_date,Sequence_date,Description' while searching for header. The first column name in the header defines the ID column, and must be one of these values:

Case-insensitive: 'feature id', 'feature-id', 'featureid', 'id', 'sample id', 'sample-id', 'sampleid'

Case-sensitive: '#OTU ID', '#OTUID', '#Sample ID', '#SampleID', 'sample_name'

There may be more errors present in the metadata file. To get a full report, sample/feature metadata files can be validated with Keemei: https://keemei.qiime2.org

Please help me identify why I am continuously getting an error for my metadata file.

mapping_skinABiL.tsv (10.4 KB)
mapping_skinABiL_utf.tsv (10.4 KB)

Hey @kosnicki! Well, the header shown in that error message is comma-separated, so that file is a CSV, not a TSV (CSV stands for comma-separated values, as opposed to TSV, which is tab-separated). Renaming the extension to .tsv does not change the delimiter inside the file. Also, it looks like there are some extra characters at the beginning of the file (\ufeff, a byte order mark), but that is secondary, and probably related to the encoding issue.
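Both symptoms can be checked from the shell. The sketch below is only an illustration: it assumes a POSIX-like shell with `head`, `od`, `tail`, and `tr`, and `demo_utf.csv` is a made-up stand-in, not the attached file. The extra leading characters (\ufeff) are the Unicode byte order mark, which in UTF-8 is the three bytes EF BB BF.

```shell
# Build a tiny stand-in file: a UTF-8 byte order mark (EF BB BF)
# followed by a comma-separated header and one data row.
printf '\357\273\277#SampleID,BarcodeSequence\nG139_2,ACGT\n' > demo_utf.csv

# Show the first three bytes as hex; "efbbbf" means the file starts with a BOM.
first_bytes=$(head -c 3 demo_utf.csv | od -An -tx1 | tr -d ' \n')
echo "first bytes: $first_bytes"

# Drop the 3-byte BOM if present (tail -c +4 starts output at byte 4).
if [ "$first_bytes" = "efbbbf" ]; then
    tail -c +4 demo_utf.csv > demo_nobom.csv
else
    cp demo_utf.csv demo_nobom.csv
fi

# Count tab characters in the header line; zero tabs means the header
# is not tab-separated, whatever the file extension says.
header_tabs=$(head -n 1 demo_nobom.csv | tr -cd '\t' | wc -c | tr -d ' ')
echo "tabs in header: $header_tabs"
```

Removing the byte order mark alone would not make this stand-in loadable as metadata, since the header is still comma-separated; the file would also need to be saved with tabs as the delimiter.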

Looking at the actual file you attached, that appears to be a TSV file, and like you mentioned, it is encoded as iso-8859-1. QIIME 2 needs ASCII- or UTF-8-encoded files in order to work as expected. If you save that file as UTF-8, it works. There are many ways to do that; the best would be to make sure that your editor or Excel is saving as UTF-8. What I ran was the following:

iconv -t UTF-8 -f ISO-8859-1 mapping_skinABiL.tsv > out.txt
file -I out.txt
   out.txt: text/plain; charset=utf-8
qiime metadata tabulate --m-input-file out.txt --o-visualization mapping_skinABiL.qzv
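If it helps to script this, the check and the conversion can be combined roughly as follows. This is a sketch, not part of QIIME 2: `to_utf8` is a hypothetical helper name, it assumes any input that is not valid UTF-8 really is ISO-8859-1 (as `file` reported for the attached file), and `demo_latin1.tsv` is a made-up stand-in.

```shell
# Hypothetical helper: copy a file unchanged if it already decodes as UTF-8,
# otherwise convert it, ASSUMING the source encoding is ISO-8859-1.
to_utf8() {
    src=$1
    dst=$2
    # iconv exits non-zero when the input contains byte sequences
    # that are invalid in the declared source encoding.
    if iconv -f UTF-8 -t UTF-8 "$src" > /dev/null 2>&1; then
        cp "$src" "$dst"
    else
        iconv -f ISO-8859-1 -t UTF-8 "$src" > "$dst"
    fi
}

# Stand-in metadata file containing one ISO-8859-1 byte (0xE9, "e" with acute accent).
printf '#SampleID\tDescription\nG139_2\tcaf\351\n' > demo_latin1.tsv

to_utf8 demo_latin1.tsv demo_out.tsv

# The converted file should now pass the UTF-8 round-trip check.
if iconv -f UTF-8 -t UTF-8 demo_out.tsv > /dev/null 2>&1; then
    echo "demo_out.tsv is valid UTF-8"
fi
```

The round-trip through `iconv` only confirms that the bytes are valid UTF-8. It cannot tell whether ISO-8859-1 was the right guess for the original encoding, so it is worth opening the result and checking any accented characters by eye.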

Here are those files, for your records!

out.txt (10.4 KB)
mapping_skinABiL.qzv (1.1 MB)

Hope that helps! :t_rex:

PS - it looks like you have a few columns of metadata of entirely empty/null values --- no need to include those in QIIME 2, since they will just be ignored, anyway.
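To see which columns are entirely empty before deciding whether to drop them, something like the following `awk` sketch could be used. The file `demo_meta.tsv` and its contents are invented for illustration and say nothing about which columns are empty in the attached file; only cells with no characters at all are counted as empty here.

```shell
# Stand-in TSV: the last column has no value in any data row.
printf '#SampleID\tsex\tVendor_name\nG139_2\tfemale\t\nG140_1\tmale\t\n' > demo_meta.tsv

# Record which columns have at least one non-empty value in a data row,
# then print the header names of the columns that never did.
empty_cols=$(awk -F '\t' '
    NR == 1 { ncols = NF; for (i = 1; i <= NF; i++) name[i] = $i; next }
    { for (i = 1; i <= ncols; i++) if ($i != "") seen[i] = 1 }
    END { for (i = 1; i <= ncols; i++) if (!(i in seen)) print name[i] }
' demo_meta.tsv)
echo "empty columns: $empty_cols"
```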
