Metadata problem: underscore?

Hi,

I am having a problem with my metadata. I ran Keemei and it reported 46 invalid cells, 0 errors, and 46 warnings. It seems to me that these invalid cells and warnings refer to the sample names that contain underscores. I am using the qiime2-2019.1 version; is this a problem? Any tips?

Here is the ERROR:
There was an issue with loading the file mapping_feb2020.tsv as metadata:

Metadata file must be encoded as UTF-8 or ASCII. The following error occurred when decoding the file:

'utf-8' codec can't decode byte 0xe7 in position 4033: invalid continuation byte

There may be more errors present in the metadata file. To get a full report, sample / feature metadata files can be validated with Keemei: https://keemei.qiime2.org

Find details on QIIME 2 metadata requirements here: https://docs.qiime2.org/2019.1/tutorials/metadata/

Thanks in advance!

Hello @Manuela_Ramalho

EDIT: I'm not sure whether the _ underscores are the problem, but some character(s) are causing issues. What characters do you have in your sample names?

We might have to deal with that too :thinking:

Colin

Hi Colin!

Thanks for helping me!
Attached is my metadata (with underscores). If I take out the "_", I will have problems with the names of the samples that have already been sequenced, right?

I’m not sure why this problem is happening.
I have used “_” in the past and have had no problems. So I’m confused.

Thanks again!

All the best,
Manu

Hello Manu,

I think I found the issue!

Take a look at this line.
POW0449 TCTGTGTCTAAT GTGTGYCAGCMGCCGCGGTAA 16 Cephalotes Sample pallidoides pallidoides Brazil Piaui -4.128868 -41.687887 na noforthisquestion cerradotipico-carrasco-transi?aocaatinga 1 gastermobio G2 POW0449

That’s on line 164 of the file, and it contains this non-UTF-8 character:

transiçaocaatinga
      ^ not utf-8 :-(

Once you replace that, this file should work fine!
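For anyone who wants to find such bytes without scanning the file by eye, here is a minimal Python sketch. It reads raw bytes and reports the line number, byte offset, and value of every sequence that is not valid UTF-8. The function name `find_invalid_utf8` and the two-line sample are made up for illustration; they are not part of QIIME 2 or Keemei.

```python
def find_invalid_utf8(data: bytes):
    """Return (line_number, byte_offset, byte_value) for each invalid UTF-8 sequence."""
    problems = []
    start = 0
    while start < len(data):
        try:
            data[start:].decode("utf-8")
            break  # everything from `start` onward decodes cleanly
        except UnicodeDecodeError as err:
            offset = start + err.start  # absolute offset of the first offending byte
            line_number = data.count(b"\n", 0, offset) + 1  # 1-based line number
            problems.append((line_number, offset, data[offset]))
            start += err.end  # resume scanning after the invalid sequence
    return problems


# Hypothetical two-line sample: an ASCII header plus one row saved as ISO-8859-1.
sample = b"id\tsite\n" + "POW0449\ttransi\u00e7aocaatinga\n".encode("iso-8859-1")

for line_number, offset, value in find_invalid_utf8(sample):
    print(f"line {line_number}, byte offset {offset}: 0x{value:02x}")
```

On a real file the same function can be called on `open("mapping_feb2020.tsv", "rb").read()`. The byte offset it reports corresponds to the "position" in the QIIME 2 error message.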

Colin

Thanks @colinbrislawn, I think more specifically, the file is not encoded as UTF-8:

file Cephalotes_mapping_feb2020.tsv 
Cephalotes_mapping_feb2020.tsv : ISO-8859 text, with CRLF line terminators

See above, the file is encoded as ISO-8859.
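To see why an ISO-8859 file produces exactly the "invalid continuation byte" error from the first post, here is a small Python sketch. It uses a shortened, made-up string rather than the real metadata row, and it assumes the file is ISO-8859-1 (Latin-1) specifically, which `file` does not confirm.

```python
# "transiçao" as an ISO-8859-1 editor would store it: one byte per character,
# with the cedilla stored as the single byte 0xe7.
raw = "transi\u00e7ao".encode("iso-8859-1")
print(raw)

# The same text in UTF-8 needs two bytes for the cedilla (0xc3 0xa7).
print("transi\u00e7ao".encode("utf-8"))

# Read as UTF-8, a lone 0xe7 (0b11100111) announces a three-byte sequence,
# so the next byte must look like 0b10xxxxxx. Here it is "a" (0x61) instead.
try:
    raw.decode("utf-8")
    message = "decoded without error"
except UnicodeDecodeError as err:
    message = str(err)
print(message)

# Decoding with the encoding the bytes were really written in recovers the text.
recovered = raw.decode("iso-8859-1")
print(recovered)
```

So the character itself is fine. The bytes are just being read with the wrong encoding.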

Once the file is converted to UTF-8, it works fine:

# this utility was installed on my computer,
# but there are many others like it
iconv -f ISO-8859-1 -t UTF-8 < Cephalotes_mapping_feb2020.tsv > Cephalotes_mapping_feb2020_utf8.tsv

qiime metadata tabulate --m-input-file Cephalotes_mapping_feb2020_utf8.tsv --o-visualization check.qzv
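If iconv is not available, the same conversion can be sketched in a few lines of Python. This is only an illustration: `convert_to_utf8` and the file names in the demo are hypothetical, and the source encoding is assumed to be ISO-8859-1, as in the iconv command above. Note that decoding as ISO-8859-1 never raises an error, because every byte maps to some character, so a wrong guess gives garbled text rather than a failure.

```python
import tempfile
from pathlib import Path


def convert_to_utf8(src, dst, source_encoding="iso-8859-1"):
    """Re-encode src into dst as UTF-8, leaving line terminators untouched."""
    # Work on bytes so that CRLF line endings pass through unchanged.
    text = Path(src).read_bytes().decode(source_encoding)
    Path(dst).write_bytes(text.encode("utf-8"))


# Demo on a throwaway file (hypothetical names and content).
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "mapping_latin1.tsv"
    dst = Path(tmp) / "mapping_utf8.tsv"
    src.write_bytes(b"POW0449\ttransi\xe7aocaatinga\r\n")
    convert_to_utf8(src, dst)
    converted = dst.read_bytes()

print(converted.decode("utf-8"))
```

Like the iconv call, this keeps the CRLF line terminators that `file` reported and only changes the character encoding.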

As for the line @colinbrislawn was worried about, there is no need to replace anything once the file is converted.

Thanks, @colinbrislawn and @thermokarst!!

It works!