I am having a problem with my metadata. When I ran Keemei, it reported 46 invalid cells, 0 errors, and 46 warnings. The invalid cells and warnings seem to refer to sample names that contain an underscore. I am using qiime2-2019.1; is that a problem? Any tips?
Here is the ERROR:
There was an issue with loading the file mapping_feb2020.tsv as metadata:
Metadata file must be encoded as UTF-8 or ASCII. The following error occurred when decoding the file:
'utf-8' codec can't decode byte 0xe7 in position 4033: invalid continuation byte
There may be more errors present in the metadata file. To get a full report, sample / feature metadata files can be validated with Keemei: https://keemei.qiime2.org
Thanks for helping me!
Attached is my metadata (with underscores). If I remove the "_", I will have problems with the names of the samples that have already been sequenced, right?
I’m not sure why this problem is happening.
I have used “_” in the past and have had no problems. So I’m confused.
Take a look at this line:
POW0449 TCTGTGTCTAAT GTGTGYCAGCMGCCGCGGTAA 16 Cephalotes Sample pallidoides pallidoides Brazil Piaui -4.128868 -41.687887 na noforthisquestion cerradotipico-carrasco-transi?aocaatinga 1 gastermobio G2 POW0449
That’s on line 164, and it contains this non-UTF-8 character:
transiçaocaatinga
      ^ not UTF-8 :-(
Once you replace that, this file should work fine!
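For anyone hitting the same error, here is a minimal Python sketch of one way to find which lines fail to decode as UTF-8. The function name `find_non_utf8_lines` and the sample data are made up for illustration; this is not part of QIIME 2 or Keemei, and it only assumes the file can be read as raw bytes.

```python
def find_non_utf8_lines(raw: bytes):
    """Return (line_number, offset_in_line, offending_byte) for each
    line of `raw` that cannot be decoded as UTF-8."""
    problems = []
    # bytes.splitlines() handles both LF and CRLF line endings.
    for line_number, line in enumerate(raw.splitlines(), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError as err:
            # err.start is the index of the first undecodable byte.
            bad_byte = line[err.start:err.start + 1]
            problems.append((line_number, err.start, bad_byte))
    return problems


if __name__ == "__main__":
    # Hypothetical three-line metadata file saved as ISO-8859-1,
    # so the "ç" is stored as the single byte 0xe7.
    sample = "id\tvegetation\nS1\tcerrado\nS2\ttransiçao\n".encode("iso-8859-1")
    print(find_non_utf8_lines(sample))  # [(3, 9, b'\xe7')]
```

To check a real file, the bytes could be read with `pathlib.Path("mapping_feb2020.tsv").read_bytes()` and passed to the same function. Only the first bad byte on each line is reported.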
Thanks @colinbrislawn, I think more specifically, the file is not encoded as UTF-8:
file Cephalotes_mapping_feb2020.tsv
Cephalotes_mapping_feb2020.tsv : ISO-8859 text, with CRLF line terminators
See above, the file is encoded as ISO-8859.
Once the file is converted to UTF-8, it works fine:
# this utility was installed on my computer,
# but there are many others like it
iconv -f ISO-8859-1 -t UTF-8 < Cephalotes_mapping_feb2020.tsv > Cephalotes_mapping_feb2020_utf8.tsv
qiime metadata tabulate --m-input-file Cephalotes_mapping_feb2020_utf8.tsv --o-visualization check.qzv
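If `iconv` is not available, the same conversion can be sketched in Python. This is only an illustration under the assumption that the file really is ISO-8859-1, as the `file` output above suggests; the helper names `latin1_to_utf8` and `convert_file` are hypothetical.

```python
from pathlib import Path


def latin1_to_utf8(raw: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8 bytes."""
    # Every byte value is valid ISO-8859-1, so this decode never raises.
    text = raw.decode("iso-8859-1")
    return text.encode("utf-8")


def convert_file(src: str, dst: str) -> None:
    """Read `src` as ISO-8859-1 and write a UTF-8 copy to `dst`."""
    Path(dst).write_bytes(latin1_to_utf8(Path(src).read_bytes()))


if __name__ == "__main__":
    # File names taken from the commands above.
    convert_file("Cephalotes_mapping_feb2020.tsv",
                 "Cephalotes_mapping_feb2020_utf8.tsv")
```

Because decoding as ISO-8859-1 never fails, a file that is actually in some other encoding would be converted without any error but with the wrong characters, so it is worth opening the result and checking the accented values by eye.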
Here is the line @colinbrislawn was worried about, no need to replace: