Feature-table gives: 'ascii' codec can't decode byte 0xc3 in position 4 Error

Hi qiime team,

I have a strange error where a table that used to work years ago no longer works, and the only explanation I can think of (my guess, anyway) is that there are special characters in the sample IDs.

Anyway, when trying to run a simple table summarize command:

qiime feature-table summarize \
  --i-table merged_table_Jun2_GMTOL.qza \
  --o-visualization merged_table_Jun2_GMTOL_summary.qzv
Plugin error from feature-table:

'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)

Debug info has been saved to /var/folders/l5/bc59br8961516ssfthb0x16h0000gn/T/qiime2-q2cli-err-lft3_8zr.log

That's the error I get, and the readout of that log file is:

cat /var/folders/l5/bc59br8961516ssfthb0x16h0000gn/T/qiime2-q2cli-err-lft3_8zr.log
Traceback (most recent call last):
  File "/Users/samde/miniconda3/envs/qiime2-2023.7/lib/python3.8/site-packages/q2cli/commands.py", line 478, in __call__
    results = self._execute_action(
  File "/Users/samde/miniconda3/envs/qiime2-2023.7/lib/python3.8/site-packages/q2cli/commands.py", line 539, in _execute_action
    results = action(**arguments)
  File "", line 2, in summarize
  File "/Users/samde/miniconda3/envs/qiime2-2023.7/lib/python3.8/site-packages/qiime2/sdk/action.py", line 339, in bound_callable
    self.signature.transform_and_add_callable_args_to_prov(
  File "/Users/samde/miniconda3/envs/qiime2-2023.7/lib/python3.8/site-packages/qiime2/core/type/signature.py", line 390, in transform_and_add_callable_args_to_prov
    self._transform_and_add_input_to_prov(
  File "/Users/samde/miniconda3/envs/qiime2-2023.7/lib/python3.8/site-packages/qiime2/core/type/signature.py", line 423, in _transform_and_add_input_to_prov
    transformed_input = _input._view(spec.view_type, recorder)
  File "/Users/samde/miniconda3/envs/qiime2-2023.7/lib/python3.8/site-packages/qiime2/sdk/result.py", line 401, in _view
    result = transformation(self._archiver.data_dir)
  File "/Users/samde/miniconda3/envs/qiime2-2023.7/lib/python3.8/site-packages/qiime2/core/transform.py", line 70, in transformation
    new_view = transformer(view)
  File "/Users/samde/miniconda3/envs/qiime2-2023.7/lib/python3.8/site-packages/qiime2/core/transform.py", line 214, in wrapped
    return transformer(view.file.view(self._wrapped_view_type))
  File "/Users/samde/miniconda3/envs/qiime2-2023.7/lib/python3.8/site-packages/q2_types/feature_table/_transformer.py", line 108, in _5
    return _parse_biom_table_v210(ff)
  File "/Users/samde/miniconda3/envs/qiime2-2023.7/lib/python3.8/site-packages/q2_types/feature_table/_transformer.py", line 52, in _parse_biom_table_v210
    table = biom.Table.from_hdf5(fh)
  File "/Users/samde/miniconda3/envs/qiime2-2023.7/lib/python3.8/site-packages/biom/table.py", line 4149, in from_hdf5
    samp_ids, samp_md, samp_grp_md = axis_load(h5grp['sample'])
  File "/Users/samde/miniconda3/envs/qiime2-2023.7/lib/python3.8/site-packages/biom/table.py", line 4123, in axis_load
    ids = np.asarray(ids, dtype=ids_dtype)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)

One other problem is that I cannot even export this table as a tsv file. I tried in QIIME 2 2023.7 and 2025.7, to no avail. Exporting to biom works, but then converting that biom to tsv gives the “is not a biom file!” error.
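
For reference, the biom-to-tsv step is essentially the equivalent of the following (a rough sketch with placeholder paths; the loading step is where the “is not a biom file!” error shows up):

import biom

# feature-table.biom is the file that `qiime tools export` writes out from the .qza
table = biom.load_table('exported/feature-table.biom')  # fails with "does not appear to be a BIOM file!"
with open('exported/feature-table.tsv', 'w') as out:
    out.write(table.to_tsv())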

Is there any other kind of workaround to edit a qza or biom file to get rid of the special characters? Or am I wrong and the problem is something else?

Any info would be appreciated!

Biom file is 21MB so leaving a gdrive link if this is needed: https://drive.google.com/file/d/114DgBBHYLLCKJ0hLGXr6zLMAfkpoZwMi/view?usp=sharing

Would you be willing to post the original .qza file as well?

I looked at the biom file you shared and saw this:

 n@V��TREE����������������GCOL
 No Table IDhttp://biom-format.orgqiime2 2023.2.02023-06-04T13:48:50.419028

There should be more info in the original .qza file!

Got it, yes! Here you go. Let me know if you need anything else. Ty

Ah okay. So the .biom file was just the one from inside the .qza file.
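
(For anyone following along: a .qza is just a zip archive, so that feature-table.biom can be pulled straight out of it. A rough sketch:)

import glob
import zipfile

# unpack the artifact; the payload sits under a UUID-named directory
with zipfile.ZipFile('merged_table_Jun2_GMTOL.qza') as zf:
    zf.extractall('extracted')

print(glob.glob('extracted/*/data/feature-table.biom'))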

Yeah, that's not working for me either. Same errors:

(qiime2-amplicon-2025.10) cbrisl@CB-MacBook-Air data % biom head -i feature-table.biom 
Traceback (most recent call last):
  File "/Users/cbrisl/miniforge3/envs/qiime2-amplicon-2025.10/lib/python3.10/site-packages/biom/parse.py", line 668, in load_table
    table = parse_biom_table(fp)
  File "/Users/cbrisl/miniforge3/envs/qiime2-amplicon-2025.10/lib/python3.10/site-packages/biom/parse.py", line 422, in parse_biom_table
    t = Table.from_json(json.loads(file_obj),
  File "/Users/cbrisl/miniforge3/envs/qiime2-amplicon-2025.10/lib/python3.10/json/__init__.py", line 339, in loads
    raise TypeError(f'the JSON object must be str, bytes or bytearray, '
TypeError: the JSON object must be str, bytes or bytearray, not File

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/cbrisl/miniforge3/envs/qiime2-amplicon-2025.10/bin/biom", line 11, in <module>
    sys.exit(cli())
  File "/Users/cbrisl/miniforge3/envs/qiime2-amplicon-2025.10/lib/python3.10/site-packages/click/core.py", line 1161, in __call__
    return self.main(*args, **kwargs)
  File "/Users/cbrisl/miniforge3/envs/qiime2-amplicon-2025.10/lib/python3.10/site-packages/click/core.py", line 1082, in main
    rv = self.invoke(ctx)
  File "/Users/cbrisl/miniforge3/envs/qiime2-amplicon-2025.10/lib/python3.10/site-packages/click/core.py", line 1697, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/cbrisl/miniforge3/envs/qiime2-amplicon-2025.10/lib/python3.10/site-packages/click/core.py", line 1443, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/cbrisl/miniforge3/envs/qiime2-amplicon-2025.10/lib/python3.10/site-packages/click/core.py", line 788, in invoke
    return __callback(*args, **kwargs)
  File "/Users/cbrisl/miniforge3/envs/qiime2-amplicon-2025.10/lib/python3.10/site-packages/biom/cli/table_head.py", line 49, in head
    table = load_table(input_fp).head(n=n_obs, m=n_samp)
  File "/Users/cbrisl/miniforge3/envs/qiime2-amplicon-2025.10/lib/python3.10/site-packages/biom/parse.py", line 670, in load_table
    raise TypeError("%s does not appear to be a BIOM file!" % f)
TypeError: feature-table.biom does not appear to be a BIOM file!
(qiime2-amplicon-2025.10) cbrisl@CB-MacBook-Air data % 
(qiime2-amplicon-2025.10) cbrisl@CB-MacBook-Air data % biom table-ids -i feature-table.biom
Traceback (most recent call last):
  File "/Users/cbrisl/miniforge3/envs/qiime2-amplicon-2025.10/lib/python3.10/site-packages/biom/parse.py", line 668, in load_table
    table = parse_biom_table(fp)
  File "/Users/cbrisl/miniforge3/envs/qiime2-amplicon-2025.10/lib/python3.10/site-packages/biom/parse.py", line 422, in parse_biom_table
    t = Table.from_json(json.loads(file_obj),
  File "/Users/cbrisl/miniforge3/envs/qiime2-amplicon-2025.10/lib/python3.10/json/__init__.py", line 339, in loads
    raise TypeError(f'the JSON object must be str, bytes or bytearray, '
TypeError: the JSON object must be str, bytes or bytearray, not File

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/cbrisl/miniforge3/envs/qiime2-amplicon-2025.10/bin/biom", line 11, in <module>
    sys.exit(cli())
  File "/Users/cbrisl/miniforge3/envs/qiime2-amplicon-2025.10/lib/python3.10/site-packages/click/core.py", line 1161, in __call__
    return self.main(*args, **kwargs)
  File "/Users/cbrisl/miniforge3/envs/qiime2-amplicon-2025.10/lib/python3.10/site-packages/click/core.py", line 1082, in main
    rv = self.invoke(ctx)
  File "/Users/cbrisl/miniforge3/envs/qiime2-amplicon-2025.10/lib/python3.10/site-packages/click/core.py", line 1697, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/cbrisl/miniforge3/envs/qiime2-amplicon-2025.10/lib/python3.10/site-packages/click/core.py", line 1443, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/cbrisl/miniforge3/envs/qiime2-amplicon-2025.10/lib/python3.10/site-packages/click/core.py", line 788, in invoke
    return __callback(*args, **kwargs)
  File "/Users/cbrisl/miniforge3/envs/qiime2-amplicon-2025.10/lib/python3.10/site-packages/biom/cli/table_ids.py", line 36, in summarize_table
    tab = load_table(input_fp)
  File "/Users/cbrisl/miniforge3/envs/qiime2-amplicon-2025.10/lib/python3.10/site-packages/biom/parse.py", line 670, in load_table
    raise TypeError("%s does not appear to be a BIOM file!" % f)
TypeError: feature-table.biom does not appear to be a BIOM file!

I'm totally out of ideas. I'll let the devs take it from here!

Hi @Sam_Degregori,
I can also confirm the error after downloading your .qza and trying to summarize it. I'm using 2025.10.

You mentioned that this worked in the past - can you describe what you did with it then? Is there any chance that when you accessed the file recently (just before you ran into the error you're posting here) that a download failed, for example? It's a long shot, but I'm just wondering if there is a source file somewhere that might still work ok.

I suspect you're right about a sample id containing an invalid character. I tried running the following to see if I could get a .tsv from it using pandas, but it still fails when trying to parse the biom table, and the failure on the line annotated as 4216 suggests trouble when parsing sample ids.

In [1]: from qiime2 import Artifact

In [2]: import pandas as pd

In [3]: table = Artifact.load('./merged_table_Jun2_GMTOL.qza')

In [4]: table.view(pd.DataFrame).to_csv('./merged_table_Jun2_GMTOL.tsv', sep='\t')
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
Cell In[4], line 1
----> 1 table.view(pd.DataFrame).to_csv('./merged_table_Jun2_GMTOL.tsv', sep='\t')

File /opt/homebrew/Caskroom/miniforge/base/envs/qiime2-tiny-2025.10/lib/python3.10/site-packages/qiime2/sdk/result.py:715, in Artifact.view(self, view_type)
    714 def view(self, view_type):
--> 715     return self._view(view_type)

File /opt/homebrew/Caskroom/miniforge/base/envs/qiime2-tiny-2025.10/lib/python3.10/site-packages/qiime2/sdk/result.py:747, in Artifact._view(self, view_type, recorder)
    744     to_type = transform.ModelType.from_view_type(view_type)
    745     transformation = from_type.make_transformation(to_type,
    746                                                    recorder=recorder)
--> 747 result = transformation(self._archiver.data_dir)
    749 if view_type is qiime2.Metadata:
    750     result._add_artifacts([self])

File /opt/homebrew/Caskroom/miniforge/base/envs/qiime2-tiny-2025.10/lib/python3.10/site-packages/qiime2/core/transform.py:70, in ModelType.make_transformation.<locals>.transformation(view, validate_level)
     67 view = self.coerce_view(view)
     68 self.validate(view, level=validate_level)
---> 70 new_view = transformer(view)
     72 new_view = other.coerce_view(new_view)
     73 other.validate(new_view)

File /opt/homebrew/Caskroom/miniforge/base/envs/qiime2-tiny-2025.10/lib/python3.10/site-packages/qiime2/core/transform.py:214, in SingleFileDirectoryFormatType._wrap_input.<locals>.wrapped(view)
    213 def wrapped(view):
--> 214     return transformer(view.file.view(self._wrapped_view_type))

File /opt/homebrew/Caskroom/miniforge/base/envs/qiime2-tiny-2025.10/lib/python3.10/site-packages/q2_types/feature_table/_deferred_setup/_transformers.py:102, in _4(ff)
    100 @plugin.register_transformer
    101 def _4(ff: BIOMV210Format) -> pd.DataFrame:
--> 102     table = _parse_biom_table_v210(ff)
    103     return _table_to_dataframe(table)

File /opt/homebrew/Caskroom/miniforge/base/envs/qiime2-tiny-2025.10/lib/python3.10/site-packages/q2_types/feature_table/_deferred_setup/_transformers.py:52, in _parse_biom_table_v210(ff)
     50 def _parse_biom_table_v210(ff):
     51     with ff.open() as fh:
---> 52         table = biom.Table.from_hdf5(fh)
     53         return _drop_axis_metadata(table)

File /opt/homebrew/Caskroom/miniforge/base/envs/qiime2-tiny-2025.10/lib/python3.10/site-packages/biom/table.py:4216, in Table.from_hdf5(cls, h5grp, ids, axis, parse_fs, subset_with_metadata)
   4213     return ids, md, grp_md
   4215 obs_ids, obs_md, obs_grp_md = axis_load(h5grp['observation'])
-> 4216 samp_ids, samp_md, samp_grp_md = axis_load(h5grp['sample'])
   4218 # load the data
   4219 data_grp = h5grp[axis]['matrix']

File /opt/homebrew/Caskroom/miniforge/base/envs/qiime2-tiny-2025.10/lib/python3.10/site-packages/biom/table.py:4189, in Table.from_hdf5.<locals>.axis_load(grp)
   4187 if ids.size > 0:
   4188     ids_dtype = 'U%d' % max([len(v) for v in ids])
-> 4189     ids = np.asarray(ids, dtype=ids_dtype)
   4191 parser = defaultdict(lambda: general_parser)
   4192 parser['taxonomy'] = vlen_list_of_str_parser

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)

Do you still have all of the source tables that you merged to create this one? One thing you could try would be to run the summarize command on each of those and see where it fails - if it's a single bad sample id, you may be able to narrow down where it's coming from that way.

Just throwing in some extra info that I worked out:

I used https://myhdf5.hdfgroup.org/ to view the biom table and opened the sample IDs, then exported those to CSV and ran:

In [1]: with open('Downloads/data.csv', 'rb') as fh:
   ...:     for line in fh:
   ...:         try:
   ...:             line.decode('ascii')
   ...:         except:
   ...:             print(line.decode('utf8'), end='')

Yielding these characters:

González-Serrano4
González-Serrano6
González-Serrano8
González-Serrano16
González-Serrano2
González-Serrano5
González-Serrano13
González-Serrano3
González-Serrano18
González-Serrano9
González-Serrano11
González-Serrano20
González-Serrano22
González-Serrano1
González-Serrano12
González-Serrano17
González-Serrano7
González-Serrano14
González-Serrano19
González-Serrano10
González-Serrano15
González-Serrano21

This is the same byte sequence that np.asarray complained about.
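
That lines up exactly with the error message, too: position 4 of these IDs is the first byte of the UTF-8 encoding of 'á', i.e. 0xc3 (decimal 195):

>>> 'González-Serrano4'.encode('utf-8')[4]
195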

I think it is odd that np.asarray(ids, dtype=ids_dtype) is no longer working. The dtype is set to U<max-length>, which should be a Unicode string dtype in numpy. Based on a little bit of testing, it seems as though a byte-string like b'foo' is presumed to be ASCII, whereas a unicode string works.

So the following fails:

In [14]: with open('Downloads/data.csv', 'rb') as fh:
    ...:     for line in fh:
    ...:         np.asarray(line, dtype=f'U{len(line)}')
    ...:
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
Cell In[14], line 3
      1 with open('Downloads/data.csv', 'rb') as fh:
      2     for line in fh:
----> 3         np.asarray(line, dtype=f'U{len(line)}')

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)

whereas decoding the string first:

In [15]: with open('Downloads/data.csv', 'rb') as fh:
    ...:     for line in fh:
    ...:         np.asarray(line.decode('utf-8'), dtype=f'U{len(line)}')

does not fail.

I’m not sure when this changed or if it did, but since the summary used to work, it sounds like a drift in numpy’s string semantics (which I suppose is possible as a byte-sequence doesn’t actually describe a code-point array, and there’s nothing to describe the encoding to numpy otherwise).

I think it is actually a combination of H5Py changes and BIOM:

In the 3.0 series, the following changed:

  • variable-length UTF-8 strings -> numpy 'O' arrays of bytes (tagged with UTF-8 encoding via the dtype)

And this change in BIOM would have meant that the byte-sequence is no longer decoded, which, combined with the behavior of np.asarray, means that non-ASCII UTF-8 sequences fail.


In [1]: from h5py import File

In [2]: import numpy as np

In [3]: fh = File('Downloads/feature-table.biom')

In [4]: np.asarray(fh.get('sample/ids')[:], dtype='U20')
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
Cell In[4], line 1
----> 1 np.asarray(fh.get('sample/ids')[:], dtype='U20')

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)

Using .asstr(), as described in later versions of the docs (which looked to be slightly stale for 3.11.0):

In [5]: np.asarray(fh.get('sample/ids').asstr()[:], dtype='U20')
Out[5]:
array(['li8', 'li12', 'li4', ..., 'yanez-montalvo19', 'yanez-montalvo9',
       'yanez-montalvo1'], dtype='<U20')

Hi all,

Thanks for the in-depth feedback. @gregcaporaso I do have the source files (about 200 studies), so the goal is not to have to check them all, but it honestly shouldn't be that terrible if I do. I can likely do a string of merges until I narrow it down.

@ebolyen interesting. Does this mean there is a workaround, or am I best off re-merging my studies after swapping in new sample IDs for that González study?

Thanks all

Sam

We’re a little stuck ourselves on what to do next, so I would probably recommend swapping the sample IDs and re-merging. I’m sorry, as I know that’s pretty tedious to do.

If you didn’t care about provenance, you could also swap the IDs in the biom file by using HDF5 directly via an editor (no real suggestions on one), or programmatically via h5py or similar and then saving it again. But that could go wrong in subtle ways, so if you aren’t jumping at the idea, I probably wouldn’t recommend it.
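
For completeness, if someone did want to try the programmatic route, the h5py side could look roughly like this (an untested sketch with placeholder file names; the rewritten .biom would still need to be imported into a fresh FeatureTable[Frequency] artifact afterwards):

import shutil
import unicodedata

import h5py

# work on a copy so the original data file is untouched
shutil.copy('feature-table.biom', 'feature-table-ascii.biom')

with h5py.File('feature-table-ascii.biom', 'r+') as fh:
    # sample IDs are variable-length UTF-8 strings; h5py 3.x hands them back as bytes
    ids = [i.decode('utf-8') for i in fh['sample/ids'][:]]
    # strip the accents, e.g. 'González-Serrano4' -> 'Gonzalez-Serrano4'
    fixed = [unicodedata.normalize('NFKD', i).encode('ascii', 'ignore').decode('ascii')
             for i in ids]
    del fh['sample/ids']
    fh.create_dataset('sample/ids', data=fixed,
                      dtype=h5py.string_dtype(encoding='utf-8'))

That's exactly the kind of thing that can go wrong subtly (IDs that collide after the accents are stripped, metadata files that no longer match, and so on), so I'd still lean towards fixing the source study and re-merging.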

@Sam_Degregori, you should experience the failure when loading and viewing a problematic artifact, so I don't think you should have to do a series of merges. The following should let you know which is/are problematic:

import glob
import qiime2
import biom

fps = glob.glob('./*.qza')

for fp in fps:
    try:
        qiime2.Artifact.load(fp).view(biom.Table)
        print(f'Successfully loaded artifact: {fp}')
    except UnicodeDecodeError as e:
        print(f'!! Failed to load artifact: {fp}')

Here's some example output:

!! Failed to load artifact: ./merged_table_Jun2_GMTOL.qza
Successfully loaded artifact: ./asv-table.qza

Sorry for the trouble here! As @ebolyen mentioned, we're working on a plan for dealing with this.

Great script, thanks for sending it. It will definitely save me some time. Thank you!
