Feature-table gives: 'ascii' codec can't decode byte 0xc3 in position 4 Error

Hi qiime team,

I have a strange error where a table that used to work years ago no longer works, and the only explanation I can think of (my guess, anyway) is that there are special characters in the sample IDs.

Anyway, when trying to run a simple table summarize command:

qiime feature-table summarize \
  --i-table merged_table_Jun2_GMTOL.qza \
  --o-visualization merged_table_Jun2_GMTOL_summary.qzv
Plugin error from feature-table:

'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)

Debug info has been saved to /var/folders/l5/bc59br8961516ssfthb0x16h0000gn/T/qiime2-q2cli-err-lft3_8zr.log

That's the error I get, and the readout of that log file is:

cat /var/folders/l5/bc59br8961516ssfthb0x16h0000gn/T/qiime2-q2cli-err-lft3_8zr.log
Traceback (most recent call last):
  File "/Users/samde/miniconda3/envs/qiime2-2023.7/lib/python3.8/site-packages/q2cli/commands.py", line 478, in __call__
    results = self._execute_action(
  File "/Users/samde/miniconda3/envs/qiime2-2023.7/lib/python3.8/site-packages/q2cli/commands.py", line 539, in _execute_action
    results = action(**arguments)
  File "", line 2, in summarize
  File "/Users/samde/miniconda3/envs/qiime2-2023.7/lib/python3.8/site-packages/qiime2/sdk/action.py", line 339, in bound_callable
    self.signature.transform_and_add_callable_args_to_prov(
  File "/Users/samde/miniconda3/envs/qiime2-2023.7/lib/python3.8/site-packages/qiime2/core/type/signature.py", line 390, in transform_and_add_callable_args_to_prov
    self._transform_and_add_input_to_prov(
  File "/Users/samde/miniconda3/envs/qiime2-2023.7/lib/python3.8/site-packages/qiime2/core/type/signature.py", line 423, in _transform_and_add_input_to_prov
    transformed_input = _input._view(spec.view_type, recorder)
  File "/Users/samde/miniconda3/envs/qiime2-2023.7/lib/python3.8/site-packages/qiime2/sdk/result.py", line 401, in _view
    result = transformation(self._archiver.data_dir)
  File "/Users/samde/miniconda3/envs/qiime2-2023.7/lib/python3.8/site-packages/qiime2/core/transform.py", line 70, in transformation
    new_view = transformer(view)
  File "/Users/samde/miniconda3/envs/qiime2-2023.7/lib/python3.8/site-packages/qiime2/core/transform.py", line 214, in wrapped
    return transformer(view.file.view(self._wrapped_view_type))
  File "/Users/samde/miniconda3/envs/qiime2-2023.7/lib/python3.8/site-packages/q2_types/feature_table/_transformer.py", line 108, in _5
    return _parse_biom_table_v210(ff)
  File "/Users/samde/miniconda3/envs/qiime2-2023.7/lib/python3.8/site-packages/q2_types/feature_table/_transformer.py", line 52, in _parse_biom_table_v210
    table = biom.Table.from_hdf5(fh)
  File "/Users/samde/miniconda3/envs/qiime2-2023.7/lib/python3.8/site-packages/biom/table.py", line 4149, in from_hdf5
    samp_ids, samp_md, samp_grp_md = axis_load(h5grp['sample'])
  File "/Users/samde/miniconda3/envs/qiime2-2023.7/lib/python3.8/site-packages/biom/table.py", line 4123, in axis_load
    ids = np.asarray(ids, dtype=ids_dtype)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)

One other problem is that I cannot even export this table as a tsv file. I tried in QIIME 2 2023.7 and 2025.7, to no avail. Exporting to biom works, but then converting that biom to tsv gives the “is not a biom file!” error.
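
For reference, the biom-to-tsv step is essentially the equivalent of the following (a rough sketch with placeholder paths; the loading step is where the “is not a biom file!” error shows up):

import biom

# feature-table.biom is the file that `qiime tools export` writes out from the .qza
table = biom.load_table('exported/feature-table.biom')  # fails with "does not appear to be a BIOM file!"
with open('exported/feature-table.tsv', 'w') as out:
    out.write(table.to_tsv())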

Is there any other kind of workaround to edit a qza or biom file to get rid of the special characters? Or am I wrong and the problem is something else?

Any info would be appreciated!

Biom file is 21MB so leaving a gdrive link if this is needed: https://drive.google.com/file/d/114DgBBHYLLCKJ0hLGXr6zLMAfkpoZwMi/view?usp=sharing

Would you be willing to post the original .qza file as well?

I looked at the biom file you shared and saw this:

 n@V��TREE����������������GCOL
 No Table IDhttp://biom-format.orgqiime2 2023.2.02023-06-04T13:48:50.419028

There should be more info in the original .qza file!

Got it, yes! Here you go. Let me know if you need anything else. Ty

Ah okay. So the .biom file was just the one from inside the .qza file.
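
(For anyone following along: a .qza is just a zip archive, so that feature-table.biom can be pulled straight out of it. A rough sketch:)

import glob
import zipfile

# unpack the artifact; the payload sits under a UUID-named directory
with zipfile.ZipFile('merged_table_Jun2_GMTOL.qza') as zf:
    zf.extractall('extracted')

print(glob.glob('extracted/*/data/feature-table.biom'))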

Yeah, that's not working for me either. Same errors:

(qiime2-amplicon-2025.10) cbrisl@CB-MacBook-Air data % biom head -i feature-table.biom 
Traceback (most recent call last):
  File "/Users/cbrisl/miniforge3/envs/qiime2-amplicon-2025.10/lib/python3.10/site-packages/biom/parse.py", line 668, in load_table
    table = parse_biom_table(fp)
  File "/Users/cbrisl/miniforge3/envs/qiime2-amplicon-2025.10/lib/python3.10/site-packages/biom/parse.py", line 422, in parse_biom_table
    t = Table.from_json(json.loads(file_obj),
  File "/Users/cbrisl/miniforge3/envs/qiime2-amplicon-2025.10/lib/python3.10/json/__init__.py", line 339, in loads
    raise TypeError(f'the JSON object must be str, bytes or bytearray, '
TypeError: the JSON object must be str, bytes or bytearray, not File

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/cbrisl/miniforge3/envs/qiime2-amplicon-2025.10/bin/biom", line 11, in <module>
    sys.exit(cli())
  File "/Users/cbrisl/miniforge3/envs/qiime2-amplicon-2025.10/lib/python3.10/site-packages/click/core.py", line 1161, in __call__
    return self.main(*args, **kwargs)
  File "/Users/cbrisl/miniforge3/envs/qiime2-amplicon-2025.10/lib/python3.10/site-packages/click/core.py", line 1082, in main
    rv = self.invoke(ctx)
  File "/Users/cbrisl/miniforge3/envs/qiime2-amplicon-2025.10/lib/python3.10/site-packages/click/core.py", line 1697, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/cbrisl/miniforge3/envs/qiime2-amplicon-2025.10/lib/python3.10/site-packages/click/core.py", line 1443, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/cbrisl/miniforge3/envs/qiime2-amplicon-2025.10/lib/python3.10/site-packages/click/core.py", line 788, in invoke
    return __callback(*args, **kwargs)
  File "/Users/cbrisl/miniforge3/envs/qiime2-amplicon-2025.10/lib/python3.10/site-packages/biom/cli/table_head.py", line 49, in head
    table = load_table(input_fp).head(n=n_obs, m=n_samp)
  File "/Users/cbrisl/miniforge3/envs/qiime2-amplicon-2025.10/lib/python3.10/site-packages/biom/parse.py", line 670, in load_table
    raise TypeError("%s does not appear to be a BIOM file!" % f)
TypeError: feature-table.biom does not appear to be a BIOM file!
(qiime2-amplicon-2025.10) cbrisl@CB-MacBook-Air data % 
(qiime2-amplicon-2025.10) cbrisl@CB-MacBook-Air data % biom table-ids -i feature-table.biom
Traceback (most recent call last):
  File "/Users/cbrisl/miniforge3/envs/qiime2-amplicon-2025.10/lib/python3.10/site-packages/biom/parse.py", line 668, in load_table
    table = parse_biom_table(fp)
  File "/Users/cbrisl/miniforge3/envs/qiime2-amplicon-2025.10/lib/python3.10/site-packages/biom/parse.py", line 422, in parse_biom_table
    t = Table.from_json(json.loads(file_obj),
  File "/Users/cbrisl/miniforge3/envs/qiime2-amplicon-2025.10/lib/python3.10/json/__init__.py", line 339, in loads
    raise TypeError(f'the JSON object must be str, bytes or bytearray, '
TypeError: the JSON object must be str, bytes or bytearray, not File

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/cbrisl/miniforge3/envs/qiime2-amplicon-2025.10/bin/biom", line 11, in <module>
    sys.exit(cli())
  File "/Users/cbrisl/miniforge3/envs/qiime2-amplicon-2025.10/lib/python3.10/site-packages/click/core.py", line 1161, in __call__
    return self.main(*args, **kwargs)
  File "/Users/cbrisl/miniforge3/envs/qiime2-amplicon-2025.10/lib/python3.10/site-packages/click/core.py", line 1082, in main
    rv = self.invoke(ctx)
  File "/Users/cbrisl/miniforge3/envs/qiime2-amplicon-2025.10/lib/python3.10/site-packages/click/core.py", line 1697, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/cbrisl/miniforge3/envs/qiime2-amplicon-2025.10/lib/python3.10/site-packages/click/core.py", line 1443, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/cbrisl/miniforge3/envs/qiime2-amplicon-2025.10/lib/python3.10/site-packages/click/core.py", line 788, in invoke
    return __callback(*args, **kwargs)
  File "/Users/cbrisl/miniforge3/envs/qiime2-amplicon-2025.10/lib/python3.10/site-packages/biom/cli/table_ids.py", line 36, in summarize_table
    tab = load_table(input_fp)
  File "/Users/cbrisl/miniforge3/envs/qiime2-amplicon-2025.10/lib/python3.10/site-packages/biom/parse.py", line 670, in load_table
    raise TypeError("%s does not appear to be a BIOM file!" % f)
TypeError: feature-table.biom does not appear to be a BIOM file!

I'm totally out of ideas. I'll let the devs take it from here!

Hi @Sam_Degregori,
I can also confirm the error after downloading your .qza and trying to summarize it. I'm using 2025.10.

You mentioned that this worked in the past - can you describe what you did with it then? Is there any chance that when you accessed the file recently (just before you ran into the error you're posting here) that a download failed, for example? It's a long shot, but I'm just wondering if there is a source file somewhere that might still work ok.

I suspect you're right about a sample id containing an invalid character. I tried running the following to see if I could get a .tsv from it using pandas, but it still fails when trying to parse the biom table, and the failure on the line annotated as 4216 suggests trouble when parsing sample ids.

In [1]: from qiime2 import Artifact

In [2]: import pandas as pd

In [3]: table = Artifact.load('./merged_table_Jun2_GMTOL.qza')

In [4]: table.view(pd.DataFrame).to_csv('./merged_table_Jun2_GMTOL.tsv', sep='\t')
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
Cell In[4], line 1
----> 1 table.view(pd.DataFrame).to_csv('./merged_table_Jun2_GMTOL.tsv', sep='\t')

File /opt/homebrew/Caskroom/miniforge/base/envs/qiime2-tiny-2025.10/lib/python3.10/site-packages/qiime2/sdk/result.py:715, in Artifact.view(self, view_type)
    714 def view(self, view_type):
--> 715     return self._view(view_type)

File /opt/homebrew/Caskroom/miniforge/base/envs/qiime2-tiny-2025.10/lib/python3.10/site-packages/qiime2/sdk/result.py:747, in Artifact._view(self, view_type, recorder)
    744     to_type = transform.ModelType.from_view_type(view_type)
    745     transformation = from_type.make_transformation(to_type,
    746                                                    recorder=recorder)
--> 747 result = transformation(self._archiver.data_dir)
    749 if view_type is qiime2.Metadata:
    750     result._add_artifacts([self])

File /opt/homebrew/Caskroom/miniforge/base/envs/qiime2-tiny-2025.10/lib/python3.10/site-packages/qiime2/core/transform.py:70, in ModelType.make_transformation.<locals>.transformation(view, validate_level)
     67 view = self.coerce_view(view)
     68 self.validate(view, level=validate_level)
---> 70 new_view = transformer(view)
     72 new_view = other.coerce_view(new_view)
     73 other.validate(new_view)

File /opt/homebrew/Caskroom/miniforge/base/envs/qiime2-tiny-2025.10/lib/python3.10/site-packages/qiime2/core/transform.py:214, in SingleFileDirectoryFormatType._wrap_input.<locals>.wrapped(view)
    213 def wrapped(view):
--> 214     return transformer(view.file.view(self._wrapped_view_type))

File /opt/homebrew/Caskroom/miniforge/base/envs/qiime2-tiny-2025.10/lib/python3.10/site-packages/q2_types/feature_table/_deferred_setup/_transformers.py:102, in _4(ff)
    100 @plugin.register_transformer
    101 def _4(ff: BIOMV210Format) -> pd.DataFrame:
--> 102     table = _parse_biom_table_v210(ff)
    103     return _table_to_dataframe(table)

File /opt/homebrew/Caskroom/miniforge/base/envs/qiime2-tiny-2025.10/lib/python3.10/site-packages/q2_types/feature_table/_deferred_setup/_transformers.py:52, in _parse_biom_table_v210(ff)
     50 def _parse_biom_table_v210(ff):
     51     with ff.open() as fh:
---> 52         table = biom.Table.from_hdf5(fh)
     53         return _drop_axis_metadata(table)

File /opt/homebrew/Caskroom/miniforge/base/envs/qiime2-tiny-2025.10/lib/python3.10/site-packages/biom/table.py:4216, in Table.from_hdf5(cls, h5grp, ids, axis, parse_fs, subset_with_metadata)
   4213     return ids, md, grp_md
   4215 obs_ids, obs_md, obs_grp_md = axis_load(h5grp['observation'])
-> 4216 samp_ids, samp_md, samp_grp_md = axis_load(h5grp['sample'])
   4218 # load the data
   4219 data_grp = h5grp[axis]['matrix']

File /opt/homebrew/Caskroom/miniforge/base/envs/qiime2-tiny-2025.10/lib/python3.10/site-packages/biom/table.py:4189, in Table.from_hdf5.<locals>.axis_load(grp)
   4187 if ids.size > 0:
   4188     ids_dtype = 'U%d' % max([len(v) for v in ids])
-> 4189     ids = np.asarray(ids, dtype=ids_dtype)
   4191 parser = defaultdict(lambda: general_parser)
   4192 parser['taxonomy'] = vlen_list_of_str_parser

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)

Do you still have all of the source tables that you merged to create this one? One thing you could try would be to run the summarize command on each of those and see where it fails - if it's a single bad sample id, you may be able to narrow down where it's coming from that way.

Just throwing in some extra info that I worked out:

I used https://myhdf5.hdfgroup.org/ to view the biom table and opened the sample IDs, then exported those to CSV and ran:

In [1]: with open('Downloads/data.csv', 'rb') as fh:
   ...:     for line in fh:
   ...:         try:
   ...:             line.decode('ascii')
   ...:         except:
   ...:             print(line.decode('utf8'), end='')

Yielding these characters:

González-Serrano4
González-Serrano6
González-Serrano8
González-Serrano16
González-Serrano2
González-Serrano5
González-Serrano13
González-Serrano3
González-Serrano18
González-Serrano9
González-Serrano11
González-Serrano20
González-Serrano22
González-Serrano1
González-Serrano12
González-Serrano17
González-Serrano7
González-Serrano14
González-Serrano19
González-Serrano10
González-Serrano15
González-Serrano21

This is the same byte sequence that np.asarray complained about.
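
That lines up exactly with the error message, too: position 4 of these IDs is the first byte of the UTF-8 encoding of 'á', i.e. 0xc3 (decimal 195):

>>> 'González-Serrano4'.encode('utf-8')[4]
195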

I think it is odd that np.asarray(ids, dtype=ids_dtype) is no longer working. The dtype is set to U<max-length>, which should be a Unicode string dtype in numpy. Based on a little bit of testing, it seems as though a byte-string like b'foo' is presumed to be ASCII, whereas a unicode string works.

So the following fails:

In [14]: with open('Downloads/data.csv', 'rb') as fh:
    ...:     for line in fh:
    ...:         np.asarray(line, dtype=f'U{len(line)}')
    ...:
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
Cell In[14], line 3
      1 with open('Downloads/data.csv', 'rb') as fh:
      2     for line in fh:
----> 3         np.asarray(line, dtype=f'U{len(line)}')

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)

whereas decoding the string first:

In [15]: with open('Downloads/data.csv', 'rb') as fh:
    ...:     for line in fh:
    ...:         np.asarray(line.decode('utf-8'), dtype=f'U{len(line)}')

does not fail.

I’m not sure when this changed or if it did, but since the summary used to work, it sounds like a drift in numpy’s string semantics (which I suppose is possible as a byte-sequence doesn’t actually describe a code-point array, and there’s nothing to describe the encoding to numpy otherwise).

I think it is actually a combination of H5Py changes and BIOM:

In the 3.0 series, the following changed:

  • variable-length UTF-8 strings -> numpy 'O' arrays of bytes (tagged with UTF-8 encoding via the dtype)

And this change in BIOM would have meant that the byte-sequence is no longer decoded, which, combined with the behavior of np.asarray, means that non-ASCII UTF-8 sequences fail.


In [1]: from h5py import File

In [2]: import numpy as np

In [3]: fh = File('Downloads/feature-table.biom')

In [4]: np.asarray(fh.get('sample/ids')[:], dtype='U20')
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
Cell In[4], line 1
----> 1 np.asarray(fh.get('sample/ids')[:], dtype='U20')

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)

Using .asstr(), as described in later versions of the docs (which looked to be slightly stale for 3.11.0):

In [5]: np.asarray(fh.get('sample/ids').asstr()[:], dtype='U20')
Out[5]:
array(['li8', 'li12', 'li4', ..., 'yanez-montalvo19', 'yanez-montalvo9',
       'yanez-montalvo1'], dtype='<U20')

Hi all,

Thanks for the in-depth feedback. @gregcaporaso I do have the source files (about 200 studies), so the goal is not to have to check them all, but it honestly shouldn't be that terrible if I do. I can likely do a string of merges until I narrow it down.

@ebolyen interesting. Does this mean there is a workaround, or am I best off re-merging my studies after swapping in new sample IDs for that González study?

Thanks all

Sam

We’re a little stuck ourselves on what to do next, so I would probably recommend swapping the sample IDs and re-merging. I’m sorry, as I know that’s pretty tedious to do.

If you didn’t care about provenance, you could also swap the IDs in the biom file by using HDF5 directly via an editor (no real suggestions on one), or programmatically via h5py or similar and then saving it again. But that could go wrong in subtle ways, so if you aren’t jumping at the idea, I probably wouldn’t recommend it.
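
For completeness, if someone did want to try the programmatic route, the h5py side could look roughly like this (an untested sketch with placeholder file names; the rewritten .biom would still need to be imported into a fresh FeatureTable[Frequency] artifact afterwards):

import shutil
import unicodedata

import h5py

# work on a copy so the original data file is untouched
shutil.copy('feature-table.biom', 'feature-table-ascii.biom')

with h5py.File('feature-table-ascii.biom', 'r+') as fh:
    # sample IDs are variable-length UTF-8 strings; h5py 3.x hands them back as bytes
    ids = [i.decode('utf-8') for i in fh['sample/ids'][:]]
    # strip the accents, e.g. 'González-Serrano4' -> 'Gonzalez-Serrano4'
    fixed = [unicodedata.normalize('NFKD', i).encode('ascii', 'ignore').decode('ascii')
             for i in ids]
    del fh['sample/ids']
    fh.create_dataset('sample/ids', data=fixed,
                      dtype=h5py.string_dtype(encoding='utf-8'))

That's exactly the kind of thing that can go wrong subtly (IDs that collide after the accents are stripped, metadata files that no longer match, and so on), so I'd still lean towards fixing the source study and re-merging.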

@Sam_Degregori, you should experience the failure when loading and viewing a problematic artifact, so I don't think you should have to do a series of merges. The following should let you know which is/are problematic:

import glob
import qiime2
import biom

fps = glob.glob('./*.qza')

for fp in fps:
    try:
        qiime2.Artifact.load(fp).view(biom.Table)
        print(f'Successfully loaded artifact: {fp}')
    except UnicodeDecodeError as e:
        print(f'!! Failed to load artifact: {fp}')

Here's some example output:

!! Failed to load artifact: ./merged_table_Jun2_GMTOL.qza
Successfully loaded artifact: ./asv-table.qza

Sorry for the trouble here! As @ebolyen mentioned, we're working on a plan for dealing with this.

Great script, thanks for sending it. It will definitely save me some time. Thank you!
