Conversion of biom to tsv not working

Hi all, I am running into an issue I have not encountered before.

I am trying to add the taxonomy onto my biom table using this tutorial. I have no apparent issue when adding the taxonomy to my biom file:

biom add-metadata -i frequency_table.biom -o table-with-taxonomy.biom --observation-metadata-fp taxonomy.tsv --sc-separated Taxonomy

But I am getting an error when trying to convert this file to tsv:

biom convert -i table-with-taxonomy.biom -o table-with-taxonomy.tsv --to-tsv --header-key Taxonomy
Traceback (most recent call last):
  File "/home/ortmannac/miniconda3/envs/qiime2/bin/biom", line 11, in <module>
    sys.exit(cli())
  File "/home/ortmannac/miniconda3/envs/qiime2/lib/python3.8/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/ortmannac/miniconda3/envs/qiime2/lib/python3.8/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/ortmannac/miniconda3/envs/qiime2/lib/python3.8/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/ortmannac/miniconda3/envs/qiime2/lib/python3.8/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ortmannac/miniconda3/envs/qiime2/lib/python3.8/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/ortmannac/miniconda3/envs/qiime2/lib/python3.8/site-packages/biom/cli/table_converter.py", line 125, in convert
    _convert(table, output_fp, sample_metadata_f, observation_metadata_f,
  File "/home/ortmannac/miniconda3/envs/qiime2/lib/python3.8/site-packages/biom/cli/table_converter.py", line 182, in _convert
    result = table.to_tsv(header_key=header_key,
  File "/home/ortmannac/miniconda3/envs/qiime2/lib/python3.8/site-packages/biom/table.py", line 5251, in to_tsv
    return self.delimited_self('\t', header_key, header_value,
  File "/home/ortmannac/miniconda3/envs/qiime2/lib/python3.8/site-packages/biom/table.py", line 1732, in delimited_self
    md_out = metadata_formatter(md.get(header_key, None))
  File "/home/ortmannac/miniconda3/envs/qiime2/lib/python3.8/site-packages/biom/cli/table_converter.py", line 35, in <lambda>
    'sc_separated': lambda x: '; '.join(x),
TypeError: sequence item 0: expected str instance, bytes found

I have performed this step multiple times before and have not encountered this issue. The only difference is that I updated my QIIME from v2023.2 to 2023.9. My QIIME is installed in a conda environment.

Thanks for your help.

I suspect this is an issue with the formatting of the taxonomy.tsv file, because that's usually the issue I have on this step.

Can you post the first few lines of that file here so we can take a look?

head taxonomy.tsv

So after I export the taxonomy, I change the header as recommended in the tutorial:

This is before header change:

head taxonomy.tsv

Feature ID	Taxon	Confidence
c497da3b39f30aceede6bec3b03cd100	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__SAR11_clade;f__Clade_I;g__Clade_Ia;s__	0.9682518022532978
c1797175f7325ec39a2cd4bd3659691a	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__Thioglobaceae;g__SUP05_cluster;s__	0.9938810854429997

And then I use these commands to change the header:

sed  's/Feature ID/#OTUID/g' taxonomy.tsv > taxonomy2.tsv
sed  's/Taxon/Taxonomy/g' taxonomy2.tsv > taxonomy3.tsv
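
For reference, the same rename can be limited to the header line so that data rows are never touched (the sed commands above apply the substitution to every line). This is only a minimal Python sketch, not something from the tutorial; `rename_header` and `RENAMES` are illustrative names, and the example row is shortened from the output above.

```python
# Hypothetical helper: rename columns in the header row only.
RENAMES = {"Feature ID": "#OTUID", "Taxon": "Taxonomy"}

def rename_header(tsv_text, renames=RENAMES):
    """Return tsv_text with header fields renamed; data rows are unchanged."""
    # Split off the first line; everything after it is passed through as-is.
    header, sep, rest = tsv_text.partition("\n")
    fields = [renames.get(name, name) for name in header.split("\t")]
    return "\t".join(fields) + sep + rest

# Shortened example input (taxonomy string truncated for readability).
example = (
    "Feature ID\tTaxon\tConfidence\n"
    "c497da3b39f30aceede6bec3b03cd100\td__Bacteria;p__Proteobacteria\t0.9682518022532978\n"
)

print(rename_header(example).splitlines()[0])
```

To apply it to a real file, read taxonomy.tsv into a string, pass it through `rename_header`, and write the result to a new file.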

And this is after header change:

head taxonomy3.tsv

#OTUID	Taxonomy	Confidence
c497da3b39f30aceede6bec3b03cd100	d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__SAR11_clade;f__Clade_I;g__Clade_Ia;s__	0.9682518022532978
c1797175f7325ec39a2cd4bd3659691a	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__Thioglobaceae;g__SUP05_cluster;s__	0.9938810854429997

And then I perform the following with no issues:
biom add-metadata -i frequency_table.biom -o table-with-taxonomy.biom --observation-metadata-fp taxonomy3.tsv --sc-separated Taxonomy

This is where I get that error:
biom convert -i table-with-taxonomy.biom -o table-with-taxonomy.tsv --to-tsv --header-key Taxonomy

Very strange!

I wonder if this is an issue with the biom add-metadata command that only shows up in the following command.

Can you run this? I want to see if the Taxonomy key is really there (or not!)
biom summarize-table -i table-with-taxonomy.biom

Yes of course. Here is the output. It looks like Taxonomy column is added?

biom summarize-table -i table-with-taxonomy.biom

Num samples: 41
Num observations: 3,871
Total count: 41
Table density (fraction of non-zero values): 0.179

Counts/sample summary:
 Min: 1.000
 Max: 1.000
 Median: 1.000
 Mean: 1.000
 Std. dev.: 0.000
 Sample Metadata Categories: None provided
 Observation Metadata Categories: Confidence; Taxonomy

Counts/sample detail:
KYLW1S: 1.000
TRIC1W: 1.000
BONC1S: 1.000

It's not a huge deal, as I can export the files separately into Excel and use VLOOKUP to add the taxonomy, but it's strange that it's not working!

Could this be it?

EDIT: I'm wrong.
sc is for semicolon; you have tabs in your taxonomy3.tsv.
Comma-separated, so CSV rather than TSV, I guess:

  --sc-separated TEXT             Comma-separated list of the metadata fields
                                  to split on semicolons. This is useful for
                                  hierarchical data such as taxonomy or
                                  functional categories.

Is it possible that this is a single column called Confidence; Taxonomy?

You could also try exporting the biom file as JSON and inspecting that:
biom table-to-json -i table-with-taxonomy.biom

Would that have happened during this step?

biom add-metadata -i frequency_table.biom -o table-with-taxonomy.biom --observation-metadata-fp taxonomy3.tsv --sc-separated Taxonomy

How would I change that so it's just adding the taxonomy column?

Also converting to json gave me the following error?

biom convert -i table-with-taxonomy.biom -o table.json.biom --table-type="OTU table" --to-json

Traceback (most recent call last):
  File "/home/ortmannac/miniconda3/envs/qiime2/bin/biom", line 11, in <module>
    sys.exit(cli())
  File "/home/ortmannac/miniconda3/envs/qiime2/lib/python3.8/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/ortmannac/miniconda3/envs/qiime2/lib/python3.8/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/ortmannac/miniconda3/envs/qiime2/lib/python3.8/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/ortmannac/miniconda3/envs/qiime2/lib/python3.8/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ortmannac/miniconda3/envs/qiime2/lib/python3.8/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/ortmannac/miniconda3/envs/qiime2/lib/python3.8/site-packages/biom/cli/table_converter.py", line 125, in convert
    _convert(table, output_fp, sample_metadata_f, observation_metadata_f,
  File "/home/ortmannac/miniconda3/envs/qiime2/lib/python3.8/site-packages/biom/cli/table_converter.py", line 206, in _convert
    write_biom_table(result, fmt, output_filepath)
  File "/home/ortmannac/miniconda3/envs/qiime2/lib/python3.8/site-packages/biom/cli/util.py", line 25, in write_biom_table
    f.write(table.to_json(biom.parse.generatedby()))
  File "/home/ortmannac/miniconda3/envs/qiime2/lib/python3.8/site-packages/biom/table.py", line 4827, in to_json
    f'{{"id": {dumps(obs[1])}, "metadata": {dumps(obs[2])}}},'
  File "/home/ortmannac/miniconda3/envs/qiime2/lib/python3.8/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/home/ortmannac/miniconda3/envs/qiime2/lib/python3.8/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/home/ortmannac/miniconda3/envs/qiime2/lib/python3.8/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/home/ortmannac/miniconda3/envs/qiime2/lib/python3.8/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type ndarray is not JSON serializable

Thank you!

I'm not sure what that new error tells us...

I suspect something is not quite perfect about the taxonomy3.tsv file. Try converting it to csv then do the add metadata and convert to text again!

Hi @emmlemore,

Would it be possible to share the table and taxonomy file? I maintain biom and am unsure what's going on, but clearly it's not acting in a friendly way

Best,
Daniel

Hi @wasade here are the files. I've attached the frequency_table.biom which is the ASV table before taxonomy addition, the table-with-taxonomy.biom file which is supposed to have the taxonomy added, and the raw taxonomy taxonomy3.tsv. Thank you!
frequency_table.biom (523.6 KB)
table-with-taxonomy.biom (1.6 MB)
taxonomy3.tsv (4.1 MB)

Hi @emmlemore,

Thank you for sharing the files. Using --sc-separated Taxonomy completes without error as I think you saw:

$ biom add-metadata -i frequency_table.biom -o table-with-taxonomy.biom --observation-metadata-fp taxonomy3.tsv --sc-separated Taxonomy

The resulting table has both the confidence and taxonomy information per feature:


In [10]: t = biom.load_table('table-with-taxonomy.biom')

In [11]: t.metadata(axis='observation')[:5]
Out[11]: 
(defaultdict(<function biom.table.Table._cast_metadata.<locals>.cast_metadata.<locals>.<lambda>()>,
             {'Confidence': '0.9682518022532978',
              'Taxonomy': array([b'd__Bacteria', b'p__Proteobacteria', b'c__Alphaproteobacteria',
                     b'o__SAR11_clade', b'f__Clade_I', b'g__Clade_Ia', b's__'],
                    dtype=object)}),
 defaultdict(<function biom.table.Table._cast_metadata.<locals>.cast_metadata.<locals>.<lambda>()>,
             {'Confidence': '0.9938810854429997',
              'Taxonomy': array([b'd__Bacteria', b'p__Proteobacteria', b'c__Gammaproteobacteria',
                     b'o__Pseudomonadales', b'f__Thioglobaceae', b'g__SUP05_cluster',
                     b's__'], dtype=object)}),
 defaultdict(<function biom.table.Table._cast_metadata.<locals>.cast_metadata.<locals>.<lambda>()>,
             {'Confidence': '0.9992720104737933',
              'Taxonomy': array([b'd__Archaea', b'p__Crenarchaeota', b'c__Nitrososphaeria',
                     b'o__Nitrosopumilales', b'f__Nitrosopumilaceae',
                     b'g__Candidatus_Nitrosopumilus', b's__'], dtype=object)}),
 defaultdict(<function biom.table.Table._cast_metadata.<locals>.cast_metadata.<locals>.<lambda>()>,
             {'Confidence': '0.9995434932442715',
              'Taxonomy': array([b'd__Archaea', b'p__Crenarchaeota', b'c__Nitrososphaeria',
                     b'o__Nitrosopumilales', b'f__Nitrosopumilaceae',
                     b'g__Candidatus_Nitrosopumilus', b's__'], dtype=object)}),
 defaultdict(<function biom.table.Table._cast_metadata.<locals>.cast_metadata.<locals>.<lambda>()>,
             {'Confidence': '0.9999999999364775',
              'Taxonomy': array([b'd__Bacteria', b'p__SAR324_clade(Marine_group_B)',
                     b'c__SAR324_clade(Marine_group_B)',
                     b'o__SAR324_clade(Marine_group_B)',
                     b'f__SAR324_clade(Marine_group_B)',
                     b'g__SAR324_clade(Marine_group_B)', b's__'], dtype=object)}))

The resulting table cannot be exported to JSON or TSV because, on parse, the taxonomy is represented as a NumPy array of byte strings rather than Unicode strings, and it looks like the formatters are not accounting for that.
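
The TSV failure can be reproduced without biom. The following is a minimal sketch, assuming the formatter does nothing more than the `'; '.join(x)` shown in the first traceback. A plain list stands in for the NumPy object array, `join_taxonomy` and `join_taxonomy_decoded` are illustrative names rather than biom functions, and UTF-8 is assumed for decoding.

```python
def join_taxonomy(levels):
    # Mirrors the 'sc_separated' formatter seen in the traceback.
    return "; ".join(levels)

def join_taxonomy_decoded(levels):
    # Decode bytes items first (UTF-8 assumed); leave str items untouched.
    return "; ".join(
        lvl.decode("utf-8") if isinstance(lvl, bytes) else lvl for lvl in levels
    )

# Byte strings, as in the parsed table's Taxonomy metadata.
levels = [b"d__Bacteria", b"p__Proteobacteria", b"c__Alphaproteobacteria"]

message = None
try:
    join_taxonomy(levels)
except TypeError as err:
    # str.join refuses bytes items, which is the error reported above.
    message = str(err)
    print(message)

print(join_taxonomy_decoded(levels))
```

Decoding each level before joining avoids the error, which suggests the stored type is the problem rather than the contents of taxonomy3.tsv.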

This is surprising as we have not adjusted those portions of code in biom for a long time, and the comment of this working on 2023.2 but not 2023.9 is puzzling on the surface as versions of Python are consistent. A piece of this issue was fixed in #886 with version 2.1.15 although it does not solve this issue completely.

I'll open issues on GitHub w.r.t. at least three problems I see here: 1) Confidence is not represented in its native floating point type, 2) Taxonomy is using bytes rather than str, which may be a long-term holdover from Python 2 days and is the likely disruption on conversion to TSV/JSON, and 3) the error messages coming back are not helpful and can be readily improved.

Does that cover it or are there additional problems?

https://github.com/biocore/biom-format/issues/944
https://github.com/biocore/biom-format/issues/945
https://github.com/biocore/biom-format/issues/946
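
Until those issues are resolved, one possible workaround is to normalize the metadata in Python before exporting. The sketch below is an illustration under stated assumptions, not a tested fix: `normalize_metadata` is a hypothetical helper, UTF-8 decoding is assumed, and a plain dict with a list stands in for one entry of `t.metadata(axis='observation')`.

```python
import json

def normalize_metadata(md):
    """Return a copy of one metadata entry with str taxonomy and float confidence."""
    out = dict(md)
    # Problem 2: taxonomy levels stored as bytes; decode to str (UTF-8 assumed).
    out["Taxonomy"] = [
        lvl.decode("utf-8") if isinstance(lvl, bytes) else str(lvl)
        for lvl in md["Taxonomy"]
    ]
    # Problem 1: confidence stored as a string; convert to float.
    out["Confidence"] = float(md["Confidence"])
    return out

# Stand-in for the first observation's metadata shown earlier in the thread.
entry = {
    "Confidence": "0.9682518022532978",
    "Taxonomy": [b"d__Bacteria", b"p__Proteobacteria", b"c__Alphaproteobacteria",
                 b"o__SAR11_clade", b"f__Clade_I", b"g__Clade_Ia", b"s__"],
}

clean = normalize_metadata(entry)
print("; ".join(clean["Taxonomy"]))  # joins cleanly once the items are str
print(json.dumps(clean))             # plain lists, str, and float serialize
```

The same function could be applied to each observation's metadata after `biom.load_table`; how best to write the cleaned values back into the table is left open here because it has not been checked against the files in this thread.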

Best,
Daniel

Hi @wasade,

Yes, that covers all the errors! Thanks so much for your time. I'll keep track of those issues on GitHub and wait for the answers.
