Problems with biom add-metadata --sc-pipe-separated

(Alberto) #1

Dear qiime2 team, I’m having some problems with the biom format when adding pipe-separated metadata (KEGG_Pathways). In short, I’m unable to check whether the metadata is added correctly, and I suspect it isn’t.

I start by adding the metadata to a KO-by-sample table:

    biom add-metadata -i count-table_metagenomics_KOs-onlyWithKEGG.biom -o count-table_metagenomics_KOs_wMetadata.biom --observation-metadata-fp KOtoKEGG_map_metagenomics.tsv --sc-pipe-separated KEGG_Pathways --observation-header KEGG_Pathways

The command runs without any message, but when I check with biom summarize-table, no metadata has been created. Is there any other way to check this? I convert the table to TSV below, but it is difficult to know which format to expect. If I instead generate a JSON file:

    biom add-metadata -i count-table_metagenomics_KOs-onlyWithKEGG.biom -o count-table_metagenomics_KOs_wMetadata_json.biom --observation-metadata-fp KOtoKEGG_map_metagenomics.tsv --sc-pipe-separated KEGG_Pathways --output-as-json

This generates a file in which the metadata does appear (according to summarize-table), and if I convert it to TSV:

    biom convert -i count-table_metagenomics_KOs_wMetadata_json.biom -o count-table_metagenomics_KOs_wMetadata.tsv --table-type="Ortholog table" --to-tsv  --header-key KEGG_Pathways  --tsv-metadata-formatter sc_separated

It does look good (although how can I know what it should look like? I would expect a format like this, but it is difficult to be sure). But if I then try to use, for instance, this picrust script (which should work with the JSON format)

    source activate picrust
    categorize_by_function.py -i count-table_metagenomics_KOs_wMetadata.biom -c KEGG_Pathways -l 3 -o KEGGs_real_metagenomes.L3.biom

it fails with the following error:

    Traceback (most recent call last):
      File "/home/apascual/pkg/anaconda3/envs/picrust/bin/categorize_by_function.py", line 113, in <module>
        main()
      File "/home/apascual/pkg/anaconda3/envs/picrust/bin/categorize_by_function.py", line 100, in main
        one_to_many_md_key=opts.metadata_category)
      File "/home/apascual/pkg/anaconda3/envs/picrust/lib/python2.7/site-packages/biom/table.py", line 2405, in collapse
        pathway, partition = next(md_iter)
      File "/home/apascual/pkg/anaconda3/envs/picrust/bin/categorize_by_function.py", line 59, in collapse
        for path in md[category]:
    TypeError: 'NoneType' object is not iterable

which I suspect is because the metadata is not correctly formatted.
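To make concrete what I would expect: as I understand the --sc-pipe-separated option, each value is split on "|" into pathways and each pathway on ";" into levels, giving a list of lists. Below is a sketch of that expectation in plain Python. The function names split_sc_pipe and levels_at and the sample string are made up for illustration; this is not biom's own code.

```python
def split_sc_pipe(value):
    """Split 'a; b; c|d; e; f' into [['a', 'b', 'c'], ['d', 'e', 'f']]."""
    return [
        [level.strip() for level in pathway.split(";")]
        for pathway in value.split("|")
    ]


def levels_at(parsed, level):
    """Return the name at 1-based `level` for each pathway that is deep enough."""
    return [path[level - 1] for path in parsed if len(path) >= level]


# Hypothetical metadata value for a single KO (not taken from my mapping file).
example = (
    "Metabolism; Energy Metabolism; Methane metabolism"
    "|Metabolism; Carbohydrate Metabolism; Glycolysis / Gluconeogenesis"
)
parsed = split_sc_pipe(example)
print(levels_at(parsed, 3))
```

If the value stored under KEGG_Pathways is None instead of such a nested list, a loop like `for path in md[category]` raises exactly the TypeError shown above.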

As a side note, since it is not in the spirit of qiime to work with csv files, I understand that conversion from biom to csv need not be supported for every kind of data. But I would find it useful to have a more comprehensive way to view the contents of biom files beyond (as far as I know) the commands validate-table, summarize-table or header, above all to verify this kind of question about correct formatting.
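Something along these lines is what I have in mind: a minimal sketch that loads the table with the biom Python API and counts how many observations actually carry the field. It assumes the biom-format package is importable. describe_metadata and inspect_biom are hypothetical helper names of my own, not part of biom.

```python
from collections import Counter


def describe_metadata(ids, metadata, key):
    """Count how many observations carry a usable value under `key`.

    `metadata` is what Table.metadata(axis='observation') returns:
    None when the table has no metadata at all, otherwise one
    mapping (or None) per observation ID.
    """
    if metadata is None:
        return Counter(no_metadata=len(ids))
    counts = Counter()
    for md in metadata:
        if md is None or key not in md:
            counts["missing_key"] += 1
        elif md[key] is None:
            counts["value_is_none"] += 1
        else:
            counts["present"] += 1
    return counts


def inspect_biom(path, key):
    """Load a BIOM file and summarize one observation metadata field."""
    import biom  # imported lazily so describe_metadata works without biom

    table = biom.load_table(path)
    ids = table.ids(axis="observation")
    metadata = table.metadata(axis="observation")
    return describe_metadata(ids, metadata, key)
```

For my file this would be called as `inspect_biom("count-table_metagenomics_KOs_wMetadata.biom", "KEGG_Pathways")`. Any count other than "present" would point to observations that could trigger the error above.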

Thanks in advance for the help.
