Dear qiime2 team, I’m having some problems with the biom format when adding pipe-separated metadata (KEGG_Pathways). In short, I’m unable to check whether the metadata is added correctly, and I suspect it isn’t.
If I start adding the metadata to a KO vs. samples table:
biom add-metadata -i count-table_metagenomics_KOs-onlyWithKEGG.biom -o count-table_metagenomics_KOs_wMetadata.biom --observation-metadata-fp KOtoKEGG_map_metagenomics.tsv --sc-pipe-separated KEGG_Pathways --observation-header KEGG_Pathways
The command is accepted silently, but when I check with biom summarize-table, no metadata appears to have been created. Is there any other way to check this? I convert the table to tsv below, but it is difficult to know which format to expect. If I try to generate a json file:
biom add-metadata -i count-table_metagenomics_KOs-onlyWithKEGG.biom -o count-table_metagenomics_KOs_wMetadata_json.biom --observation-metadata-fp KOtoKEGG_map_metagenomics.tsv --sc-pipe-separated KEGG_Pathways --output-as-json
It generates a file in which the metadata does appear (according to summarize-table), and if I convert it into tsv:
biom convert -i count-table_metagenomics_KOs_wMetadata_json.biom -o count-table_metagenomics_KOs_wMetadata.tsv --table-type="Ortholog table" --to-tsv --header-key KEGG_Pathways --tsv-metadata-formatter sc_separated
It indeed looks good (although how can I know what it should look like? I would expect a format like that, but it is difficult to be sure). But if I then try to use, for instance, this picrust script (which should work with the json format)
source activate picrust
categorize_by_function.py -i count-table_metagenomics_KOs_wMetadata.biom -c KEGG_Pathways -l 3 -o KEGGs_real_metagenomes.L3.biom
it fails with the following error:
Traceback (most recent call last):
  File "/home/apascual/pkg/anaconda3/envs/picrust/bin/categorize_by_function.py", line 113, in <module>
    main()
  File "/home/apascual/pkg/anaconda3/envs/picrust/bin/categorize_by_function.py", line 100, in main
    one_to_many_md_key=opts.metadata_category)
  File "/home/apascual/pkg/anaconda3/envs/picrust/lib/python2.7/site-packages/biom/table.py", line 2405, in collapse
    pathway, partition = next(md_iter)
  File "/home/apascual/pkg/anaconda3/envs/picrust/bin/categorize_by_function.py", line 59, in collapse
    for path in md[category]:
TypeError: 'NoneType' object is not iterable
which I suspect is because the metadata is not correctly formatted.
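For what it's worth, the traceback itself only tells me that `md[category]` evaluated to `None` for at least one observation. Below is a minimal sketch of that situation in plain Python. Note that `parse_sc_pipe_separated` and `iter_pathways` are hypothetical helpers of mine, not biom or PICRUSt functions, and I am assuming that a pipe-separated value is meant to become a list of pathways, each of them a list of levels split on semicolons. The pathway names are made up for illustration.

```python
def parse_sc_pipe_separated(value):
    """Split 'A; B; C|D; E; F' into [['A', 'B', 'C'], ['D', 'E', 'F']].

    Assumption: '|' separates pathways and ';' separates hierarchy levels.
    """
    return [[level.strip() for level in pathway.split(";")]
            for pathway in value.split("|")]


def iter_pathways(md, category):
    """Mimic the loop from the traceback: iterate over md[category]."""
    for path in md[category]:
        yield path


# Metadata in the shape I would expect after a successful add-metadata.
good_md = {"KEGG_Pathways": parse_sc_pipe_separated(
    "Metabolism; Carbohydrate Metabolism; Glycolysis|"
    "Metabolism; Energy Metabolism; Methane metabolism")}

# What I suspect at least one observation in my table looks like.
bad_md = {"KEGG_Pathways": None}

print(list(iter_pathways(good_md, "KEGG_Pathways")))

try:
    list(iter_pathways(bad_md, "KEGG_Pathways"))
except TypeError as err:
    # Same kind of failure as in the traceback above.
    print("TypeError:", err)
```

If this reading is right, a single observation with an empty or missing KEGG_Pathways entry would be enough to trigger the error, even if the rest of the metadata is fine.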
As a side note, I understand that, since working with csv files is not in the spirit of qiime, the conversion from biom to csv need not be supported for every kind of data. But I would find it useful to have a more comprehensive way to view the contents of biom files beyond (as far as I know) the commands validate-table, summarize-table or header, mainly to verify this kind of formatting question.
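To make clearer the kind of inspection I have in mind, here is a rough sketch that only uses the Python standard library. It assumes the table was written with --output-as-json and that, as I understand the BIOM 1.0 JSON layout, each entry of the top-level "rows" list carries an "id" and a "metadata" field. The function names are my own, and the small example table is made up; this would not work on an HDF5 biom file.

```python
import json
from collections import Counter


def load_json_biom(path):
    """Read a JSON-formatted biom file into a plain dict."""
    with open(path) as handle:
        return json.load(handle)


def summarize_observation_metadata(biom_json, key):
    """Count which Python types are stored under `key` across observations."""
    counts = Counter()
    for row in biom_json["rows"]:
        md = row.get("metadata") or {}  # "metadata" may be null in the file
        counts[type(md.get(key)).__name__] += 1
    return dict(counts)


# Tiny made-up table: one observation with pathways, one without metadata.
example = {
    "rows": [
        {"id": "K00001",
         "metadata": {"KEGG_Pathways": [["Metabolism",
                                         "Carbohydrate Metabolism",
                                         "Glycolysis"]]}},
        {"id": "K00002", "metadata": None},
    ]
}

print(summarize_observation_metadata(example, "KEGG_Pathways"))
# prints {'list': 1, 'NoneType': 1}
```

On a real file I would call `summarize_observation_metadata(load_json_biom("count-table_metagenomics_KOs_wMetadata_json.biom"), "KEGG_Pathways")`; any count under 'NoneType' would point to observations that lack the metadata.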
Thanks in advance for the help.