Importing OTU table and phylogeny to calculate UniFrac: The table does not appear to be completely represented by the phylogeny.

Hello!

I have a rooted tree in Newick format and an OTU table in tsv format attached here:
otu.txt (90 Bytes) tree.txt (83 Bytes)

I would like to calculate the UniFrac distances after importing these two files. I have a more complicated dataset I would eventually work on, but I reduced it to a toy example to make sure the workflow is working. I am using QIIME2 2019.7.

The code I run is here:

qiime tools import --input-path tree.txt --output-path rooted-tree.qza --type 'Phylogeny[Rooted]'
biom convert -i otu.txt -o otu.biom --to-hdf5 --table-type="OTU table"
qiime tools import --input-path otu.biom --type 'FeatureTable[Frequency]' --input-format BIOMV210Format --output-path otu.qza
qiime diversity beta-phylogenetic --i-phylogeny rooted-tree.qza --i-table otu.qza --p-metric unweighted_unifrac --o-distance-matrix uu.qza

The error I'm getting is: "Plugin error from diversity: The table does not appear to be completely represented by the phylogeny."

Previous forum topics have suggested that this happens when the OTUs are not in the tree. However, the toy "OTUs" A, B, C, and D are clearly leaves in the attached toy tree. My guess is that I'm not importing things correctly or have a formatting issue somewhere, but I can't figure out where.

Thank you in advance for your help!

Hi @ctanes,

Thank you for the detailed inquiry. I can verify the issue. Note that it is a two-parter. First, the Newick file is malformed: it is missing the closing semicolon. However, once that is resolved, the underlying library still fails to create the output. At the moment I'm unsure why, and I am contacting the relevant developer.

Best,
Daniel

Thank you very much for looking into it. Yes, I have tried the tree both with and without the semicolon at some point. Looking forward to hearing from you!

Best,
Ceylan

Hi @ctanes,

We've found the issue. The Newick parser in the underlying unifrac package is sensitive to leading whitespace in the tip names. Technically, "(A:1, B:1)" is valid and should result in a tree with tips "A" and "B". However, the parser was not stripping whitespace, and as a result the code was seeing tips "A" and " B". Since " B" does not match the feature ID "B" in the table, the table looked as if it were not fully represented by the phylogeny.
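To make the mismatch concrete, here is a minimal sketch in plain Python. The `tip_names` helper is hypothetical and only handles simple, unquoted Newick strings; it is not the unifrac or skbio parser, just an illustration of how an unstripped tip name fails to match a table ID.

```python
import re


def tip_names(newick, strip=True):
    """Return leaf labels from a simple, unquoted Newick string.

    Hypothetical helper for illustration only; not the unifrac or skbio parser.
    """
    # A leaf label follows "(" or "," and runs until ":", ",", ")", "(" or ";".
    labels = re.findall(r"[(,]([^(),:;]+)", newick)
    # Ignore whitespace-only matches, e.g. the space in ", (".
    labels = [label for label in labels if label.strip()]
    return [label.strip() if strip else label for label in labels]


tree = "(A:1, B:1);"
table_ids = {"A", "B"}  # feature IDs as they would appear in the OTU table

naive = tip_names(tree, strip=False)  # keeps the leading space on " B"
clean = tip_names(tree, strip=True)   # whitespace removed

# IDs in the table that the tree does not appear to contain
missing_naive = table_ids - set(naive)
missing_clean = table_ids - set(clean)

print(naive, missing_naive)
print(clean, missing_clean)
```

With the unstripped names, "B" is reported as missing from the tree even though it is clearly a leaf, which is the same symptom as the error in this thread.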

We'll get a bug fix in. For the time being, I recommend using whitespace-stripped trees. It's not unusual to encounter Newick parsers with some variance in how they handle whitespace, as it's a notorious format. The issue tracking this can be found here.
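As a stopgap, the whitespace can be removed from the tree text before running `qiime tools import`. The following is a rough sketch in plain Python, not an official QIIME 2 tool: `strip_newick_whitespace` is a hypothetical helper, and it assumes that whitespace inside single-quoted labels should be preserved while everything else can be dropped.

```python
def strip_newick_whitespace(newick):
    """Remove whitespace from a Newick string, except inside quoted labels.

    Hypothetical helper; assumes labels are quoted with single quotes.
    """
    out = []
    in_quote = False
    for ch in newick:
        if ch == "'":
            # A doubled quote ('') inside a label toggles twice, so the
            # quoted state is unchanged afterwards.
            in_quote = not in_quote
        if ch.isspace() and not in_quote:
            continue  # drop whitespace outside quoted labels
        out.append(ch)
    return "".join(out)


cleaned = strip_newick_whitespace("(A:1, B:1);\n")
print(cleaned)
```

The cleaned string can then be written to a new file and imported as `Phylogeny[Rooted]` in place of the original tree.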

@thermokarst, given how robust the skbio Newick parser is, would it make sense to read/write on import of Phylogeny[Rooted] to standardize the on-disk representation?

Best,
Daniel
