I have a rooted tree in Newick format and an OTU table in tsv format attached here: otu.txt (90 Bytes) tree.txt (83 Bytes)
I would like to calculate the UniFrac distances after importing these two files. I have a more complicated dataset I would eventually work on, but I reduced it to a toy example to make sure the workflow is working. I am using QIIME2 2019.7.
The error I'm getting is: "Plugin error from diversity: The table does not appear to be completely represented by the phylogeny."
Previous forum topics have suggested that this happens when the OTUs are not in the tree. However, the toy "OTUs" A, B, C, and D are clearly leaves in the attached toy tree. My guess is that I'm not importing things correctly or have formatting issues somewhere, but I can't figure out where.
Thank you for the detailed inquiry. I can verify the issue. Note that it is a two-parter. First, the newick file is malformed: it is missing the closing semicolon. However, once that is resolved, the underlying library still fails to create the output. At the moment, I'm unsure why, and am contacting the relevant developer.
Thank you very much for looking into it. Yes, I have tried the tree both with and without the semicolon at some point. Looking forward to hearing from you!
We've found the issue. The newick parser for the underlying unifrac package is sensitive to leading whitespace in the tip names. Technically, "(A:1, B:1)" is valid and should result in a tree with tips "A" and "B". However, the parser was not stripping whitespace, and as a result the code was seeing tips "A" and " B".
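To make the mismatch concrete, here is a toy sketch. It is not the unifrac parser; `naive_tips` is a hypothetical helper that pulls tip labels out with a regular expression and, like the buggy code path, does not strip whitespace. The table IDs are assumed to be "A" and "B" for illustration.

```python
import re

def naive_tips(newick):
    """Return tip labels exactly as written, including any leading whitespace.

    A tip label is taken to be the run of characters that follows '(' or ','
    and stops at the next structural character. This is a toy extractor,
    not a full newick parser (no quoted labels, no comments).
    """
    return re.findall(r"[(,]([^(),:;]+)", newick)

table_ids = {"A", "B"}                      # feature IDs from the table
raw_tips = set(naive_tips("(A:1, B:1);"))   # labels as the buggy path sees them
clean_tips = {t.strip() for t in raw_tips}  # labels after stripping whitespace

missing_raw = table_ids - raw_tips      # "B" is reported missing
missing_clean = table_ids - clean_tips  # nothing is missing
```

With the unstripped labels, "B" looks absent from the tree even though it is plainly a leaf, which would be consistent with the "not completely represented by the phylogeny" error.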
We'll get a bug fix in. For the time being, I recommend using whitespace-stripped trees. It's not unusual to encounter newick parsers with some variance in handling, as it's a notorious format. The issue tracking this can be found here.
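As a stopgap, something along these lines could be used to strip a tree before importing it. This is a minimal sketch using only the Python standard library; `strip_newick_whitespace` is a hypothetical helper name, and it assumes the tree has no bracketed comments whose spacing matters. Whitespace inside single-quoted labels is left alone, and a closing semicolon is appended if it is missing.

```python
def strip_newick_whitespace(newick):
    """Remove whitespace outside single-quoted labels and ensure a trailing ';'."""
    out = []
    in_quote = False
    for ch in newick:
        if ch == "'":
            # Toggle quoted state; a doubled '' inside a label toggles twice,
            # so the state is unchanged afterwards.
            in_quote = not in_quote
            out.append(ch)
        elif ch.isspace() and not in_quote:
            continue  # drop spaces, tabs, and newlines between tokens
        else:
            out.append(ch)
    cleaned = "".join(out)
    if not cleaned.endswith(";"):
        cleaned += ";"
    return cleaned

cleaned_tree = strip_newick_whitespace("(A:1, B:1)")
```

The cleaned string can then be written back to a file (for example a hypothetical tree_clean.txt) and imported in place of the original.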
@thermokarst, given how robust the skbio newick parser is, would it make sense to read and re-write the tree on import of Phylogeny[Rooted] to standardize the on-disk representation?