Ghost tree filtering?

Jennifer_Fouquier · April 23, 2019, 6:43pm

Hi @ihoxie, I had an epiphany recently when @thermokarst mentioned that the Newick format, by design, converts underscores to spaces in any ID that is not wrapped in single quotes. Newick documentation mentions this inconsistently, and I was unaware of it when ghost-tree was developed. The original UNITE IDs I was working with did not have underscores, so it never came up. So your issues were 100% not your fault. Thanks to you and Matthew for working with me on this! I know it was time-consuming.
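To make the underscore behavior concrete, here is a minimal pure-Python sketch of how a Newick reader interprets a single label. The function names and the example ID are made up for illustration; this is not ghost-tree's code or any particular parser's API, just the quoting rule described above.

```python
def read_newick_label(raw: str) -> str:
    """Interpret one raw Newick label the way the format describes."""
    if len(raw) >= 2 and raw[0] == "'" and raw[-1] == "'":
        # Quoted label: taken literally; a doubled quote stands for one quote.
        return raw[1:-1].replace("''", "'")
    # Unquoted label: underscores are read as spaces.
    return raw.replace("_", " ")


def quote_newick_label(label: str) -> str:
    """Wrap a label in single quotes so its underscores survive parsing."""
    return "'" + label.replace("'", "''") + "'"


# Hypothetical ID containing underscores (not a real UNITE accession).
example_id = "SH000001.08FU_AB000001_refs"

print(read_newick_label(example_id))                      # underscores become spaces
print(read_newick_label(quote_newick_label(example_id)))  # ID comes back unchanged
```

An unquoted ID with underscores comes back from the tree with spaces, so it no longer matches the same ID in the feature table or taxonomy, while the quoted version round-trips intact.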

You can find newly built and correctly formatted trees for the s_02.02.2019 and 02.02.2019 UNITE versions here.

I like the 80% clustered ones because you don't discard a lot of unclassified organisms, but that's up to you, depending on your project and how you weigh accuracy against lost data.

I wanted to make sure I could get your data through to Emperor, so I checked it here using the original files you sent me, with the feature table filtered to remove IDs that are not found in the tree you gave me. Please note, though, that as @thermokarst mentioned, you were accidentally using inconsistent databases, so I would make sure to use the same UNITE database throughout, along with its corresponding ghost tree.
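For anyone curious what that filtering step amounts to, here is a rough sketch in plain Python. It is not the QIIME 2 command used on the real files; the table is modeled as a simple dict of feature ID to per-sample counts, and the function name and toy IDs are assumptions for illustration only.

```python
def filter_table_to_tree(table, tip_ids):
    """Keep only features whose IDs appear among the tree's tip names.

    table   -- dict mapping feature ID -> list of per-sample counts (toy model)
    tip_ids -- iterable of tip names read from the tree
    Returns (kept_table, dropped_ids).
    """
    tips = set(tip_ids)
    kept = {fid: counts for fid, counts in table.items() if fid in tips}
    dropped = sorted(set(table) - tips)
    return kept, dropped


# Toy data with hypothetical IDs; real tables come from your own pipeline.
toy_table = {
    "SH1_refs": [10, 0, 3],
    "SH2_refs": [0, 5, 1],
    "SH3_refs": [7, 7, 0],
}
toy_tips = ["SH1_refs", "SH3_refs", "SH9_refs"]

kept, dropped = filter_table_to_tree(toy_table, toy_tips)
print(sorted(kept))  # features that can be placed on the tree
print(dropped)       # features with no matching tip
```

If the dropped list is unexpectedly long, that is usually a sign that the table and the tree were built from different database versions, or that the IDs were altered somewhere along the way (for example by the underscore issue).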

Let me know if you have any questions!