What is a good way to interpret phylogenetic gneiss output?

Hi,
I’m very new to gut microbiome research and would like to ask for ideas on how to use phylogenetic gneiss effectively. I think gneiss is the best way to obtain output that is easy to understand.
In my opinion, the new ilr-phylogenetic command in qiime2-2018.8 is a better solution in that respect, but I’m struggling to interpret its output. Interpreting phylogenetic gneiss output should work just as in the PhILR paper, but it is a little hard to identify the balances of interest.

Here is what I did:

  1. Convert the Newick tree into a data.tree object with the {data.tree} R package.
    The {data.tree} package lets me filter balances by hierarchical level.

  2. Review every balance’s p-value at a specific hierarchical level.
    Because balances at higher levels of the hierarchy may matter more for phylogenetic classification, I worked down the hierarchy one level at a time, starting from the root.

  3. Plot the partial tree, or list the taxa composition, of each significantly changed balance that is important for phylogenetic classification.
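
The steps above can also be sketched in Python instead of R. This is only a rough sketch under assumptions: the tree is a hand-written nested dict rather than a parsed Newick file, the balance names (`y0`, `y1`, ...) and p-values are made up for illustration, and `leaves` and `significant_by_depth` are hypothetical helpers, not part of gneiss or {data.tree}.

```python
from collections import deque

# Hypothetical toy tree: each internal node is one balance, each tip is a taxon.
tree = {
    "name": "y0",
    "children": [
        {"name": "y1", "children": [
            {"name": "taxonA", "children": []},
            {"name": "taxonB", "children": []},
        ]},
        {"name": "y2", "children": [
            {"name": "taxonC", "children": []},
            {"name": "y3", "children": [
                {"name": "taxonD", "children": []},
                {"name": "taxonE", "children": []},
            ]},
        ]},
    ],
}

# Made-up p-values, one per balance (internal node).
pvalues = {"y0": 0.20, "y1": 0.01, "y2": 0.03, "y3": 0.60}


def leaves(node):
    """Return the names of all taxa (tips) below a node."""
    if not node["children"]:
        return [node["name"]]
    return [tip for child in node["children"] for tip in leaves(child)]


def significant_by_depth(root, pvalues, alpha=0.05, max_depth=None):
    """Walk the tree breadth-first from the root and collect significant balances.

    Returns (depth, balance name, taxa on each side) tuples, shallowest first.
    """
    hits = []
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        if not node["children"]:
            continue  # tips are taxa, not balances
        if max_depth is not None and depth > max_depth:
            continue  # deeper than the level being reviewed
        if pvalues[node["name"]] < alpha:
            sides = [leaves(child) for child in node["children"]]
            hits.append((depth, node["name"], sides))
        for child in node["children"]:
            queue.append((child, depth + 1))
    return hits


for depth, name, sides in significant_by_depth(tree, pvalues):
    print(depth, name, sides)
```

Setting `max_depth` restricts the review to the top levels of the hierarchy, which mirrors going down one level at a time from the root.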

I know this is not a great way to interpret phylogenetic gneiss output. The hardest problem is how to determine the balances of interest.
Any comments will be appreciated!

Hiro

Hi @Hiro,

At first glance, this seems to be a legit solution -- especially given that the phylogenetic ilr transform here was inspired by PhILR. Note that PhILR does a weighted ILR transform that incorporates the branch lengths into the transformation, which we are not doing here.

Right -- choosing good balances is still a super tricky problem. When looking for good balances I try to weigh the following simultaneously:

  1. Size of coefficient (since this is an estimate of effect size)
  2. P-value (since this is an estimate of confidence)
  3. Number of zeros (since this gives some measure of bias).

The first two are enabled through the p-value heatmap (see the tutorial for more details).
Haven't figured out a good way to make (3) interactive -- that is still a pretty manual process.
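
As a rough illustration of weighing the three criteria together, here is a minimal Python sketch. Everything in it is an assumption made for the example: the counts, coefficients and p-values are made up, the thresholds are arbitrary, and `zero_fraction` and `shortlist` are hypothetical helpers rather than gneiss functions.

```python
# Made-up counts: sample -> {taxon: count}.
counts = {
    "S1": {"A": 120, "B": 80, "C": 0, "D": 0},
    "S2": {"A": 95, "B": 60, "C": 0, "D": 1},
    "S3": {"A": 130, "B": 0, "C": 2, "D": 0},
}

# Made-up regression summary: one coefficient and p-value per balance,
# plus the taxa that sit underneath that balance.
balances = [
    {"name": "y0", "coef": 0.10, "pvalue": 0.200, "taxa": ["A", "B", "C", "D"]},
    {"name": "y1", "coef": 1.80, "pvalue": 0.001, "taxa": ["A", "B"]},
    {"name": "y2", "coef": 2.50, "pvalue": 0.004, "taxa": ["C", "D"]},
]


def zero_fraction(table, taxa):
    """Fraction of zero counts among the given taxa across all samples."""
    cells = [row[taxon] for row in table.values() for taxon in taxa]
    return sum(count == 0 for count in cells) / len(cells)


def shortlist(balances, table, alpha=0.05, max_zero_fraction=0.5):
    """Keep balances that are significant and not dominated by zeros,
    then sort by absolute coefficient (largest effect first)."""
    kept = [
        b for b in balances
        if b["pvalue"] < alpha
        and zero_fraction(table, b["taxa"]) <= max_zero_fraction
    ]
    return sorted(kept, key=lambda b: abs(b["coef"]), reverse=True)


for b in shortlist(balances, counts):
    print(b["name"], b["coef"], b["pvalue"], zero_fraction(counts, b["taxa"]))
```

In this toy data `y2` has the largest coefficient and a small p-value, but most of its underlying counts are zero, so it is dropped from the shortlist.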

But this is still an active research question -- there are a few of us that are currently proposing new solutions on how to choose meaningful balances. For more ideas check out the following papers

Thank you for your quick reply, @mortonjt! It really helped me, and I'm now reading the papers.

I have two more questions. One is about your criteria for choosing balances:

  3. Number of zeros (since this gives some measure of bias).

Do you mean trying to exclude balances that contain many zeros? Zeros may indeed affect the statistical results, but ignoring them can itself introduce bias, can't it?

The other is about a new idea: how about applying gneiss to an abundance table collapsed by taxonomic assignment? The data would look like this:

         Family1  Family2  Family3  ...
SampleA  10       10       0        ...
SampleB  0        10       0        ...
SampleC  0        0        50       ...
...      ...      ...      ...      ...

Since ilr-phylogenetic does not incorporate branch lengths, applying ilr-hierarchy at a specific taxonomic level might give us more comprehensible data without losing the phylogenetic information.
Just an idea; I haven't tested it yet :confused:

Not quite - if there are a ton of zeros, the result will be quite biased already (changes in low abundance taxa could really throw off the balance calculations). So I wouldn't trust the coefficients / pvalues coming out of balances that were computed from many low abundance taxa.
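
To make that concrete, here is a small Python sketch of how a single extra read changes a balance. It uses the standard ilr balance formula, sqrt(r*s/(r+s)) * ln(g(numerator)/g(denominator)), where g is the geometric mean and r and s are the number of taxa on each side. The counts are made up, and handling zeros by adding a pseudocount of 1 is an assumption for this example.

```python
import math

PSEUDOCOUNT = 1  # assumption: zeros are handled by adding 1 to every count


def balance(numerator, denominator):
    """ilr balance between two groups of strictly positive abundances."""
    r, s = len(numerator), len(denominator)
    # The mean of the logs is the log of the geometric mean.
    mean_log_num = sum(math.log(x) for x in numerator) / r
    mean_log_den = sum(math.log(x) for x in denominator) / s
    return math.sqrt(r * s / (r + s)) * (mean_log_num - mean_log_den)


def with_pseudocount(counts):
    """Add the pseudocount to every count so that logs are defined."""
    return [c + PSEUDOCOUNT for c in counts]


def shift_from_one_read(numerator, denominator):
    """Change in the balance when the first numerator taxon gains one read."""
    bumped = [numerator[0] + 1] + list(numerator[1:])
    before = balance(with_pseudocount(numerator), with_pseudocount(denominator))
    after = balance(with_pseudocount(bumped), with_pseudocount(denominator))
    return after - before


# Made-up counts: a balance over abundant taxa vs. one over rare taxa.
abundant_shift = shift_from_one_read([500, 400], [450, 600])
rare_shift = shift_from_one_read([0, 0], [0, 1])

print("abundant taxa:", abundant_shift)
print("rare taxa:    ", rare_shift)
```

With these toy numbers the balance over rare taxa moves by a few tenths of a unit from one read, while the balance over abundant taxa barely moves at all.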

That has been proposed before

And that is a totally fine approach - it's just that balance-taxonomy won't be available, since it won't work on a collapsed table. So if you go this route, you'll need to do your own processing offline via R / Python.
