Gneiss output interpretation

Hello, new QIIME user here. I am interested in looking at how a continuous variable (logHue) explains differentiation of microbial composition using the gneiss plugin. I collapsed the taxa, added the pseudocount, performed correlation clustering, created the balances, and performed simple linear regression with the variable of interest.

In the regression summary regression_summary-breeding-l3-logHue.qzv (246.6 KB) I found that two balances were significantly related to logHue (y32 and y54). However, in the first scatterplot of the balance summary for y32, the relationship between the balance and logHue is not immediately apparent (one would expect a visibly strong effect for a result that survives strict p-value correction).

My questions are: (1) Are these results convincing? I am new to these analyses, and I am very skeptical, but I'd like to hear the opinion of those with more experience. (2) With a continuous variable, should I draw any interpretation from the proportion plot? It appears to create two groups from my continuous variable (presumably split at the mean or median of logHue), but the difference in this plot is also not very strong (maybe a slight tendency for the above-average group to have a higher proportion of Order Burkholderiales). (3) Lastly, is there any value in paying attention to the intercept? I imagine that, as in most regression analyses, it is of no interest, but best not to leave any stone unturned :slight_smile:

I appreciate the help.

The code I ran to run these analyses is here:

qiime gneiss add-pseudocount --i-table table-wo-cm-max8e5-min3samples-breeding-collapsed-l3.qza --p-pseudocount 1 --o-composition-table composition-breeding-l3.qza

qiime gneiss correlation-clustering --i-table composition-breeding-l3.qza --o-clustering hierarchy-breeding-l3.qza

qiime gneiss ilr-transform --i-table composition-breeding-l3.qza --i-tree hierarchy-breeding-l3.qza --o-balances balances-breeding-l3.qza

qiime gneiss ols-regression --p-formula "logHue" --i-table balances-breeding-l3.qza --i-tree hierarchy-breeding-l3.qza --m-metadata-file updated_metadata.txt --o-visualization regression_summary-breeding-l3-logHue.qzv

qiime gneiss balance-taxonomy --i-table composition-breeding.qza --i-tree hierarchy-breeding.qza --i-taxonomy taxonomy.qza --p-taxa-level 3 --p-balance-name 'y32' --m-metadata-file updated_metadata.txt --m-metadata-column logHue --o-visualization y32_taxa_summary-l3.qzv

Oops - just noticed the balance summary did not attach: y32_taxa_summary-l3.qzv (106.0 KB)

@Pierce_Hutton, first off, I’m glad that you are skeptical; it is not good practice to blindly trust software tools. Regarding your specific questions:

  1. No, these results are not convincing – I don’t see any major differences. If you are getting low p-values, they are likely artifacts of the high levels of sparsity in those species. As a rule of thumb, you shouldn’t be looking at balances close to the tips of the tree.

  2. Hmm, no. The proportion plots are there to help understand the distribution of the species in each part of the balance. But you should be very careful about interpreting what has a higher proportion - since proportions can be extremely misleading. See here:

  3. Intercepts are just bias constants that help model the differences between samples. They are totally ok to keep in your analyses. However, if you really want to remove the intercept, you can append - 1 to the formula, e.g. --p-formula "logHue - 1". See the patsy docs on how to construct formulas.
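To make point 2 a bit more concrete, here is a minimal sketch of why proportions can mislead. The counts are made up for illustration (they are not from your data), and the taxon names A, B, and C are hypothetical. Only taxon A changes in absolute abundance, yet the proportions of B and C drop, while the log ratio of B to C stays the same.

```python
import math

# Hypothetical absolute abundances for three taxa in two conditions.
# Only taxon A changes (tenfold increase); B and C are identical.
before = {"A": 100, "B": 100, "C": 100}
after = {"A": 1000, "B": 100, "C": 100}


def proportions(counts):
    """Convert absolute counts to proportions that sum to 1."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def log_ratio(counts, numerator, denominator):
    """Natural log of the ratio between two taxa."""
    return math.log(counts[numerator] / counts[denominator])


p_before = proportions(before)
p_after = proportions(after)

# The proportion of B falls even though its absolute abundance is unchanged.
print("B proportion:", p_before["B"], "->", p_after["B"])

# The log ratio B/C is unaffected by the change in A.
print("log(B/C):", log_ratio(before, "B", "C"), "->", log_ratio(after, "B", "C"))

# The log ratio A/B captures the one change that actually happened.
print("log(A/B):", log_ratio(before, "A", "B"), "->", log_ratio(after, "A", "B"))
```

This is the reason balances (log ratios between groups of taxa) are usually safer to interpret than the proportion plot on its own.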
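For point 3, the following is a rough sketch of what dropping the intercept does to a simple one-predictor regression. It does not call gneiss or patsy; it just fits a line with and without a constant term in plain Python, on hypothetical data generated as y = 1 + 2x. The function names are illustrative only.

```python
def fit_with_intercept(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def fit_through_origin(x, y):
    """Least squares for y = slope * x (no constant term)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


# Hypothetical data: y = 1 + 2x exactly, so the true intercept is 1.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]

intercept, slope = fit_with_intercept(x, y)
slope_no_intercept = fit_through_origin(x, y)

print("with intercept:   ", intercept, slope)
print("without intercept:", slope_no_intercept)
```

Without the constant term the line is forced through the origin, so the slope has to absorb the offset and ends up biased. That is why keeping the intercept is usually the safer default, even if you never interpret it.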

