Need help understanding the outputs from GNEISS

mortonjt · May 20, 2020, 5:00pm

However, I am having trouble understanding the output generated.

• In the regression summary file (attached) I concluded that:

1- The variables that impacted the model the most were Site[external] and Animal Type. This makes sense to me, since the external sample (which explained about 6% of the community variation) was a control sample collected from outside the reproductive tract to control for handling contamination. Thus, we expected a more diverse microbiome on Site[external].
In addition, we also expected to see differences in the microbiome of cows and heifers. Thus, AnimalType[Heifer] being the second most significant variable in our model, explaining about 3.6% of the community variation, makes complete sense to me.
2- Overall, our regression model can explain about 22% of the community variation. I found this result also reasonable as compared to the results from the data presented in the tutorial.
3- In our model, the prediction error (pred_mse) is also less than the within-model error (model_mse), suggesting that overfitting is not happening.
4- Am I missing any important conclusions from this section?

Yes, those conclusions are correct - 22% explained variance is certainly higher than in the average study.
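For anyone who wants to see the logic behind point 3, here is a minimal sketch of a cross-validated overfitting check. It is not gneiss's own code: the function name `kfold_mse`, the synthetic data, and the plain least-squares fit are all assumptions made for illustration. The idea is only that the error on held-out samples (pred_mse) should not be much larger than the error on the samples used for fitting (model_mse).

```python
import numpy as np

def kfold_mse(X, Y, n_folds=5, seed=0):
    """Hypothetical helper: per-fold (model_mse, pred_mse) for a least-squares fit.

    X: (samples x covariates) design matrix, including an intercept column.
    Y: (samples x balances) response matrix.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(X.shape[0])
    folds = np.array_split(idx, n_folds)
    model_mse, pred_mse = [], []
    for held_out in folds:
        train = np.setdiff1d(idx, held_out)
        # Fit on the training samples only.
        beta, *_ = np.linalg.lstsq(X[train], Y[train], rcond=None)
        # Within-model error: residuals on the samples used for fitting.
        model_mse.append(np.mean((Y[train] - X[train] @ beta) ** 2))
        # Prediction error: residuals on the held-out samples.
        pred_mse.append(np.mean((Y[held_out] - X[held_out] @ beta) ** 2))
    return np.array(model_mse), np.array(pred_mse)

# Synthetic example (made-up data, not from the study discussed here).
rng = np.random.default_rng(42)
X = np.column_stack([np.ones(60), rng.normal(size=(60, 2))])
Y = X @ rng.normal(size=(3, 4)) + rng.normal(scale=0.5, size=(60, 4))
model_mse, pred_mse = kfold_mse(X, Y)
print("mean model_mse:", model_mse.mean())
print("mean pred_mse: ", pred_mse.mean())
```

If pred_mse is consistently far above model_mse across folds, that would be a sign of overfitting.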

I'd avoid drawing that conclusion from the heatmap - the R^2 differences are a better way for that sort of inference.
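As a rough illustration of what an R^2 difference measures, the sketch below refits a least-squares model with one covariate left out and reports how much R^2 drops. The helper names (`r_squared`, `r2_differences`) and the toy data are hypothetical, and this is only an approximation of the idea, not a copy of what the gneiss summary computes.

```python
import numpy as np

def r_squared(X, Y):
    """Fraction of total variation in Y explained by a least-squares fit on X."""
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    ss_res = np.sum((Y - X @ beta) ** 2)
    ss_tot = np.sum((Y - Y.mean(axis=0)) ** 2)
    return 1.0 - ss_res / ss_tot

def r2_differences(X, Y, names):
    """Drop each non-intercept column in turn and report the loss in R^2."""
    full = r_squared(X, Y)
    return {
        name: full - r_squared(np.delete(X, j, axis=1), Y)
        for j, name in enumerate(names)
        if name != "Intercept"
    }

# Toy design: two orthogonal covariates, response driven only by the first.
x1 = np.array([1.0, -1.0, 1.0, -1.0])
x2 = np.array([1.0, 1.0, -1.0, -1.0])
X = np.column_stack([np.ones(4), x1, x2])
Y = (2.0 * x1)[:, None]
print(r2_differences(X, Y, ["Intercept", "x1", "x2"]))
```

A covariate with a large R^2 difference explains variation that the remaining covariates cannot account for.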

Yes, those are the top 10 balances. To find the numerator / denominator, run the balance_taxonomy command.
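To make the numerator / denominator idea concrete, here is a small sketch of how a single balance is defined: a scaled log ratio of the geometric means of the two groups of taxa. The function name `balance`, the index-based interface, and the pseudocount default are assumptions for illustration; in practice the groups come from the tree and the table would already have had zeros handled.

```python
import numpy as np

def balance(counts, numerator, denominator, pseudocount=1.0):
    """Hypothetical helper: isometric log-ratio balance for one sample.

    counts: abundances for one sample, one entry per taxon.
    numerator, denominator: index lists of the taxa on each side of the node.
    """
    x = np.asarray(counts, dtype=float) + pseudocount  # avoid log(0)
    r, s = len(numerator), len(denominator)
    # The mean of the logs equals the log of the geometric mean.
    log_gmean_num = np.log(x[numerator]).mean()
    log_gmean_den = np.log(x[denominator]).mean()
    return np.sqrt(r * s / (r + s)) * (log_gmean_num - log_gmean_den)

# Made-up sample: the first two taxa are the numerator, the last two the denominator.
print(balance([40, 35, 5, 8], numerator=[0, 1], denominator=[2, 3]))
```

A positive value means the numerator taxa are more abundant than the denominator taxa in that sample, and a negative value means the opposite.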

It's because those sites have fewer samples.

Possibly, I'd run beta-diversity to confirm that, since those plots only show the top 2 balances.

I'd ignore the scatterplots - those were designed as a diagnostic tool for the top balances to see if it is a good fit or not.

The tree branch lengths are scaled by explained variance, designed to help identify useful balances to pass into balance-taxonomy (you can zoom in and highlight the nodes of interest).

We know that the gneiss visualization is highly unintuitive, which is part of the reason why we are deprecating the statistical methods in gneiss in favor of aldex2, songbird and ancom.

If you are interested in phylogenetic visualization, I recommend checking out EMPress.