Yes, those conclusions are correct - 22% explained variance is certainly higher than in the average study.
I'd avoid drawing that conclusion from the heatmap - the R^2 differences are a better basis for that sort of inference.
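As a rough illustration of what an R^2 difference measures, here is a minimal Python sketch. It assumes numpy and assumes the comparison is a leave-one-covariate-out refit of an ordinary least-squares model on a single balance. `r_squared`, `r_squared_drop` and the covariate names are hypothetical helpers, not the gneiss API. The drop in R^2 when a covariate is removed estimates how much of the balance's variation that covariate accounts for.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of an OLS fit of y on the columns of X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def r_squared_drop(X, y, covariate_names):
    """Loss in R^2 when each covariate (column of X) is left out of the model."""
    full = r_squared(X, y)
    return {
        name: full - r_squared(np.delete(X, j, axis=1), y)
        for j, name in enumerate(covariate_names)
    }


# Hypothetical example: the balance depends only on the first covariate.
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)
y = 3.0 * X[:, 0]
drops = r_squared_drop(X, y, ["site", "age"])
```

In this toy example `drops["site"]` comes out close to 1 and `drops["age"]` close to 0, because the balance depends only on site.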
Yes, those are the top 10 balances. To find the numerator and denominator of each one, run the balance-taxonomy command.
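For intuition about why the numerator and denominator matter: a balance is an isometric log-ratio between the taxa on the two sides of a split in the tree. The Python sketch below uses the standard formula sqrt(r*s/(r+s)) * ln(g(numerator)/g(denominator)), where r and s are the numbers of taxa on each side and g is the geometric mean. The function names and OTU ids are hypothetical, and the exact sign convention gneiss uses internally is not confirmed here.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def balance(abundances, numerator, denominator):
    """Isometric log-ratio balance between two groups of features.

    abundances: dict of feature id -> positive abundance (add a pseudocount to zeros first)
    numerator, denominator: feature ids on each side of the split
    """
    r, s = len(numerator), len(denominator)
    g_num = geometric_mean([abundances[f] for f in numerator])
    g_den = geometric_mean([abundances[f] for f in denominator])
    return math.sqrt(r * s / (r + s)) * math.log(g_num / g_den)


# Hypothetical sample: OTU_1 and OTU_2 sit in the numerator, OTU_3 in the denominator.
sample = {"OTU_1": 4.0, "OTU_2": 16.0, "OTU_3": 2.0}
b = balance(sample, ["OTU_1", "OTU_2"], ["OTU_3"])
```

A positive value means the numerator taxa are, on geometric average, more abundant than the denominator taxa in that sample, so a balance can only be interpreted once you know which taxa sit on each side.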
5- Why are some of the blue columns in the heatmap narrower than others? Is it because some sites have lower OTU diversity?
It's because those sites have fewer samples.
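A minimal sketch of that relationship, assuming each sample gets one equal-width column in the heatmap so that a site's block width is proportional to its sample count. `column_widths` and the site labels are hypothetical.

```python
from collections import Counter


def column_widths(sample_sites, total_width=1.0):
    """Share of the heatmap width taken by each site, assuming one equal-width column per sample."""
    counts = Counter(sample_sites)
    n_samples = len(sample_sites)
    return {site: total_width * count / n_samples for site, count in counts.items()}


# Hypothetical metadata: three samples from "gut", one from "skin".
widths = column_widths(["gut", "gut", "gut", "skin"])
```

Under this assumption OTU diversity plays no role in the width; only the number of samples per site does.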
• Prediction and residual plots
1- From the prediction and residual plots, my only conclusion is that we might have 4 outliers.
2- What else can you conclude based on these plots?
Possibly. I'd run a beta-diversity analysis to confirm that, since those plots only show the top 2 balances.
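One way to check candidate outliers against all features, rather than just the top 2 balances, is a compositional beta-diversity measure such as the Aitchison distance (Euclidean distance between centred log-ratio transformed samples). The Python sketch below is illustrative only: the choice of metric, the pseudocount of 1, the helper names and the sample data are all assumptions. Samples whose mean distance to the rest is unusually large are the ones worth a closer look.

```python
import math


def clr(counts, pseudocount=1.0):
    """Centred log-ratio transform of one sample's counts."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def aitchison_distance(a, b, pseudocount=1.0):
    """Euclidean distance between two clr-transformed samples."""
    ca, cb = clr(a, pseudocount), clr(b, pseudocount)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ca, cb)))


def mean_distances(samples, pseudocount=1.0):
    """Mean Aitchison distance from each sample to every other sample."""
    names = list(samples)
    result = {}
    for n in names:
        others = [aitchison_distance(samples[n], samples[m], pseudocount)
                  for m in names if m != n]
        result[n] = sum(others) / len(others)
    return result


# Hypothetical OTU counts: s4 has a very different composition from the rest.
samples = {
    "s1": [10, 10, 10],
    "s2": [11, 9, 10],
    "s3": [10, 11, 9],
    "s4": [100, 1, 1],
}
scores = mean_distances(samples)
```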
• Explained Sum of Squares
1- Unfortunately, I have no idea what is going on here. In the tutorial, this plot was used to conclude that the y0 balance was important. Can you please shed some light on that?
2- Also, it was concluded that “The balances not only have very small p-values (with p<0.05) for differentiating subjects, but they also have the largest branch lengths in the tree diagram. This suggests that this partition of microbes could differentiate the CFS patients from the controls.” Can you please explain how those conclusions were drawn from the plot? Due to my lack of understanding of the plot, I only see a meaningless branched tree.
I'd ignore the scatterplots - those were designed as a diagnostic tool for the top balances, to check whether the model is a good fit.
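The tree branch lengths are scaled by explained variance, so the balances on the longest branches (such as y0 in the tutorial) are the ones that explain the most variation. This is designed to help identify useful balances to pass into balance-taxonomy (you can zoom in and highlight the nodes of interest).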
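To make the branch-length idea concrete, here is a minimal Python sketch that scores each balance by the explained sum of squares of a one-factor least-squares fit (whose fitted values are simply the group means) and sorts the balances by that score. The single grouping covariate, the balance values and the helper names are hypothetical; gneiss fits its own regression model, so treat this only as an illustration of why a long branch marks a balance worth inspecting.

```python
def explained_sum_of_squares(values, groups):
    """ESS of a one-factor OLS fit: the fitted value of each sample is its group mean."""
    grand_mean = sum(values) / len(values)
    by_group = {}
    for value, group in zip(values, groups):
        by_group.setdefault(group, []).append(value)
    return sum(len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2
               for vs in by_group.values())


def rank_balances(balances, groups):
    """Balance names sorted by ESS, largest first (longest branch under this scaling)."""
    ess = {name: explained_sum_of_squares(vals, groups) for name, vals in balances.items()}
    return sorted(ess, key=ess.get, reverse=True)


# Hypothetical balance values for four subjects.
groups = ["CFS", "CFS", "control", "control"]
balances = {
    "y0": [2.0, 2.2, -1.9, -2.1],   # separates the groups
    "y1": [0.1, -0.1, 0.2, -0.2],   # no group difference
}
ranked = rank_balances(balances, groups)
```

In this toy example `ranked` starts with `"y0"`, the balance that separates the two groups.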
We know that the gneiss visualization is highly unintuitive, which is part of the reason why we are deprecating the statistical methods in gneiss in favor of aldex2, songbird and ancom.
If you are interested in phylogenetic visualization, I recommend checking out empress.