Advice on Gneiss Output


I’m currently exploring a microbiome dataset with gneiss and was wondering if I could get some advice regarding my output.

The dataset consists of 19 samples (15 in group “A” and only 4 in group “B”) so I assumed that it would be difficult to tease apart any inter-group variation. I wanted to explore preliminary differences before additional sequencing was performed to increase sample sizes.

I began by running gneiss ols-regression and dendrogram-heatmap. The model summarized in “regression_summary.qzv” was fit to 19 samples with 3 covariates (violating the one-in-ten rule), but pred_mse is consistently lower than model_mse. Can I therefore assume that my model is not overfitting? Also, if I am interested in significant taxa differences with respect to the “samplegroup” covariate, can I focus on the “y8” balance alone, since it is the only balance with a significant corrected p-value for “samplegroup” in the Regression Coefficients Summary?

The gneiss balance-taxonomy output for “y8” and “samplegroup” (taxa.qzv) contains a list of numerator and denominator taxa. Is it acceptable to say that the balance of these taxa is significantly different between groups A and B based on the corrected p-value found in the regression summary file?
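To make the "balance" in this question concrete, here is a minimal sketch of how a single balance value can be computed for one sample: the log ratio of the geometric mean of the numerator taxa to the geometric mean of the denominator taxa, scaled by the group sizes. The function name `balance` and the proportions below are hypothetical and for illustration only; they are not taken from taxa.qzv, and zeros would need a pseudocount before taking logs.

```python
import numpy as np

def balance(numerator, denominator):
    """Isometric log-ratio balance between two groups of taxa in one sample.

    numerator, denominator: strictly positive abundances (or proportions)
    of the taxa on each side of the balance.
    """
    num = np.asarray(numerator, dtype=float)
    den = np.asarray(denominator, dtype=float)
    r, s = num.size, den.size
    # Normalization constant that depends only on the two group sizes
    scale = np.sqrt(r * s / (r + s))
    # Mean of logs equals the log of the geometric mean
    return scale * (np.log(num).mean() - np.log(den).mean())

# Hypothetical samples: numerator taxa dominate in one, denominator taxa in the other
sample_a = balance([0.2, 0.2], [0.1, 0.1])
sample_b = balance([0.1, 0.1], [0.2, 0.2])
print(sample_a, sample_b)
```

A positive value means the numerator taxa are, on average, more abundant relative to the denominator taxa in that sample; a negative value means the opposite. The regression tests whether this value shifts with the covariate, not whether any individual taxon changes.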

Thank you in advance; any perspective on this output would be greatly appreciated.

regression_summary.qzv (118.0 KB)
heatmap.qzv (69.1 KB)
taxa.qzv (96.9 KB)


As a follow-up question, the corrected p-values found in the “fdr-corrected-pvalues.csv” file do not seem to match those in the Regression Coefficients Summary in “regression_summary.qzv”. Does anyone know which should be used?
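For reference, a generic Benjamini-Hochberg FDR correction can be sketched as below. Whether gneiss uses this exact procedure, or applies it over the same set of p-values in both outputs, is an assumption and is not confirmed anywhere in this thread. The function name `benjamini_hochberg` and the input p-values are hypothetical.

```python
import numpy as np

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by m / rank
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted

# Hypothetical raw p-values for four balances
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
print(adjusted)
```

Note that the adjusted values depend on how many p-values are corrected together, so correcting across all balances and covariates gives different numbers than correcting within a single covariate.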

Hi @smithj, yes that is how gneiss was originally designed.

However, I do want to note that the recommended differential abundance workflow has changed quite a bit since gneiss was developed. In our latest differential abundance paper, we presented alternative ways of interpreting differential abundance.

I’ve developed a simple multinomial regression tool called songbird, but the same concepts also work with DESeq2 and ALDEx2 (just don’t trust their p-values). qurro provides a nice interactive interface for exploration.