Gneiss ols-regression formula for building linear models in differential analysis

Hello!
I'm finally close to the end of my metagenomics analysis, and I'm trying to perform a differential analysis with gneiss to get an overview of the differences among samples. I've used ANCOM too, since I want to try both approaches.
I'm using this tutorial.

As suggested, I first performed correlation clustering with qiime gneiss correlation-clustering, then computed the balances with qiime gneiss ilr-hierarchical.
I'm now about to build the linear models with qiime gneiss ols-regression.

In the tutorial, this is the suggested code:

qiime gneiss ols-regression \
  --p-formula "Subject+Sex+Age+BMI+sCD14ugml+LBPugml+LPSpgml" \
  --i-table balances.qza \
  --i-tree hierarchy.qza \
  --m-metadata-file sample-metadata.tsv \
  --o-visualization regression_summary.qzv

And this is where I got lost, more specifically at the exact meaning of that formula. Apparently, the first four terms represent the 'categorical' type columns of my metadata file, but I can't figure out what the additional terms sCD14ugml, LBPugml, and LPSpgml are and what those letters stand for.

If I try to run the plugin and still include them in the formula, I get:

Plugin error from gneiss:

Error evaluating factor: NameError: name 'LPSpgml' is not defined
Replicate+Site+Soil+sCD14ugml+LBPugml+LPSpgml

I was able to run the plugin with just the column names of my metadata file, but I'd like to learn more about such terms. What do they represent, and how can I customize them?

Thanks in advance!

These formula terms should be column headers in your own sample metadata file. In the tutorial you are following, they are specific to the metadata used in that tutorial. What do they mean? They look like measurements of CD14, LBP, and LPS in those samples (or maybe in corresponding serum samples collected from the same patient at the same time point?), with what appear to be the units (ug/ml and pg/ml) folded into the column names.

So, to customize the formula, choose the metadata columns that are meaningful to test in your own experiment; they will be different each time! It looks like you have the right idea: you picked terms like "Site" and "Soil" that are meaningful for your experiment.
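
If it helps, here is a minimal Python sketch (not part of gneiss or QIIME 2) for checking a formula against your metadata before running the plugin. It assumes a simple additive formula whose terms are separated only by "+", and a tab-separated metadata file whose first column holds the sample IDs. The function names and the example metadata are made up for illustration; only the formula is taken from the error message above.

```python
import csv
import io


def metadata_columns(tsv_text):
    """Return the column headers of a TSV metadata file, skipping the sample ID column."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    return header[1:]


def formula_terms(formula):
    """Split a purely additive formula such as 'A+B+C' into its term names."""
    return [term.strip() for term in formula.split("+") if term.strip()]


def missing_terms(formula, columns):
    """List the formula terms that are not column headers in the metadata."""
    return [term for term in formula_terms(formula) if term not in columns]


# Hypothetical metadata, invented for this example.
example_tsv = (
    "sample-id\tReplicate\tSite\tSoil\n"
    "S1\t1\tA\tclay\n"
    "S2\t2\tB\tsand\n"
)

formula = "Replicate+Site+Soil+sCD14ugml+LBPugml+LPSpgml"
columns = metadata_columns(example_tsv)
missing = missing_terms(formula, columns)
print(missing)  # ['sCD14ugml', 'LBPugml', 'LPSpgml']
```

Any term it reports as missing is one that would trigger the "name ... is not defined" error. The sketch does not handle interactions or other formula syntax beyond "+".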

Good luck!

Thank you! I didn't download the sample data and got confused: judging by their names, those terms didn't look like metadata column names but more like statistical parameters. I think the formula shown in the tutorial right above the code I copied here also confused me, making me believe that some strange parameters I was unaware of were involved (in fact, the letters that confused me were only concentrations).

The terms are all present in the sample metadata.
I guess I'll just use the most relevant ones in my metadata file then!

Thanks for your answer!
