Selecting Balances with Gneiss

Zach_Burcham · January 15, 2019, 5:48pm

Hello,

I am a little confused about the best method for selecting which balance to use with Gneiss. I created a model, and based on the Gneiss tutorial I should be looking at the branch length and the p-values from the heatmap. Does that mean I want to select the balance that is the furthest away (which would be y11 on my tree)? But this balance does not contain any significant p-values for my variables, so should I be choosing the furthest balance that contains significant variables? That feels like I'm trying to lead the results in a way.

Thanks,
Zach

EDIT: I tried to upload my summary file but got an error that the file is too large

thermokarst · January 15, 2019, 7:01pm

You can upload to some other file-sharing service, like Dropbox or Google Drive, then post a link to that here.

Zach_Burcham · January 15, 2019, 7:02pm

Good point!

https://www.dropbox.com/s/h0pta2n915cwat2/all_no-lbs-lost_ols_regression_summary.qzv?dl=0

mortonjt · January 15, 2019, 8:49pm

Hmm - it doesn't look like there are any outstanding hits in this analysis. There are really only 2 possibilities:

1. There is no signal at all, and you are actually overfitting your model. This is still possible since you have a ton of variables in your model, and none of them seem to have any strong balances (with high coefficients and low p-values).
2. You have a really bad tree - either you are including too many contaminants or the variance partitioning strategy is just not working (when you run the correlation-clustering algorithm). It may be worth looking at the phylogenetic tree to see if the signal is dampened there too.

But looking at this summary, it's more likely there is actually no signal. Were you able to pull out anything from standard ordination analyses?
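
To make "strong balances (with high coefficients and low p-values)" concrete, here is a minimal sketch of screening a regression summary on both criteria at once. The rows, the `strong_hits` helper, and both thresholds are hypothetical and chosen only for illustration; none of this comes from Gneiss or from the summary linked above.

```python
# Hypothetical regression output, one row per (balance, covariate) pair.
# All values are invented for illustration; they are not from the linked summary.
results = [
    {"balance": "y0", "covariate": "age", "coefficient": 1.40, "pvalue": 0.002},
    {"balance": "y3", "covariate": "sex", "coefficient": 0.05, "pvalue": 0.001},
    {"balance": "y7", "covariate": "age", "coefficient": 1.10, "pvalue": 0.400},
    {"balance": "y11", "covariate": "sex", "coefficient": 0.02, "pvalue": 0.800},
]


def strong_hits(rows, min_abs_coef=0.5, max_pvalue=0.05):
    """Keep rows with a large coefficient (in magnitude) AND a small p-value.

    Both thresholds are arbitrary assumptions, not recommended defaults.
    """
    return [
        row for row in rows
        if abs(row["coefficient"]) >= min_abs_coef and row["pvalue"] <= max_pvalue
    ]


hits = strong_hits(results)
print([(row["balance"], row["covariate"]) for row in hits])
```

Requiring both conditions drops balances like the toy `y3` (significant but with a negligible effect) and `y7` (large effect but not significant). If nothing survives, that is consistent with there being no signal.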

Zach_Burcham · January 16, 2019, 11:39pm

@mortonjt Thank you for the response.

1. Since I have so many variables, is there a preferred way to determine which should be in the model? I tried including them all and then looking at the R2 difference, but like you said it is leading to no signal and no strong balances. Should I only include those that are significantly different in beta/alpha diversity testing?
2. After diving further into the PCoA, it seems like I have a batch-effect problem that I need to fix first. I'm also going to look into removing contaminants with decontam. I will also try a phylogenetic tree once I get there. Is there a way to tell if a tree is "good" or "bad"?

Thanks

mortonjt · January 18, 2019, 8:03pm

1. Yes.
2. I don't know of a way to determine what a good tree is. The best approach at the moment is to stick to the tree that is most relevant for your problem. For instance, if you are interested in understanding evolutionary patterns, maybe the phylogenetic tree would be better. If you don't know, maybe the variance-partitioned tree that you used is the way to go. For your scenario, it is most likely that the problem is due to something else (i.e. batch effects). Note that you can somewhat correct for batch effects in your model by adding the batch variable to the formula.

Zach_Burcham · January 18, 2019, 8:07pm

Would I add batch effects to the model with something like "age+sex+batcheffect"?

mortonjt · January 28, 2019, 4:30pm

Yes, that is how I would approach handling batch effects.
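
As a rough illustration of why putting the batch term in the formula helps, the sketch below fits a single made-up balance with and without a batch column. It is a toy under stated assumptions: the data are synthetic, the `solve` and `ols` helpers are hand-rolled stand-ins for a least-squares fit, and the column names are hypothetical. It does not call Gneiss.

```python
# Toy illustration only: synthetic data and hypothetical column names.
# It mimics what a formula such as "age+batcheffect" asks a linear model to do.

def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(design, response):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, response))
           for i in range(k)]
    return solve(xtx, xty)


# Synthetic samples (age, batch). Batch "B" holds the older subjects,
# so age and batch are confounded.
samples = [(20, "A"), (25, "A"), (30, "A"), (35, "A"),
           (40, "B"), (45, "B"), (50, "B"), (55, "B")]

# Made-up balance: true age effect is 0.05, and batch B shifts it by 2.0.
balance = [1.0 + 0.05 * age + (2.0 if batch == "B" else 0.0)
           for age, batch in samples]

# Formula "age": intercept + age.
x_age = [[1.0, float(age)] for age, _ in samples]
# Formula "age+batcheffect": intercept + age + dummy column for batch B.
x_age_batch = [[1.0, float(age), 1.0 if batch == "B" else 0.0]
               for age, batch in samples]

coef_without_batch = ols(x_age, balance)
coef_with_batch = ols(x_age_batch, balance)

print("age coefficient, batch ignored: ", round(coef_without_batch[1], 3))
print("age coefficient, batch in model:", round(coef_with_batch[1], 3))
```

In this toy setup, leaving batch out inflates the age coefficient because the batch shift gets attributed to age. Including the batch column recovers the age effect that was built into the data. Real data are noisy, and batch can be fully confounded with a variable of interest, so the correction is only partial in practice.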
