Interpreting Songbird output

jbioinf · February 23, 2021, 6:54pm

Hello

I have followed the songbird tutorial (GitHub - biocore/songbird: Vanilla regression methods for microbiome differential abundance analysis) and generated the attached summarized results. Please, see below.

I ran three models for this analysis: (1) a null model, (2) an incomplete model containing a few important variables, and (3) a full model containing all the important variables. The full model looks like the best one, judging by its lowest cv_error, lowest loss, and highest pseudo-Q-squared compared with the incomplete model and the null model.
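For readers comparing models the same way, here is a minimal sketch of how a pseudo-Q-squared value relates a model's cross-validation error to a baseline's. It assumes the score has the form 1 - model_error / baseline_error; that form, the function name, and the numbers are illustrative assumptions, not values from this analysis.

```python
def pseudo_q_squared(model_cv_error: float, baseline_cv_error: float) -> float:
    """Assumed form: 1 - (model CV error) / (baseline CV error).

    Positive values mean the model predicts held-out samples better than
    the baseline; values at or below zero suggest no gain (or overfitting).
    """
    return 1.0 - model_cv_error / baseline_cv_error


# Hypothetical CV errors, for illustration only.
null_cv = 100.0
incomplete_cv = 80.0
full_cv = 50.0

print("incomplete vs null:", pseudo_q_squared(incomplete_cv, null_cv))
print("full vs null:      ", pseudo_q_squared(full_cv, null_cv))
```

Under this assumed form, a lower cv_error relative to the same baseline always gives a higher score, which is why the two criteria tend to agree.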

However, I have a few questions regarding the interpretation of the qbeta results.

1. What is the interpretation of the first row of qbeta plots? Can you please explain the x- and y-axes in simple terms? I think the x-axis shows the "interactions", but I didn't quite understand what that means in the tutorial.
2. How should I interpret the second row of qbeta plots, which contains the qbeta distributions? Are the numbers on the y-axis the interactions? What about the numbers in front of each distribution? The Full-model distribution looks normally distributed, while the Null model and the incomplete model have several peaks. What does that tell you?

Thank you so much for your time and support!

mortonjt · March 1, 2021, 5:09pm

Hi @jbioinf, your plots are looking good!

Regarding your questions:

1. qbeta is just the distribution of log-fold changes. The idea is to get an inkling of whether something weird is happening (e.g. everything is bundled around zero, or something looks overfitted).
2. It's just a distribution of the log-fold changes. It may be affected by your choice of prior (a differential prior of 1 will put about 99% of your microbes within a log-fold change of +/- 3).
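The "99% within +/- 3" figure can be checked with a quick sketch. It assumes the differential prior acts as the standard deviation of a zero-centered normal prior on the log-fold changes (an assumption for illustration; the function name is hypothetical).

```python
import math


def normal_mass_within(k_sigma: float) -> float:
    """Probability that a normal draw lies within k_sigma standard
    deviations of its mean."""
    return math.erf(k_sigma / math.sqrt(2.0))


differential_prior = 1.0  # assumed to be the prior's standard deviation

for k in (1, 2, 3):
    bound = k * differential_prior
    print(f"|log-fold change| <= {bound:.1f}: {normal_mass_within(k):.4f}")
```

With a prior of 1, three standard deviations corresponds to a log-fold change of 3, and the mass inside that range is a little over 99%. A larger prior widens the range in proportion.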

jbioinf · March 1, 2021, 5:58pm

Hi @mortonjt,

Thanks a lot for your reply! I am sorry, but I am still a little confused interpreting the results. Can you please share your thoughts on:

1. What are the interactions (x-axis)?
2. What is your interpretation comparing the first line with the 3 qbeta plots (Full model, Null Model, and Incomplete Model)? There are 6 lines within each of those plots of fold change. What are these lines representing? These lines are closer to each other in the full model, compared to the incomplete model and the null model. The Null model has the largest separation among those lines. What does that tell you?
3. For the second line of qbeta plots, the shape of the Full-model distribution seems normally distributed (around zero) while the Null model or the incomplete model have several peaks. What is your interpretation of that?

Thank you so much and I am sorry for the repeated questions. I am just trying to understand these results.

mortonjt · March 1, 2021, 6:46pm

1. x-axis = log-fold change values
2. I don't understand this question. Are you referring to the cv_error? It looks like the full model is performing the best (which is great!).
3. It probably means that your priors are shaping the log-fold change estimates (we have a normally distributed prior). You could consider making your differential prior larger; there is a chance that you would get an even better model. The null model is just estimating the average community, so it isn't unexpected that this is peaky.
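To illustrate how a normal prior can shape estimates, here is a toy sketch using the textbook normal-normal shrinkage formula. This is a deliberately simplified stand-in, not Songbird's multinomial model; the function name and numbers are hypothetical.

```python
def shrunk_estimate(raw_estimate: float, noise_sd: float, prior_sd: float) -> float:
    """Posterior mean for a normal likelihood with a zero-centered normal prior.

    The raw estimate is pulled toward zero by the factor
    prior_sd^2 / (prior_sd^2 + noise_sd^2).
    """
    weight = prior_sd ** 2 / (prior_sd ** 2 + noise_sd ** 2)
    return weight * raw_estimate


raw = 4.0  # hypothetical unregularized log-fold change
for prior_sd in (0.5, 1.0, 5.0):
    est = shrunk_estimate(raw, noise_sd=1.0, prior_sd=prior_sd)
    print(f"prior_sd={prior_sd}: shrunk estimate = {est:.3f}")
```

A tight prior pulls every estimate toward zero, so the whole distribution takes on the prior's bell shape; a wider prior lets the data dominate, which is the intuition behind trying a larger differential prior.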

jbioinf · March 1, 2021, 7:22pm

1. x-axis = log-fold change values
The fold change is the x-axis for the last row of plots. What about the row above it (e.g. 0, 20k, 40k, 80k, 100k)? Are these the interactions? How are these calculated?

2. I don’t understand this question. Are you referring to the cv_error? It looks like the full model is performing the best (which is great!).
Sorry, I was referring to the first row of qbeta plots. If you look closely at the graph, there are 6 lines going from 0 to 100k. Comparing the models, the lines in the Full model are tighter than in the other two.

3. It probably means that your priors are shaping the log-fold change estimates (we have a normally distributed prior). You could consider growing your differential prior to be larger, there is a chance that you could get an even better model. The null model is just estimating the average community, so it isn’t un-expected that this is peaky.

Can you please provide some more information on this? The differentials are the log-fold changes of my features. Why would they be normally distributed in the full model but peaky in the null model? Sorry, I am still very confused about this.

mortonjt · March 1, 2021, 8:01pm

Got it -- those values are just the gradient updates -- so 80k would refer to 80k gradient steps. It's basically showing how your log-fold change estimates converge as training proceeds.
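To make "gradient steps" concrete, here is a toy sketch: plain gradient descent on a two-parameter least-squares fit, recording the coefficients every few hundred steps. It is not Songbird's model or optimizer; the function name, data, and step sizes are hypothetical and chosen only to show estimates flattening out.

```python
def fit_with_snapshots(xs, ys, steps=2000, lr=0.05, every=500):
    """Fit y = slope * x + intercept by gradient descent on mean squared
    error, saving (step, slope, intercept) every `every` steps."""
    slope, intercept = 0.0, 0.0
    n = len(xs)
    snapshots = []
    for step in range(1, steps + 1):
        residuals = [slope * x + intercept - y for x, y in zip(xs, ys)]
        grad_slope = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        grad_intercept = 2.0 / n * sum(residuals)
        slope -= lr * grad_slope
        intercept -= lr * grad_intercept
        if step % every == 0:
            snapshots.append((step, slope, intercept))
    return snapshots


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1

for step, slope, intercept in fit_with_snapshots(xs, ys):
    print(f"step {step}: slope={slope:.6f}, intercept={intercept:.6f}")
```

Plotting each coefficient against the step count gives one line per coefficient. Lines that level off suggest the estimates have converged; lines still drifting at the last step suggest more steps are needed.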

At this point, I recommend just running with your results in qurro -- your fits are good, so might as well start linking numbers to microbes.

jbioinf · March 1, 2021, 9:00pm

Thanks, @mortonjt!

I have already done qurro and identified the microbes of interest. Because I plan on publishing this data, I need to understand my results for describing them in the manuscript. Do you have any reading suggestions for understanding these results better?
