Songbird differentials interpretation

wangj50 · November 23, 2020, 3:52pm

Hi @mortonjt,
Good morning! I have a question regarding the songbird differentials.
Could you clarify whether the differentials in the songbird output (i.e. differentials.tsv) are clr-transformed according to equation (14) in your paper (https://www.nature.com/articles/s41467-019-10656-5)?

Additionally, I wonder whether any downstream processing is needed in order to make an inference mathematically. E.g., would a positive differential mean that, compared to the reference level, a feature is relatively more abundant in the denominator group?

Thank you very much!

Jincheng

mortonjt · November 23, 2020, 3:57pm

Hi @wangj50, the reported songbird differentials are already in clr transformed coordinates.
In terms of performing downstream inference, songbird already provides a means to sort the microbes according to their fold change (up to a constant! No one can do better without additional assumptions).
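
To make "up to a constant" concrete, here is a minimal sketch of what clr coordinates do to a set of log fold changes. The numbers are made up for illustration, and `clr_center` is a hypothetical helper, not part of songbird; this only illustrates the coordinate system, not how songbird fits its model.

```python
import numpy as np

# Hypothetical "true" log fold changes for five features (made-up numbers).
true_log_fc = np.array([-3.0, -2.0, -1.5, -1.0, -0.5])

def clr_center(log_values):
    """Center log-scale values on their mean, so they sum to zero."""
    return log_values - log_values.mean()

differentials = clr_center(true_log_fc)

# The offset between the centered values and the true log fold changes
# is one constant K shared by every feature.
K = differentials - true_log_fc

print("differentials:", differentials)
print("shared constant:", np.allclose(K, K[0]))
print("sums to zero:", np.isclose(differentials.sum(), 0.0))
```

In this toy case every feature truly decreases, yet the centered values fall on both sides of zero. Only the ordering is preserved.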

@fedarko has a slick interactive tool called qurro that can help you interactively explore these rankings with your data: biocore/qurro on GitHub ("Visualize differentially ranked features (taxa, metabolites, ...) and their log-ratios across samples").

wangj50 · November 23, 2020, 4:01pm

Thanks @mortonjt. In that sense, would the "+" or "-" sign of the current differentials mean anything? Or should they be treated only as a means of comparison with the other values?

And thank you for suggesting qurro, I will explore it.

mortonjt · November 23, 2020, 4:13pm

Not really; zero is only indicative of the average fold change. If you don't know how much the total biomass has changed, you don't know what exactly a zero fold change looks like.

So no, + and - don't mean anything by themselves. The only thing you can rely on is the ordering, so focus on the features that are the most positive and the most negative.
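
As a rough illustration of why the ordering is the reliable part, the sketch below shifts a set of made-up differentials by an arbitrary constant. The feature IDs, values, and the `rank_features` helper are all hypothetical.

```python
import numpy as np

# Made-up differentials for five hypothetical features.
feature_ids = np.array(["f1", "f2", "f3", "f4", "f5"])
differentials = np.array([1.1, -0.4, 0.1, 0.6, -1.4])

def rank_features(values, ids):
    """Return feature IDs sorted from most positive to most negative."""
    order = np.argsort(values)[::-1]
    return list(ids[order])

# Shift every differential by the same arbitrary constant.
shifted = differentials + 2.5

same_ranking = rank_features(differentials, feature_ids) == rank_features(shifted, feature_ids)
sign_changed = np.sign(differentials) != np.sign(shifted)

# The difference between two features' differentials does not depend on the constant.
diff_f1_f5 = differentials[0] - differentials[4]
diff_f1_f5_shifted = shifted[0] - shifted[4]

print("ranking unchanged:", same_ranking)
print("features whose sign flipped:", list(feature_ids[sign_changed]))
print("f1 - f5 before and after the shift:", diff_f1_f5, diff_f1_f5_shifted)
```

The signs of some features flip under the shift, while the ranking and the pairwise differences stay the same. That is why log-ratios between chosen numerator and denominator features are a safer thing to interpret than the sign of any single differential.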

wangj50 · November 23, 2020, 4:15pm

Thanks a lot! This really helps clarify things.

wangj50 · November 24, 2020, 4:06pm

Upon further checking my samples, I found that for quite a few features the songbird differential has the opposite direction from the relative abundance.

E.g. Taxon A has a songbird differential of log(group1/group2)+K = 2, but Taxon A has a lower relative abundance in group1 than in group2.

It feels a little bit counter-intuitive.

Thanks!

Jincheng

mortonjt · November 24, 2020, 4:20pm

Then your formula is probably backwards... I have a blog post on how to design patsy formulas here

If you haven't already, I'd recommend double-checking in Qurro.

wangj50 · November 24, 2020, 4:42pm

Thanks for your prompt reply. My formulas are correct; I wrote my code based on the blogspot post and the songbird tutorial on GitHub. Also, if the formulas were wrong, shouldn't all the features be in the wrong direction? In my case only a handful of features had the wrong sign.

I can explore qurro. But I confirmed my formula is correct.

mortonjt · November 24, 2020, 8:19pm

Hi @wangj50, I'm having a hard time understanding your question. It would help if you posted your qurro plots (both the ratios and the ranks that you are talking about).

wangj50 · November 24, 2020, 9:30pm

Thanks! Please see the rank plot:

Here the numerators are Bacteroides ASVs, and the denominators are Streptococcus ASVs.
And the sample plot:

As you can see here:
(1) A lot of the red Bacteroides ASVs have positive differentials, which I believe means their estimated log(T.CS/T.Vag) + K > 0. However, when you check the relative abundances of these ASVs (not shown here), on average they are more abundant in the T.Vag group. Since most of their differentials are positive, I would naturally expect their relative abundance to be higher in the T.CS group. Even considering the compositional nature of the data, I would think the direction of the songbird differentials should agree with the direction of the relative abundance comparison.

(2) In the sample plot, the log ratio of Bacteroides/Streptococcus being larger in the T.Vag group is understandable and expected, so I have no question on this.

Thank you!

Jincheng

mortonjt · November 24, 2020, 10:50pm

Great! I'm glad you were able to resolve your question.

fedarko · November 25, 2020, 2:31am

I'm not sure that the directions agreeing, as you describe it here, is a guarantee. @mortonjt would know best, but what I think might be happening here is that all (or at least most) of the features observed in your study are decreasing to some degree between the Vag and CS conditions.

A scenario like this is described in Fig. 2 of the paper introducing Songbird -- in the dataset shown here there are two groups of oral microbiome samples, one collected before participants brushed their teeth and one collected after participants brushed their teeth. Of course, most features' absolute abundances (verified in that paper using flow cytometry) are lower to some extent in the after-brushing samples than in the before-brushing samples, which is as expected -- and yet there are still a lot of features on both sides of 0 in log(before brushing / after brushing) + K (see Fig. 2b, copied below).

The reason is that, for that dataset, it's more about which features are decreasing less or more than other features, on average; the features with highly ranked differentials for log(before brushing / after brushing) + K are generally more associated with before-brushing samples, i.e. decreasing more between before ==> after brushing, while the features with lowly ranked differentials are generally more associated with after-brushing samples, i.e. decreasing less between before ==> after brushing. Quoting the paper:

These results are consistent with our knowledge about oral biogeography. Haemophilus is typically found on the periphery of oral biofilms and was likely removed from the biofilm during the brushing process, whereas Actinomyces is generally found on the surface of the tooth and acts as an anchor for biofilm attachment [25]. Importantly, this experiment demonstrates the potential fallibility of relying on relative abundance; it is incorrect to conclude that Actinomyces increases after tooth brushing despite the increase in relative abundance. As demonstrated by flow cytometry, total microbial load decreases, and while both Haemophilus and Actinomyces decrease, Haemophilus decreases more.
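
A toy version of this scenario can be sketched as follows. The absolute abundances and feature names are made up for illustration and are not data from the paper; the point is only that every feature can decrease in absolute terms while the centered log(before / after) values still land on both sides of zero.

```python
import numpy as np

# Made-up absolute abundances (cells per sample); NOT data from the paper.
features = ["drops_most", "drops_some", "drops_least"]
before = np.array([1000.0, 500.0, 200.0])
after = np.array([100.0, 200.0, 150.0])

all_decrease = bool(np.all(after < before))

# True log fold changes, log(before / after): all positive here,
# because every feature is less abundant after than before.
true_log_fc = np.log(before / after)

# Sequencing only gives proportions, which lose the total microbial load.
rel_before = before / before.sum()
rel_after = after / after.sum()

# Log fold changes from proportions, centered on their mean (clr-style).
log_fc_from_proportions = np.log(rel_before / rel_after)
clr_differentials = log_fc_from_proportions - log_fc_from_proportions.mean()

print("all features decrease:", all_decrease)
for name, rb, ra, d in zip(features, rel_before, rel_after, clr_differentials):
    print(f"{name}: rel. before={rb:.3f} rel. after={ra:.3f} differential={d:+.3f}")
```

Here the feature that drops the least gains relative abundance after the change and gets the lowest differential, even though its absolute abundance went down, which mirrors the Actinomyces example in the quote.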

Taking things back to your dataset, what may be happening (and again I am not 100% sure about this!) is that, while some or all Bacteroides are still decreasing between Vag and CS samples, they are just decreasing less than many other features in the dataset are. I'm not familiar enough with infant microbiome studies (I'm guessing Vag means "delivered vaginally", while CS means "delivered via C-section"?) to say if this makes sense with the specific genera you've mentioned, but at least anecdotally I think this sort of pattern may make sense to see in a study of infant microbiomes in vaginal vs. C-section births.
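
Here is a small sketch of how a feature can have a lower relative abundance in CS than in Vag and still get a positive log(CS/Vag) + K. The proportions and feature names are made up, and this simple group-level calculation is an assumption for illustration only; songbird's fitted estimates will not in general equal it.

```python
import numpy as np

# Made-up group-level proportions for four hypothetical features;
# NOT values from this dataset.
features = ["feature_A", "feature_B", "feature_C", "feature_D"]
prop_vag = np.array([0.30, 0.30, 0.30, 0.10])
prop_cs = np.array([0.25, 0.03, 0.02, 0.70])

# Log ratio of proportions, CS over Vag, for each feature.
log_ratio = np.log(prop_cs / prop_vag)

# Centering on the mean log ratio gives clr-style values, log(CS/Vag) + K.
clr_differentials = log_ratio - log_ratio.mean()

for name, vag, cs, diff in zip(features, prop_vag, prop_cs, clr_differentials):
    print(f"{name}: Vag={vag:.2f} CS={cs:.2f} differential={diff:+.2f}")
```

In this toy case feature_A goes from 0.30 to 0.25, so its relative abundance is lower in CS, but because feature_B and feature_C drop far more, feature_A sits above the average log ratio and its centered value is positive.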

Hope this helps clarify things; please let me know if I misunderstood your question, and @mortonjt please let me know if I messed something up >_>

wangj50 · November 25, 2020, 4:07am

Thank you! I was not sure if this is the case and thought it might be.

And I agree with you on the general sentiment that this could mean that, relative to many other taxa, Bacteroides could have decreased less.