Additionally, I wonder whether any downstream processing is needed in order to make an inference mathematically. E.g., would a positive differential mean that, compared to the reference level, a feature is relatively more abundant in the denominator group?
Hi @wangj50, the reported songbird differentials are already in clr transformed coordinates.
In terms of performing downstream inference, songbird already provides a means to sort the microbes according to their fold change (up to a constant! No one can do better without additional assumptions).
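To make the "up to a constant" point concrete, here is a minimal sketch with made-up numbers (not songbird output, and not songbird's code). It assumes the differentials behave like true log fold changes shifted by an unknown constant K and then centered to sum to zero, as clr coordinates are. The ranking of features and the difference between any two features survive the shift; the sign of an individual value does not have to.

```python
import numpy as np

# Hypothetical "true" log fold changes for five features (made-up numbers).
true_lfc = np.array([-2.0, -1.5, -0.5, -0.2, 0.3])

# From relative data alone we only see these shifted by an unknown constant K.
K = 1.1  # arbitrary value, chosen only for illustration
observed = true_lfc + K

# clr-style centering: subtract the mean so the values sum to zero.
centered = observed - observed.mean()

# The ordering of features is unchanged by K and by the centering.
rank_true = np.argsort(true_lfc)
rank_centered = np.argsort(centered)

# The difference between two features (a log-ratio) is unchanged as well.
diff_true = true_lfc[4] - true_lfc[0]
diff_centered = centered[4] - centered[0]

print("same ranking:", np.array_equal(rank_true, rank_centered))
print("same pairwise difference:", np.isclose(diff_true, diff_centered))
print("signs of true values:    ", np.sign(true_lfc))
print("signs of centered values:", np.sign(centered))
```

In this toy case two features that truly decrease end up with positive centered values, which is why the ranks and log-ratios are the quantities to interpret rather than the sign of a single differential.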
Thanks for your prompt reply. My formulas are correct; I wrote my code based on the blogspot post and the songbird tutorial on GitHub. Also, if the formulas were wrong, then shouldn't all the features be in the wrong direction? In my case, only a handful of features had the wrong sign.
I can explore qurro. But I confirmed my formula is correct.
Here the numerators are Bacteroides ASVs, and the denominators are Streptococcus ASVs.
And the sample plot:
As you can see here:
(1) A lot of the red Bacteroides have positive differentials, which I believe means their estimated log(T.CS/T.Vag) + K > 0. However, when you check the relative abundances of these ASVs (not shown here), on average they are more abundant in the T.Vag group. Since most of their differentials are positive, I would naturally expect their relative abundance to be higher in the T.CS group. Even considering the compositional nature of the data, I would think the direction of the songbird differentials should agree with the direction of the comparison of relative abundances.
(2) In the sample plot, the log ratio of Bacteroides/Streptococcus being larger in the T.Vag group is understandable and expected, so I have no question about this.
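For reference, the per-sample log-ratio discussed in (2) can be sketched roughly as below. The counts, ASV names, and group labels are all made up for illustration, and this is a hand-rolled approximation of the idea (sum the numerator features, sum the denominator features, take the log of the ratio), not Qurro's actual code. Samples where either sum is zero have an undefined log-ratio and are set to NaN here.

```python
import numpy as np

# Made-up counts: rows are samples, columns are ASVs (hypothetical names).
asvs = ["Bacteroides_1", "Bacteroides_2", "Streptococcus_1", "Streptococcus_2"]
groups = np.array(["Vag", "Vag", "CS", "CS"])
counts = np.array([
    [120,  80, 10, 10],
    [200, 100, 20, 30],
    [  5,   5, 40, 60],
    [  0,   0, 50, 50],
])

# Boolean masks selecting numerator and denominator features.
num = np.array([name.startswith("Bacteroides") for name in asvs])
den = np.array([name.startswith("Streptococcus") for name in asvs])

num_sum = counts[:, num].sum(axis=1)
den_sum = counts[:, den].sum(axis=1)

# The log-ratio is undefined when either side is zero; leave those as NaN.
valid = (num_sum > 0) & (den_sum > 0)
log_ratio = np.full(counts.shape[0], np.nan)
log_ratio[valid] = np.log(num_sum[valid] / den_sum[valid])

# Mean log-ratio per group, over the samples where it is defined.
group_means = {g: log_ratio[(groups == g) & valid].mean() for g in ("Vag", "CS")}
print(log_ratio)
print(group_means)
```

Because the same unknown per-sample scaling multiplies both the numerator and the denominator, it cancels in the ratio, which is why this quantity can be compared across samples directly.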
I'm not sure that the directions agreeing, as you describe it here, is a guarantee. @mortonjt would know best, but what I think might be happening here is that all (or at least most) of the features observed in your study are decreasing to some degree between the Vag and CS conditions.
A scenario like this is described in Fig. 2 of the paper introducing Songbird -- in the dataset shown here there are two groups of oral microbiome samples, one collected before participants brushed their teeth and one collected after participants brushed their teeth. Of course, most features' absolute abundances (verified in that paper using flow cytometry) are lower to some extent in the after-brushing samples than in the before-brushing samples, which is as expected -- and yet there are still a lot of features on both sides of 0 in log(before brushing / after brushing) + K (see Fig. 2b, copied below).
The reason is that, for that dataset, it's more about which features are decreasing less or more than other features, on average; the features with highly ranked differentials for log(before brushing / after brushing) + K are generally more associated with before-brushing samples, i.e. decreasing more between before ==> after brushing, while the features with lowly ranked differentials are generally more associated with after-brushing samples, i.e. decreasing less between before ==> after brushing. Quoting the paper:
These results are consistent with our knowledge about oral biogeography. Haemophilus is typically found on the periphery of oral biofilms and was likely removed from the biofilm during the brushing process, whereas Actinomyces is generally found on the surface of the tooth and acts as an anchor for biofilm attachment. Importantly, this experiment demonstrates the potential fallibility of relying on relative abundance; it is incorrect to conclude that Actinomyces increases after tooth brushing despite the increase in relative abundance. As demonstrated by flow cytometry, total microbial load decreases, and while both Haemophilus and Actinomyces decrease, Haemophilus decreases more.
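A toy version of that scenario, with made-up absolute abundances (these are not the paper's numbers, and the feature labels are only placeholders), shows how every feature can decrease in absolute terms while the centered log(before / after) values still land on both sides of zero:

```python
import numpy as np

# Made-up absolute abundances (cells per sample) for three placeholder features.
features = ["Haemophilus-like", "Other", "Actinomyces-like"]
before = np.array([1000.0, 800.0, 600.0])
after = np.array([100.0, 400.0, 500.0])

# True log fold changes, log(before / after): all positive, since every
# feature is less abundant after brushing.
true_lfc = np.log(before / after)

# Sequencing only gives proportions, so the total load is lost.
p_before = before / before.sum()
p_after = after / after.sum()

# Log fold changes from proportions equal the true ones minus the unknown
# log(total_before / total_after).
rel_lfc = np.log(p_before / p_after)

# clr-style centering, so the values sum to zero.
centered = rel_lfc - rel_lfc.mean()

for name, t, c, pb, pa in zip(features, true_lfc, centered, p_before, p_after):
    print(f"{name:18s} true={t:+.2f} centered={c:+.2f} "
          f"rel. abundance {pb:.2f} -> {pa:.2f}")
```

Here the "Actinomyces-like" feature doubles in relative abundance and gets a negative centered value even though its absolute abundance drops, while the ranking of the three features matches the ranking of the true log fold changes.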
Taking things back to your dataset, what may be happening (and again I am not 100% sure about this!) is that, while some or all Bacteroides are still decreasing between Vag and CS samples, they are just decreasing less than many other features in the dataset are. I'm not familiar enough with infant microbiome studies (I'm guessing Vag means "delivered vaginally", while CS means "delivered via C-section"?) to say if this makes sense with the specific genera you've mentioned, but at least anecdotally I think this sort of pattern may make sense to see in a study of infant microbiomes in vaginal vs. C-section births.
Hope this helps clarify things; please let me know if I misunderstood your question, and @mortonjt please let me know if I messed something up >_>