Standard vs compositional approach for abundance

salias · July 11, 2024, 10:57am

Hi!

I'm doing some literature research to see how recent papers present differential abundance results (so I can write mine up correctly). As far as I understand, since we are dealing with compositional data, we should use methods that account for that (differential ranking, ANCOM-BC, etc.), and we should avoid comparing relative abundances directly.
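To make the concern concrete, here is a minimal sketch with toy numbers (hypothetical, not taken from any paper). It assumes we somehow knew the absolute abundances, which sequencing does not give us, and shows that a change in one taxon shifts the relative abundance of every other taxon, while the log-ratio between two unchanged taxa stays put. It does not reproduce ANCOM-BC or differential ranking; it only illustrates the constraint those methods are designed around.

```python
import math

# Hypothetical absolute abundances for three taxa (toy numbers, not real data).
control = {"A": 100.0, "B": 100.0, "C": 100.0}
treated = {"A": 1000.0, "B": 100.0, "C": 100.0}  # only taxon A truly changes

def relative_abundance(counts):
    """Convert counts to proportions that sum to 1 (what sequencing reports)."""
    total = sum(counts.values())
    return {taxon: value / total for taxon, value in counts.items()}

def log_ratio(proportions, numerator, denominator):
    """Log-ratio between two taxa; the shared total cancels out."""
    return math.log(proportions[numerator] / proportions[denominator])

rel_control = relative_abundance(control)
rel_treated = relative_abundance(treated)

# Fold change in relative abundance: B and C appear to drop to a quarter,
# although their absolute abundances are identical in both samples.
rel_fold_change = {t: rel_treated[t] / rel_control[t] for t in control}

# The B/C log-ratio is the same in both samples.
lr_control = log_ratio(rel_control, "B", "C")
lr_treated = log_ratio(rel_treated, "B", "C")

print(rel_fold_change)
print(lr_control, lr_treated)
```

A test on the relative abundances of B or C alone would flag a "decrease" that is purely a side effect of A growing, which is why I would expect ratio-based methods to be preferred.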

The thing is that I've recently read some papers on metataxonomy, and I'm finding a lot of direct comparisons of relative frequencies. For instance, a paper published in late May 2024:

[Plot from the paper; image not shown]

My question here is: am I missing something? Is there some value in statistically comparing relative abundances that I am not taking into account?

Thanks in advance,

Sergio

jwdebelius · July 11, 2024, 1:43pm

Hi @salias,

I've got a and I'm going to try to be philosophical. Somewhere I have a slide about the "major shake-ups" in differential abundance testing since like 2005, but there's still an information and publication lag. My thoughts here are going to be worth exactly what you pay for them, so take them with a grain of salt.

So, I agree with you that we should be using compositional methods for differential abundance, rather than working on relative abundance, rarefied, or other data in most cases.¹,² I think papers by Lin and Peddada and by Nearing and colleagues sort of support that conjecture. Even within the realm of "compositional methods" there are a fair number of options.

However, using compositional tools requires (1) knowing they exist, (2) being able to implement them, and (3) being able to interpret them. The first can be a surprisingly big hurdle, but microbiome research is pretty interdisciplinary. There are few opportunities for formal training, including at a lot of big institutions (mine included!), and so people are sort of left on their own figuring out how to analyze the data. Even if they know the tool exists, the group may not be able to get it installed or run it.
And so, if you're not sure what to do, the analytical corollary to Occam's razor³ says that you should fall back on what you know.

I would argue that's the point where peer review should step in, because Hanlon's razor⁴ should be applied to peer review. But the problem of limited capacity in a field that moves as quickly as microbiome research applies to peer review, too. Often the best you can do as a peer reviewer is to ask that people add a compositional analysis, which gets bumped to the supplement because "stupid reviewer 2 made me do it".

And so, stuff gets perpetuated in the literature through a cycle, and bad behavior gets justified because someone got away with it recently. And we keep dealing with crap, and not just samples.

On that delightful note, I'm going to go bang my head against some analysis I can't make sense of, where I "can't just" do what my colleagues from other fields do!⁵

Best,
Justine

Footnotes and times It Depends

¹I'm a proponent of defining "present" based on a relative abundance threshold in some cases. It's essentially dichotomizing a probability: we assume an organism is "present" if it's in 1/000 reads. I usually tie this to a rarefaction depth or the shallowest sample, FWIW. But that's a different differential abundance question than the one you're asking.

²A recent paper showed that relative abundance may be better for random forest. See Yerke, Brumit and Fodor, 2024.

³Occam's razor: the simplest explanation is usually the best

⁴Hanlon's razor: Never assume malice when ignorance is equally likely.

⁵Send tea and stickers!
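The thresholding idea in footnote ¹ could be sketched roughly as below. This is only an illustration: the read counts are made up, and the rule "one read in the shallowest sample" is an assumed way of tying the threshold to sequencing depth, not a documented procedure. The names `presence_threshold` and `presence_table` are hypothetical.

```python
# Hypothetical read counts per sample (toy numbers, not real data).
samples = {
    "s1": {"taxonA": 950, "taxonB": 49, "taxonC": 1},
    "s2": {"taxonA": 3998, "taxonB": 2, "taxonC": 1000},
}

def presence_threshold(samples):
    """Assumed rule: one read in the shallowest sample counts as present."""
    shallowest = min(sum(counts.values()) for counts in samples.values())
    return 1.0 / shallowest

def presence_table(samples, threshold):
    """Mark a taxon present when its relative abundance meets the threshold."""
    table = {}
    for name, counts in samples.items():
        depth = sum(counts.values())
        table[name] = {
            taxon: (count / depth) >= threshold
            for taxon, count in counts.items()
        }
    return table

threshold = presence_threshold(samples)
presence = presence_table(samples, threshold)

for name, calls in presence.items():
    print(name, calls)
```

With this rule, taxonB in the deeper sample `s2` has nonzero reads but falls below the threshold, so it is called absent; that is the dichotomizing step the footnote describes.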

salias · July 12, 2024, 9:09am

Hello @jwdebelius ,

Wow, that's a ton of useful information. Thanks!

Indeed. I also saw that point in Lin and Peddada's ANCOM-BC paper, as well as in this fungal metataxonomics review. I also found this review very useful.

So the problem is that the field moves so fast that, even in peer review, there is not enough capacity to keep up. Thus the responsibility to do a good analysis, regardless of what we see published, is ours (I think this can really be applied to science in general).

Thank you for sharing your thoughts. Knowing that publication lag is particularly present in this field, I can confidently act accordingly and leave out of my literature review those papers that, in my judgment, are not analyzing the data as I believe they should be.

It is wild how much I am learning in this forum. I hope it will help my future analyses/articles to be scientifically rigorous.

Cheers!

Sergio

--

Side note: your footnote 1 reminded me that I recently came across this paper, and it has been very useful in helping me understand how some methodological concepts are applied.

jwdebelius · July 12, 2024, 2:15pm

Hi @salias,

I'm glad to be semi helpful!

That fungal review is super interesting! Thank you. I love "Microbiome Datasets Are Compositional"! I want temporary tattoos with the DOI to hand out at conferences as swag.

There are so many ways to analyze microbiome/mycobiome data that you have to pick a way that works for you and go with it, because otherwise you end up in 80 analysis purgatory and keep looking until you find something that supports your hypothesis, and that's not great.

Best,
Justine