stats post-hoc ANCOM

YaroGitHubed · October 4, 2024, 12:41pm

Could you please advise on the following? I understand that ANCOM might not control the false discovery rate (FDR) well with small sample sizes. Would it be helpful to run post-hoc statistical tests, with correction for multiple comparisons, on log-ratio-transformed data (ALR, CLR) to confirm the results?
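For concreteness, the post-hoc idea above could look roughly like this minimal pure-Python sketch. It rests on my own assumptions: a CLR transform with a pseudocount of 0.5, a two-sided permutation test on the difference in group means for each feature, and Benjamini-Hochberg adjustment across features. The function names and toy counts are hypothetical; this is an illustration, not a recommended pipeline.

```python
import math
import random


def clr(counts, pseudocount=0.5):
    """Centered log-ratio of one sample's counts; the pseudocount avoids log(0)."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def perm_test(a, b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the difference in group means."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        pa, pb = pooled[:len(a)], pooled[len(a):]
        # small tolerance so float summation order does not drop exact ties
        if abs(sum(pa) / len(pa) - sum(pb) / len(pb)) >= observed - 1e-12:
            hits += 1
    # add-one correction keeps the p-value strictly above zero
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy counts (rows = samples, columns = features); feature 0 is higher in group B.
group_a = [[100, 50, 30, 20], [110, 55, 28, 22], [95, 48, 33, 19],
           [105, 52, 31, 21], [98, 47, 29, 23], [102, 51, 30, 20]]
group_b = [[400, 50, 30, 20], [420, 48, 32, 21], [390, 53, 29, 19],
           [410, 49, 31, 22], [405, 52, 28, 20], [395, 50, 30, 21]]

clr_a = [clr(row) for row in group_a]
clr_b = [clr(row) for row in group_b]
n_features = len(group_a[0])
pvals = [perm_test([r[j] for r in clr_a], [r[j] for r in clr_b])
         for j in range(n_features)]
qvals = benjamini_hochberg(pvals)
for j, (p, q) in enumerate(zip(pvals, qvals)):
    print(f"feature {j}: p = {p:.4f}, q = {q:.4f}")
```

Because CLR values are relative to each sample's geometric mean, the unchanged features in this toy table can also appear shifted when feature 0 increases, which is one reason such a test is not a clean confirmation of ANCOM's calls.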

I am aware of ANCOM-BC, which has better statistical properties. However, I am currently more familiar with the scikit-bio package, which unfortunately does not support ANCOM-BC.

Many thanks in advance.

colinvwood · October 4, 2024, 5:46pm

Hello @YaroGitHubed,

I can't comment on your specific question because I don't have a statistics background, but in case you were unaware, ANCOM-BC is available as an action in the QIIME 2 composition plugin. I believe the ANCOM-BC paper states that FDR is well controlled at sample sizes above ~10 (I'm not sure what you consider a "small" sample size).

YaroGitHubed · October 7, 2024, 11:31am

Thank you for your answer and the info.

jwdebelius · October 7, 2024, 1:51pm

Hi @YaroGitHubed and @colinvwood,

Hopefully I can bring something stats-y to help?

I think my recommendation here would be to run another tool rather than a post-hoc ALR or CLR test, because a post-hoc test on the same data has that weird confirmatory property without proper FDR control. I'd recommend reviewing the recent paper by Nearing et al., where they compared several differential abundance methods on the same datasets.
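If you go the multiple-tools route, one simple way to compare outputs is to count how many methods flag each feature and keep the features flagged by at least k of them. This is a minimal sketch, assuming you have already exported each tool's significant features as a set of IDs; the method names, feature IDs, and threshold below are hypothetical placeholders.

```python
from collections import Counter


def consensus_hits(results, min_tools):
    """Features flagged as differentially abundant by at least `min_tools` methods.

    `results` maps a method name to the collection of feature IDs it flagged.
    """
    counts = Counter()
    for flagged in results.values():
        counts.update(set(flagged))  # set() so a method counts each feature once
    return {feature for feature, n in counts.items() if n >= min_tools}


# Hypothetical outputs from three tools run on the same table.
results = {
    "ANCOM": {"ASV1", "ASV2", "ASV7"},
    "ANCOM-BC": {"ASV1", "ASV7"},
    "ALDEx2": {"ASV1", "ASV3"},
}
print(sorted(consensus_hits(results, min_tools=2)))
```

A stricter threshold trades sensitivity for robustness; which k is sensible depends on how many tools you run and how correlated their assumptions are.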

Personally, I love a good ALR, but I think your challenge with ALRs is picking the right reference.
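To illustrate why the reference matters, here is a minimal sketch with toy counts (hypothetical numbers, pseudocount 0.5 assumed, and one sample per condition for simplicity) where only feature 2 changes. With a stable reference, only feature 2's coordinate moves; if the changing feature is used as the reference, every other feature appears to shift by the same amount.

```python
import math


def alr(counts, ref_idx, pseudocount=0.5):
    """Additive log-ratio: log(x_i / x_ref) for every feature except the reference."""
    ref = counts[ref_idx] + pseudocount
    return [math.log((c + pseudocount) / ref)
            for i, c in enumerate(counts) if i != ref_idx]


def alr_shift(before, after, ref_idx):
    """Change in each ALR coordinate between two samples (after minus before)."""
    return [round(b - a, 3)
            for a, b in zip(alr(before, ref_idx), alr(after, ref_idx))]


ctrl = [100, 40, 10, 50]  # toy counts, condition 1
trt = [100, 40, 40, 50]   # only feature 2 differs
print("reference = feature 0:", alr_shift(ctrl, trt, 0))
print("reference = feature 2:", alr_shift(ctrl, trt, 2))
```

Note that the reference itself drops out of the output, so with feature 0 as the reference the three coordinates correspond to features 1, 2, 3, and with feature 2 as the reference they correspond to features 0, 1, 3.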

Best,
Justine

YaroGitHubed · October 8, 2024, 12:31pm

Thank you for the reply, your comment, and the reference to the paper. It makes sense to compare the outcomes of several differential abundance tools to confirm discoveries. I am also wondering whether you know of any plans to integrate ANCOM-BC/ANCOM-BC2 and/or other differential abundance tools into scikit-bio?

Many thanks,
Yaro

jwdebelius · October 8, 2024, 4:08pm

Hi @YaroGitHubed,

You'd have to ask the scikit-bio developers. My recommendation would be to work with the QIIME 2 API and then extract the results into Python, but YMMV.

Best,
Justine

colinbrislawn · October 10, 2024, 2:52am

Like this? skbio.stats.composition.dirmult_ttest (scikit-bio 0.6.3-dev documentation):

This process mirrors the approach performed by the R package “ALDEx2” [1].

Yes! I do this too, but using R instead of Python for the stats.
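As a rough picture of what that approach does (as I understand the ALDEx2-style procedure): draw Monte Carlo instances from a Dirichlet posterior of each sample's counts plus a pseudocount, CLR-transform each draw, run Welch's t-test per feature, and average over draws. The pure-Python sketch below is not scikit-bio's implementation; the function names, toy counts, pseudocount of 0.5, and 128 draws are assumptions, and it reports only mean t statistics, with no p-values or multiple-testing correction. For real analyses, use skbio.stats.composition.dirmult_ttest or ALDEx2 directly.

```python
import math
import random
import statistics


def dirichlet_draw(counts, pseudocount, rng):
    """One draw from Dirichlet(counts + pseudocount) via normalized gamma variates."""
    gammas = [rng.gammavariate(c + pseudocount, 1.0) for c in counts]
    total = sum(gammas)
    return [g / total for g in gammas]


def clr(props):
    """Centered log-ratio of strictly positive proportions."""
    logs = [math.log(p) for p in props]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def welch_t(a, b):
    """Welch's t statistic for mean(b) - mean(a)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(b) - statistics.mean(a)) / math.sqrt(va / len(a) + vb / len(b))


def dirmult_t_sketch(group_a, group_b, draws=128, pseudocount=0.5, seed=0):
    """Mean Welch t per feature across Dirichlet Monte Carlo CLR draws (hypothetical helper)."""
    rng = random.Random(seed)
    n_features = len(group_a[0])
    totals = [0.0] * n_features
    for _ in range(draws):
        clr_a = [clr(dirichlet_draw(s, pseudocount, rng)) for s in group_a]
        clr_b = [clr(dirichlet_draw(s, pseudocount, rng)) for s in group_b]
        for j in range(n_features):
            totals[j] += welch_t([r[j] for r in clr_a], [r[j] for r in clr_b])
    return [t / draws for t in totals]


# Toy counts (rows = samples, columns = features); feature 0 is higher in group B.
group_a = [[100, 50, 30, 20], [110, 55, 28, 22], [95, 48, 33, 19], [105, 52, 31, 21]]
group_b = [[400, 50, 30, 20], [420, 48, 32, 21], [390, 53, 29, 19], [410, 49, 31, 22]]
t_means = dirmult_t_sketch(group_a, group_b)
for j, t in enumerate(t_means):
    print(f"feature {j}: mean Welch t over draws = {t:.2f}")
```

Averaging over Dirichlet draws is what lets this kind of method account for the sampling uncertainty in low counts, rather than treating a single CLR table as exact.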

YaroGitHubed · October 15, 2024, 12:22pm

Thank you both. I appreciate your help.