ANCOM parameters and interpreting output

wburgess · June 21, 2021, 4:46pm

I have read the ANCOM paper and several ANCOM posts here, but the posts are either from 2017 or don't answer my question or both. I'm running qiime2-2021.2 through conda. I'm analyzing the 16S MiSeq amplicon data from microbiomes in amphibian eggs in three ponds (the ponds are named BP, CM, and FP), and I wish to see if I have differentially abundant taxa between ponds.

I ran two commands to compare the outputs: first, qiime composition ancom --i-table pseudo-label-table.qza --m-metadata-file metadata_2020.tsv --m-metadata-column pond --o-visualization ancom-pond-twentytwentydada.qzv. Second, the exact same command but with the visualization tweaked to a unique name and with the added parameter of --p-difference-function f_statistic.

The results look identical to me (screenshot of the former example, and I've attached them both):

ancom-pond-twentytwentydada.qzv (486.9 KB) ancom-pond-twentytwentydada-f.qzv (486.9 KB)

I see from my tinkering, combined with How to interpret ANCOM results - #3 by mortonjt, that the x-axis label has changed from F-score to one determined by my choice of option for --p-transform-function.

First, is the advice of that post still sound given that the x-axis is no longer labeled F-score? Is it true that the upper right corner is "better" because a taxon is different from more taxa the higher it is on the y-axis, and more strongly different the farther right it is on the x-axis?

Second, why is it that the addition of --p-difference-function f_statistic changed nothing? My documentation notes that parameter as optional and gives no default. (Running it with that parameter and the other option, mean_difference, gave me "Plugin error from composition: <lambda>() takes 2 positional arguments but 3 were given", which is clearly a whole other topic.) Is this sameness imposed by me (GIGO), or indicating some true similarity between different metrics?

Third, do I correctly apply Specify W cutoff for anacom? - #10 by mortonjt in making the following statements about my results? (a) "Two taxa are differentially abundant across the community of the three ponds, and those taxa are Rhodoferax and '0319-6G20'." (b) "Rhodoferax (clr: 41) is much more different from the rest of the community than '0319-6G20' (clr: 14), but I can't say how much different because I'm not strong at manipulating logarithms of geometric means."

Fourth, it seems to me that (if I read Specify W cutoff for anacom? - #10 by mortonjt correctly) one of these two statements must be true: (c) "The two taxa, with their W-scores of 16, are each different from the same known set of 16 comparator-taxa." OR (d) "The two taxa, with their W-scores of 16, are each different from 16 comparator-taxa, but I don't know which 16, or it could be a different set of 16 for each of the significant taxa." Which one is it? Is there a way to get under the hood of this W-score? Or is my understanding wrong?

Sorry for this wall of text on an already much-queried topic, but I've done my best to troubleshoot and think clearly, and I still need to nail some things down. Thanks very much to anyone who can answer anything!
Edit to tag @mortonjt, because I suspect it's going to happen eventually anyway and you seem to be an authority around here about this. Sorry to bother.

mortonjt · June 30, 2021, 3:39pm

1. Yes, it is the F-score, but computed on CLR values.
2. I don't completely understand this question (providing your commands would help). But your error suggests that you are trying to run a t-test with more than 2 classes, which can't be done -- so that error is performing as expected.
3. Welcome to the club, no one can set reasonable thresholds without strong assumptions (it actually isn't possible to do without strong assumptions). I'd just stick with the defaults and run with it.
4. ANCOM won't tell you which pairs of taxa are statistically significant -- you'd need to roll out your own script to do that.
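
A rough sketch of what such a script might look like, in Python with NumPy and SciPy. This is not the q2-composition or scikit-bio implementation: the function name pairwise_ancom_sketch, the synthetic counts, and the plain Bonferroni correction are all assumptions made for illustration. It tests every pairwise log-ratio across the groups, keeps the full taxon-by-taxon rejection matrix (so you can see which comparator taxa sit behind each W), and also computes an F-statistic on CLR values per taxon.

```python
import numpy as np
from scipy.stats import f_oneway


def clr(counts):
    """Centered log-ratio transform of a samples x taxa matrix of positive values."""
    logs = np.log(counts)
    return logs - logs.mean(axis=1, keepdims=True)


def pairwise_ancom_sketch(counts, groups, alpha=0.05):
    """Return (W per taxon, F-statistic on CLR per taxon, pairwise rejection matrix).

    Assumption: plain Bonferroni over all taxon pairs, chosen for simplicity.
    """
    counts = np.asarray(counts, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    n_taxa = counts.shape[1]
    logs = np.log(counts)
    n_tests = n_taxa * (n_taxa - 1) // 2

    reject = np.zeros((n_taxa, n_taxa), dtype=bool)
    for i in range(n_taxa):
        for j in range(i + 1, n_taxa):
            # Log-ratio of taxon i to taxon j in every sample.
            ratio = logs[:, i] - logs[:, j]
            _, p = f_oneway(*[ratio[groups == g] for g in labels])
            reject[i, j] = reject[j, i] = (p * n_tests) < alpha

    # W for a taxon = number of other taxa whose log-ratio with it differs by group.
    w = reject.sum(axis=1)

    clr_vals = clr(counts)
    f_stats = np.array([
        f_oneway(*[clr_vals[groups == g, t] for g in labels])[0]
        for t in range(n_taxa)
    ])
    return w, f_stats, reject


# Synthetic demo (made-up data): 3 ponds x 10 samples, 5 taxa, taxon 0 shifted by pond.
rng = np.random.default_rng(0)
groups = np.repeat(["BP", "CM", "FP"], 10)
counts = 100.0 * rng.lognormal(mean=0.0, sigma=0.1, size=(30, 5))
counts[groups == "CM", 0] *= 10
counts[groups == "FP", 0] *= 100

w, f_stats, reject = pairwise_ancom_sketch(counts, groups)
comparators_of_taxon_0 = np.flatnonzero(reject[0])  # which taxa contribute to W[0]
print(w, comparators_of_taxon_0)
```

Keeping the reject matrix is the point here: row i lists exactly which taxa were counted toward taxon i's W, so two taxa with the same W need not share the same comparator set.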

wburgess · June 30, 2021, 7:08pm

Thanks for getting back to me @mortonjt! I follow you on points 1 and 4.

For point 2, I was saying that I think the documentation for qiime composition ancom needs changing, because in practice the --p-difference-function parameter seems to default to f_statistic, but running qiime composition ancom --help in my qiime2-2021.2 says that --p-difference-function is optional and has no default. You were right that I was asking for a t-test on more than two classes, but that was just clumsiness on my part and distracted from the point I was trying to make.
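
For anyone who hits the same error: the message is what Python raises when a two-argument function is handed three groups. Here is a minimal sketch, assuming (I have not checked the plugin source) that mean_difference is a two-argument lambda over per-group values; the lambda and the toy arrays below are hypothetical stand-ins, not the plugin's code.

```python
import numpy as np

# Hypothetical stand-in for a two-group effect size (an assumption, not plugin code).
mean_difference = lambda x, y: x.mean() - y.mean()

# Toy per-pond values for a single taxon.
bp = np.array([1.0, 2.0, 3.0])
cm = np.array([2.0, 3.0, 4.0])
fp = np.array([5.0, 6.0, 7.0])

# With two groups the call works.
two_group = mean_difference(bp, cm)

# With three groups Python rejects the call before any statistics happen.
try:
    mean_difference(bp, cm, fp)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(two_group)
print(error_message)
```

So with a three-level column like pond, a two-group difference has nothing sensible to compute, while an F-statistic accepts any number of groups.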

I can't say I understand how my third question and your third answer are connected; I was trying to find a minimal statement so I could test if I was correctly interpreting the ANCOM results. And I was working with the defaults there. I'm not sure what assumptions underlie, e.g., "Two taxa are differentially abundant across the community of the three ponds, and those taxa are Rhodoferax and ‘0319-6G20’", so if you could go a little deeper on that, I'd be very grateful. When I defend my thesis, I know for sure who in my department is going to ask me "what are the assumptions of this statistic?"!

Thanks again for the help!

mortonjt · June 30, 2021, 7:23pm

Fair enough
Sorry that my answer didn't address your question -- I am not going to pretend that I completely understand ANCOM myself. Feel free to reach out to the original authors.

wburgess · June 30, 2021, 7:36pm

No worries! Grateful for the help you did give!

system · August 1, 2021, 1:38am

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.