ANCOM parameters and interpreting output
I have read the ANCOM paper and several ANCOM posts here, but those posts are from 2017, don't answer my question, or both. I'm running qiime2-2021.2 through conda. I'm analyzing 16S MiSeq amplicon data from the microbiomes of amphibian eggs in three ponds (named BP, CM, and FP), and I want to see whether any taxa are differentially abundant between ponds.
I ran two commands to compare their outputs. First:
qiime composition ancom --i-table pseudo-label-table.qza --m-metadata-file metadata_2020.tsv --m-metadata-column pond --o-visualization ancom-pond-twentytwentydada.qzv
Second, the exact same command, but with a different output name for the visualization and the added parameter --p-difference-function f_statistic.
The results look identical to me. I've attached both visualizations: ancom-pond-twentytwentydada.qzv (486.9 KB) and ancom-pond-twentytwentydada-f.qzv (486.9 KB).
From my tinkering, combined with How to interpret ANCOM results - #3 by mortonjt, I see that the x-axis is no longer labeled F-score; instead, it is determined by the option I choose for --p-transform-function.
First, is the advice in that post still sound, given that the x-axis is no longer labeled F-score? Is it true that the upper-right corner is "better", because the higher a taxon sits on the y-axis, the more taxa it differs from, and the farther right it sits on the x-axis, the more strongly it differs?
Second, why did adding --p-difference-function f_statistic change nothing? My documentation lists that parameter as optional and gives no default. (Running the command with that parameter set to the other option, mean_difference, gave me "Plugin error from composition: <lambda>() takes 2 positional arguments but 3 were given", which is clearly a whole other topic.) Is this sameness caused by me (garbage in, garbage out), or does it indicate some true similarity between the two metrics?
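The error in the parenthetical above is ordinary Python behavior: a function written for exactly two groups cannot be called with three. The sketch below uses hypothetical stand-ins (`mean_difference`, `f_statistic`, and made-up pond values), not the actual q2-composition source, to show that a one-way ANOVA F statistic accepts any number of groups while a two-argument mean difference raises this kind of TypeError when handed three. Whether the plugin falls back to an F statistic when no difference function is given and there are more than two groups is an assumption here, not something stated in the documentation quoted above.

```python
# Hypothetical stand-ins, NOT the q2-composition source code.

# A two-group mean difference written as a lambda, so that calling it with
# three groups produces the same style of message as the plugin error.
mean_difference = lambda x, y: sum(x) / len(x) - sum(y) / len(y)


def f_statistic(*groups):
    """One-way ANOVA F: between-group mean square / within-group mean square."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    k, n = len(groups), len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Made-up values of one log-ratio in samples from three ponds (toy data only).
bp, cm, fp = [1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]

f_value = f_statistic(bp, cm, fp)  # accepts three (or more) groups
print(f_value)

error_message = None
try:
    mean_difference(bp, cm, fp)  # three groups passed to a two-group function
except TypeError as err:
    error_message = str(err)
print(error_message)
```

The design point is only that an F statistic generalizes to k groups, whereas a difference of two means is defined for exactly two; with three ponds, only the former can be computed directly.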
Third, am I correctly applying Specify W cutoff for anacom? - #10 by mortonjt when I make the following statements about my results? (a) "Two taxa are differentially abundant across the community of the three ponds, and those taxa are Rhodoferax and '0319-6G20'." (b) "Rhodoferax (clr: 41) is much more different from the rest of the community than '0319-6G20' (clr: 14), but I can't say by how much, because I'm not strong at manipulating logarithms of geometric means."
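On the "logarithms of geometric means" in (b): the centered log-ratio (clr) of a sample takes the log of each count minus the mean log count, which equals the log of each count divided by the sample's geometric mean. A minimal pure-Python sketch, assuming strictly positive counts (for example, after adding a pseudocount) and natural logs; the counts are made up, and this says nothing about exactly which statistic the ANCOM volcano plot draws on its x-axis.

```python
import math


def clr(counts):
    """Centered log-ratio: log of each count minus the mean log count.

    Assumes every count is > 0 (e.g. a pseudocount has already been added).
    """
    logs = [math.log(c) for c in counts]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [lg - mean_log for lg in logs]


sample = [1, 10, 100]  # made-up counts for three taxa in one sample
values = clr(sample)
print([round(v, 3) for v in values])

# The difference of two clr values in the same sample is the log of the
# ratio of the raw counts, so exp() of that difference recovers a fold ratio.
ratio = math.exp(values[2] - values[1])
print(ratio)  # fold ratio of taxon 3 to taxon 2
```

Because the geometric mean cancels when two clr values are subtracted, exponentiating a clr difference is one way to turn log-scale differences back into "times as abundant" statements.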
Fourth, it seems to me (if I read Specify W cutoff for anacom? - #10 by mortonjt correctly) that one of these two statements must be true: (c) "The two taxa, each with a W-score of 16, are each different from the same known set of 16 comparator taxa." OR (d) "The two taxa, each with a W-score of 16, are each different from 16 comparator taxa, but I don't know which 16, and it could be a different set of 16 for each significant taxon." Which one is it? Is there a way to get under the hood of the W-score? Or is my understanding wrong?
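One way to reason about (c) versus (d): in the ANCOM paper, W for taxon i is the number of other taxa j for which the test of "no group difference in log(x_i / x_j)" is rejected. The toy sketch below takes a made-up set of rejected pairs (hypothetical, not QIIME 2 output), counts W per taxon, and also records which partners produced those rejections. It omits multiple-testing correction and the W cutoff, and it assumes the pairwise test is symmetric in i and j.

```python
# Toy tally of ANCOM-style W scores. All data are made up.
taxa = ["A", "B", "C", "D", "E"]

# Pairs whose log-ratio test was (hypothetically) rejected. Pairs are stored
# unordered because log(x_i/x_j) = -log(x_j/x_i) is assumed to give the
# same test result.
rejected_pairs = {
    frozenset(pair)
    for pair in [("A", "C"), ("A", "D"), ("B", "D"), ("B", "E")]
}


def w_scores(taxa, rejected_pairs):
    """Return {taxon: (W, partners)}, where W counts rejected pairwise tests."""
    result = {}
    for t in taxa:
        partners = sorted(
            u for u in taxa if u != t and frozenset((t, u)) in rejected_pairs
        )
        result[t] = (len(partners), partners)
    return result


scores = w_scores(taxa, rejected_pairs)
for taxon, (w, partners) in scores.items():
    print(taxon, w, partners)
```

In this toy input, taxa A and B both end up with W = 2 but with different partner sets, so under this definition an equal W alone does not imply a shared set of comparator taxa.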
Sorry for this wall of text on an already much-queried topic, but I've done my best to troubleshoot and think clearly, and I still need to nail some things down. Thanks very much to anyone who can answer anything!
Edit: tagging @mortonjt, because I suspect it's going to happen eventually anyway, and you seem to be an authority on this topic around here. Sorry to bother you.