Negative F-statistic values in ANCOM output after filtering low abundant taxa

Mehrbod_Estaki · February 27, 2018, 5:37am

Hello,

I'm looking at a simple two group comparison using ANCOM and I'm having trouble understanding the x-axis of the volcano plots. I thought (from reading previous forum posts) that this was the F-statistic but I am noticing a negative value in my outcome and I'm not sure how to interpret that. I didn't realize F-statistics could ever be negative...

The other interesting component here is that this only occurs when I use the filtered version of my feature table, from which we removed low-abundance taxa.
Using the unfiltered table I see the same significant taxa of interest, but with a strongly positive F-statistic value instead.

Given this, my questions are:
a) Was the filtering step necessary at all? There are very few differences between the groups, and perhaps filtering somehow caused the negative values by removing many sub-hypotheses that would otherwise have passed, leading to lower W values.
b) If I use the filtered table, should I report the absolute F-value, or how should I interpret the negative value?
c) Overall, what information is recommended to report from the ANCOM output? The W value is rather uninformative in the sense that it is specific to the test and it's not really comparable across studies (please correct me if that's wrong). Would the clr-mean-difference work as a proxy for effect size? Is there some sort of adjusted p-value for the main test somewhere as well?

Lastly, not as important, is it possible to wordwrap the hover-over summaries in the volcano plots into a few lines? As you can see from the unfiltered output from the above example, when the taxonomy string is long, the exact x,y coordinates are cut off from the plots with no other way of recovering that exact info.

Thank you!
-Bod

Nicholas_Bokulich · February 27, 2018, 5:24pm

It looks like you are getting the same results but the axis is flipped. @mortonjt is this normal?

I do not think that filtering is necessary here, since ANCOM is intended to handle high-dimensional data like this, but skipping it will increase runtime, so filtering is still useful.

I'd recommend using the raw values (depending on Jamie's answer above).

You are correct, W is not comparable across studies. I would recommend reporting the clr-mean-difference and W scores for all significant features.

yes

I believe these values already account for multiple testing but maybe @mortonjt would know more.

Thanks! I have added an issue here to track this.

Thanks @Mehrbod_Estaki! I hope that helps!

mortonjt · February 27, 2018, 9:54pm

Considering the volcano plot, it looks like there isn't an F-statistic being run. When there are only 2 categories, just the clr mean difference is calculated (which is essentially a log fold change). If you have negative log fold change, that is indicative of decrease (since log(x) < 0 for 0 < x < 1), whereas a positive log fold change is indicative of increase (since log(x) > 0 for x > 1).
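To make the sign convention concrete, here is a minimal sketch of a clr mean difference for two groups. The toy counts and the helper names (`clr`, `clr_mean_difference`) are assumptions made up for illustration, and this is not necessarily the exact computation used by the plugin. It only shows that swapping which group is subtracted from which flips the sign without changing the magnitude.

```python
import numpy as np

def clr(counts):
    """Centered log-ratio transform of a samples x features matrix (all values > 0)."""
    logged = np.log(counts)
    return logged - logged.mean(axis=1, keepdims=True)

def clr_mean_difference(first, second):
    """Per-feature mean clr in `first` minus per-feature mean clr in `second`."""
    return clr(first).mean(axis=0) - clr(second).mean(axis=0)

# Hypothetical toy data: 3 samples per group, 3 features, no zeros.
# Feature 0 is much less abundant in group A than in group B.
group_a = np.array([[10.0, 100.0, 50.0],
                    [12.0, 110.0, 55.0],
                    [ 9.0,  90.0, 45.0]])
group_b = np.array([[80.0, 100.0, 50.0],
                    [90.0, 110.0, 55.0],
                    [70.0,  90.0, 45.0]])

diff_ab = clr_mean_difference(group_a, group_b)  # A relative to B
diff_ba = clr_mean_difference(group_b, group_a)  # B relative to A

print("A - B:", diff_ab)
print("B - A:", diff_ba)
```

Feature 0 comes out negative in `diff_ab` and positive in `diff_ba`, with the same absolute value, so the two results describe the same difference from opposite directions.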

The filtering recommendations are based on empirical evidence. We have seen wonky behavior when there are many low-abundance features. It would be great to have insights from the original authors on this though. In addition, ANCOM is very slow when the number of features is high, since it scales quadratically with the number of features, so filtering definitely helps with runtime.
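As a rough illustration of the quadratic scaling, the sketch below counts the feature pairs that a pairwise log-ratio approach has to consider. The function name `n_feature_pairs` is hypothetical and the feature counts are arbitrary examples; actual runtime also depends on the number of samples and the test used.

```python
def n_feature_pairs(n_features):
    """Number of unordered feature pairs, n * (n - 1) / 2."""
    return n_features * (n_features - 1) // 2

for n in (100, 1000, 10000):
    print(n, "features ->", n_feature_pairs(n), "pairs")
```

A tenfold increase in features means roughly a hundredfold increase in pairs, which is why trimming rare features shortens runs so noticeably.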

Yes, there is a Holm-Bonferroni correction run by default within ANCOM (see code here for details). This can be disabled at your own risk.
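For readers unfamiliar with the correction, here is a minimal standalone sketch of Holm-Bonferroni adjusted p-values. It is written from the textbook definition and is not the code path used inside ANCOM; the function name `holm_bonferroni` and the example p-values are assumptions for illustration.

```python
import numpy as np

def holm_bonferroni(pvalues):
    """Return Holm-Bonferroni adjusted p-values in the original order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    # The k-th smallest p-value (k = 0..m-1) is multiplied by (m - k).
    adjusted_sorted = p[order] * (m - np.arange(m))
    # Adjusted values must not decrease as the raw p-values increase.
    adjusted_sorted = np.maximum.accumulate(adjusted_sorted)
    # Cap at 1 so the results remain valid probabilities.
    adjusted_sorted = np.minimum(adjusted_sorted, 1.0)
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted
    return adjusted

example = holm_bonferroni([0.01, 0.04, 0.03])
print(example)
```

With these three hypothetical p-values, the smallest is multiplied by 3, the next by 2, and the largest by 1, and the monotonicity step then raises the last value to match the one before it.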

Mehrbod_Estaki · February 27, 2018, 10:08pm

Thank you @Nicholas_Bokulich and @mortonjt! This certainly clears up a lot of my questions and confusions.
Just to clarify... so in the examples I provided above, the reason why one test has a negative 'mean clr difference' while the other has a positive value is simply due to the order in which the two categories are presented? So in one, the test suggests that the abundance of a taxon is greater (+) in A than in B, while the other test gives the same answer but says it is lesser (-) in B than in A?

Nicholas_Bokulich · February 28, 2018, 12:32am

Yes, that seems to be the case, given that the two volcano plots are practically mirror images. The order in which the categories are presented is perhaps decided randomly?
