ANCOM: 'low W taxa identified as significant' issue's workaround, ANCOM2 code/instructions

Mehrbod_Estaki · September 14, 2018, 11:36pm

Hi folks,
Some users (including myself) have noticed that in some cases ANCOM behaves oddly: it identifies taxa with very low or zero W values as differentially abundant. See here and here for examples of this behavior.
When I contacted one of the developers of ANCOM about this issue, Dr. Peddada offered the following explanation and recommendation.

A potential reason for the problem is that the earlier code was trying to empirically derive the threshold for significance. When W takes on small values across almost all ~~samples~~taxa, the threshold calculation does not work well. For this reason, we have modified the threshold calculation by providing fixed thresholds (.6, .7, .8). We typically recommend .7 (all our simulations are based on .7). The user can see how different the results between .6, .7, .8 (which are printed side by side). Higher the threshold the more conservative ANCOM would become and lower the threshold the more aggressive ANCOM would become. Unless your data are very peculiar, the results would be different but not dramatically different.

Of course, the above update refers to ANCOM2, which is currently available only as R code and is not reflected in the current q2-ancom plugin. I am told an updated q2 adaptation is in the works, so stay tuned for that.
In the meantime, please be wary: in these situations, taxa with very low or zero W values are not actually significant.
ANCOM2 can also handle covariates and longitudinal data, which is extremely useful. The most recent code and manual are attached here: ancom2.zip (110.1 KB).

Yuhong · August 12, 2019, 12:50pm

Hi Mehrbod,

Thank you for the useful information. Can you explain more about the fixed thresholds? To my understanding, the threshold is meant to exclude taxa that have low W values but are still reported as significantly different in abundance between groups. But how is the threshold calculated? Thank you!

Yuhong

Mehrbod_Estaki · August 12, 2019, 6:21pm

Hi @Yuhong,
That's a great question. I actually don't know the answer and wouldn't dare speculate, so as not to spread false information. Your best bet is to ask one of the authors of the paper directly (they are not on the QIIME 2 forum as far as I am aware). Or maybe someone closer to the project can comment here.
Sorry couldn't be more help!

Yuhong · August 12, 2019, 11:55pm

Hi @Mehrbod_Estaki,

Thank you for the reply. No worries, I just picked out the code that calculates the threshold in the R script. I will give it a try and see if I can figure it out.

Yuhong

Mehrbod_Estaki · August 13, 2019, 12:19am

Good idea @Yuhong. Would love for you to update us if you do figure it out, would be good knowing the answer to that.

Yuhong · August 13, 2019, 12:21am

Sure @Mehrbod_Estaki. I'll try my best.

olar785 · January 11, 2020, 2:33pm

Thanks Mehrbod for sharing the link to the ancom2 R version. I would just like to point out that if one has ASV/OTU names starting with a number, the function will return an error, because during the creation of the 'data_comp' table the column names containing the feature IDs are changed (an X is prepended if the name starts with a number). This is simply fixed by adding check.names = F as below:

if(repeated==F){
  data_comp=data.frame(merge(otu_data,var_data,by="Sample.ID",all.y=T),row.names=NULL, check.names = F)
}else if(repeated==T){
  data_comp=data.frame(merge(otu_data,var_data,by="Sample.ID"),row.names=NULL, check.names = F)
}
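For anyone who wants to see the renaming in isolation, here is a minimal sketch with made-up tables. The feature IDs "1a2b" and "asv2" and the "group" column are hypothetical; only Sample.ID, otu_data, var_data and check.names come from the snippet above.

```r
# Toy inputs: one feature ID starts with a digit (hypothetical names).
otu_data <- data.frame(Sample.ID = c("s1", "s2"), check.names = FALSE)
otu_data[["1a2b"]] <- c(10, 0)
otu_data[["asv2"]] <- c(3, 7)
var_data <- data.frame(Sample.ID = c("s1", "s2"), group = c("A", "B"))

merged <- merge(otu_data, var_data, by = "Sample.ID")

# Default: data.frame() runs the names through make.names(),
# so "1a2b" is turned into "X1a2b".
renamed <- data.frame(merged, row.names = NULL)

# With check.names = FALSE the original feature IDs are left alone.
kept <- data.frame(merged, row.names = NULL, check.names = FALSE)

print(colnames(renamed))
print(colnames(kept))
```

Any later lookup by the original feature ID would miss the column in the first table but find it in the second.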

Just thought I would mention it to save time for others
Best,
Olivier

jwdebelius · January 11, 2020, 5:52pm

Hey, so, I'm kinda late to the party, but I use a fixed threshold for my ANCOM. (I'm a bit more conservative because I like 0.8.) Typically, in ANCOM, your W value is calculated as the number of tests that are significant, and then the distribution of W values is used to calculate a threshold based on the assumption that the distribution is bimodal.
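As a rough illustration of "W is the number of significant tests", here is a toy sketch in R. The p-value matrix, the feature names and the 0.05 cutoff are all made up for the example; this is not the code from the plugin or from the ANCOM2 script.

```r
# Hypothetical adjusted p-values: entry [i, j] is the test of the
# log-ratio of feature i to feature j between groups (diagonal unused).
features <- c("f1", "f2", "f3", "f4")
p_adj <- matrix(
  c(NA,   0.01, 0.02, 0.03,
    0.01, NA,   0.40, 0.60,
    0.02, 0.40, NA,   0.80,
    0.03, 0.60, 0.80, NA),
  nrow = 4, byrow = TRUE,
  dimnames = list(features, features)
)

alpha <- 0.05  # assumed significance level for each pairwise test

# W for a feature = how many of its pairwise ratio tests are significant.
W <- rowSums(p_adj < alpha, na.rm = TRUE)
print(W)
```

In this toy case f1 differs in all three of its ratios, while every other feature differs only in its ratio with f1.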

However, you can also set significance a priori and say "X% of my tests must be different for significance". (You could always do it manually, but it's nice to see ANCOM 2 has it built in.) So, if you set a hard threshold of 0.7, then a feature is significant if 70% of its ratios are significant. If you set a hard threshold of 0.9, then 90% of its ratios must be significantly different, etc.

You can get the threshold on the current plugin (but not the shiny visualization behavior) by dividing the W you get by the number of features tested minus 1 (since we do a comparison against every feature except the feature itself). Then, if W_norm ≥ threshold, you call it significant.
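The calculation can be sketched in a few lines of R so nobody has to do it by hand. This assumes the denominator is the number of features tested minus one; the table, the column names and the W values are hypothetical stand-ins for whatever you export from your own ANCOM run.

```r
# Hypothetical ANCOM output: one row per feature tested, with its W value.
ancom <- data.frame(
  feature = c("f1", "f2", "f3", "f4", "f5"),
  W = c(4, 3, 1, 0, 0)
)

# Each feature is compared against every other feature, so the
# maximum possible W is (number of features tested - 1).
n_features <- nrow(ancom)
ancom$W_norm <- ancom$W / (n_features - 1)

# Flag features at several fixed cutoffs, side by side.
for (cutoff in c(0.6, 0.7, 0.8)) {
  ancom[[paste0("detected_", cutoff)]] <- ancom$W_norm >= cutoff
}

print(ancom)
```

With a fixed cutoff, a feature with W = 0 has W_norm = 0 and can never be called significant, which is the behavior people in this thread were missing.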

I actually prefer this method, which I've been using for a while, because it feels more like setting a p-value for an assumed distribution. Where a p-value of 0.05 says 95% of my data should be less extreme than my value (or there's a 1/20 chance I'm wrong), setting my threshold at 0.8 means my ASV is changing with 80% of my data, and that just gives me more confidence.

...I suppose you could also do a joint distribution, where you take the max of the threshold and the bimodal distribution, but I've never quite gotten there. (Maybe that's what the R code does, IDK.)

Let me know if you want help with this part. It would be nice to not make my students do the calculation by hand!

Best,
Justine

thermokarst · January 20, 2021, 8:08pm

A post was merged into an existing topic: ANCOM - W values are high but null hypothesis was still rejected