Pairwise testing of ANCOM results

Mehrbod_Estaki · January 8, 2018, 11:47am

I'm trying out ANCOM for the first time and have a couple of questions about its output. My question extends a bit from a previous thread. Currently the visualization lists the features that are considered significantly different across a grouping variable, and it reports a test statistic and an effect size for each (the W and F scores, respectively). As far as I can tell, ANCOM doesn't provide an option for pairwise testing if your experiment has more than 2 groups (which is a bummer!). That makes the results rather hard to publish, since you can't comment on the pairwise differences. So I'm trying to figure out how useful the tool would be in a couple of other situations, and I wanted to know whether these approaches make sense.

A) Use ANCOM only as a tool to identify important features. Once you have the features that ANCOM identifies as "significant", filter your original feature table to include only those taxa, and use the filtered table in some other test such as a GLM. This would allow for pairwise comparisons and the inclusion of multiple variables. Would this make sense, or would the non-parametric and compositional nature of ANCOM preclude passing its results on to something like a GLM?

B) Same start as option A: use ANCOM to identify significant features, filter the original feature table to keep only those taxa, and then run a separate ANCOM test for each pairwise combination of groups. The issue here would be adjusting for multiple testing. Can that simply be done with the W values, or is it perhaps not needed at all?
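The bookkeeping for option B can be sketched roughly as follows. This is plain Python, not part of ANCOM or QIIME 2, and the sample IDs, group labels, and the helper name `pairwise_sample_sets` are all made up for illustration. It only enumerates the pairwise group combinations and the samples each one would need.

```python
from itertools import combinations

# Hypothetical sample metadata: sample ID -> group label (illustrative only).
sample_groups = {
    "S1": "control", "S2": "control",
    "S3": "diet_a", "S4": "diet_a",
    "S5": "diet_b", "S6": "diet_b",
}

def pairwise_sample_sets(sample_groups):
    """Map each unordered pair of groups to the sample IDs in either group."""
    groups = sorted(set(sample_groups.values()))
    return {
        (a, b): sorted(s for s, g in sample_groups.items() if g in (a, b))
        for a, b in combinations(groups, 2)
    }

pairs = pairwise_sample_sets(sample_groups)
for pair, samples in pairs.items():
    # Each sample list would be used to subset the feature table and
    # metadata before running one separate two-group test.
    print(pair, samples)
```

With k groups this yields k(k-1)/2 separate comparisons, which is where the multiple-testing concern comes from.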

If there are other strategies to utilize ANCOM to allow for pairwise comparisons, I'd love to hear them as well!

Nicholas_Bokulich · January 8, 2018, 6:28pm

Hi @Mehrbod_Estaki,

I would recommend approach B over approach A. Various methods and software exist for multiple test correction (e.g., you could plug your results into R, or use the Python statsmodels library if you are running your analyses in a Jupyter notebook), but you should probably consult a statistician on this.
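For tests that produce p-values, the kind of correction meant here can be sketched as below. This is a minimal plain-Python Benjamini-Hochberg adjustment with made-up p-values; the `multipletests` function in statsmodels offers the same adjustment via `method="fdr_bh"`. Note the assumption that you have p-values at all: ANCOM's W statistics are not p-values, so this does not apply to them directly.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values from three pairwise comparisons (illustrative only).
raw = [0.01, 0.04, 0.03]
adj = benjamini_hochberg(raw)
print(adj)
```

Whether a false discovery rate correction like this, or a stricter family-wise method, suits a given design is exactly the sort of question to take to a statistician.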

You may also want to check out gneiss instead. It implements various linear models for relating differences in microbial composition to multifactorial experimental designs, and it sounds like that may be what you are looking for.

I hope that helps!

Mehrbod_Estaki · January 8, 2018, 7:49pm

Thanks @Nicholas_Bokulich for your suggestion. Yes, I was planning on using gneiss as well, but I think there's room for both tools since they ask fundamentally different questions (as I understand it, anyway). As for option B, I wanted to know whether there is a simple way to adjust the W values, which seem to be a proxy for p-values, using the q2 outputs without having to dive back into R or other sources. That might be a question I need to ask the ANCOM folks directly. I'll keep poking around.

Nicholas_Bokulich · January 8, 2018, 7:52pm

Hi @Mehrbod_Estaki,

No — I definitely don't think there would be an easy way to adjust these.

I agree, this is probably something to ask the ANCOM folks — I expect they've thought this over/been asked similar questions previously.

Please share if you get a good answer from them!

Mehrbod_Estaki · January 9, 2018, 9:30pm

I was able to get in touch with Dr. Mandal from the original ANCOM paper, and he kindly referred me to an updated R package/tutorial on his website. This version can handle covariates and longitudinal designs, and it also adjusts the W values internally. He also mentioned that there are plans to incorporate these new capabilities into QIIME2 sometime in the future.