ANCOM using a continuous metadata column

Dot · April 3, 2021, 5:21am

Hello,

Is there any way to use ANCOM within QIIME2 to assess whether features differ in abundance based on a continuous variable (rather than a categorical variable that compares feature abundance between two or more groups)? For example, I would essentially like to use a continuous numeric variable in place of the parameter --m-metadata-column. It seems from the ANCOM paper that was referenced in one of the tutorials that fitting a model using a continuous variable is indeed possible but I'm unsure how to implement that here within QIIME2. I apologize if this is being posted in the wrong category!

Thank you!

andrewsanchez · April 7, 2021, 11:32pm

Hi, @Dot! Welcome to the forum!

Can you point to specific quotes from the paper you are referring to?

As far as QIIME 2 is concerned, you can treat a numeric metadata column as categorical by specifying the column type as categorical. Our handy dandy metadata tutorial will clarify how to do that:

https://docs.qiime2.org/2021.2/tutorials/metadata/#metadata-in-qiime-2

Disclaimer: I've never done this before, so take this advice with a grain of salt.

With this feature, you should be able to discretize your data (as they did in the ANCOM paper) by lumping your data into bins representing slices of that continuous variable. This would require that you edit your metadata to support such discretization.

For example, if you have a numeric column with possible values 1-10 and you want to discretize that into 5 categories, you could then create 5 new categorical metadata columns indicating which category the item belongs to (1-2, 3-4, 5-6, 7-8, or 9-10). You could then run ANCOM, once for each group. For reference, the Parkinson's Mice tutorial demonstrates how to run ANCOM on two different groups and compare the results.
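As a rough illustration of the binning described above, here is a minimal Python sketch using only the standard library. It assumes whole-number values from 1 to 10 and puts the bin labels in a single categorical column rather than five separate ones. The sample IDs, the `score` and `score_bin` column names, and the `bin_label` and `write_metadata` helpers are all hypothetical. The `#q2:types` row is the QIIME 2 metadata directive for declaring column types.

```python
import csv
import io


def bin_label(value, low=1, high=10, n_bins=5):
    """Return a label such as '3-4' for the equal-width bin holding value.

    Assumes whole-number values in [low, high]; hypothetical helper.
    """
    if not low <= value <= high:
        raise ValueError(f"{value} is outside [{low}, {high}]")
    width = (high - low + 1) // n_bins  # 2 for the 1-10, five-bin example
    index = min((int(value) - low) // width, n_bins - 1)
    start = low + index * width
    return f"{start}-{start + width - 1}"


def write_metadata(samples, handle):
    """Write a tab-separated metadata table with a binned categorical column."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["sample-id", "score", "score_bin"])
    # Type directive: keep the raw column numeric, mark the bins as categorical.
    writer.writerow(["#q2:types", "numeric", "categorical"])
    for sample_id, score in samples:
        writer.writerow([sample_id, score, bin_label(score)])


# Made-up samples purely for illustration.
samples = [("S1", 1), ("S2", 4), ("S3", 7), ("S4", 10)]
buffer = io.StringIO()
write_metadata(samples, buffer)
metadata_tsv = buffer.getvalue()
print(metadata_tsv)
```

Saved to a file, a table like this could presumably be passed to ANCOM with `--m-metadata-file`, naming the binned column (here `score_bin`) in `--m-metadata-column`.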

Dot · April 11, 2021, 5:31am

Hi @andrewsanchez ! Thank you for your response!

Yes, in the legend for Figure 3 it says:

"The third row provides the mean OTU relative abundance for Bacilli against categories of breast milk variable and for Clostridia against categories of ‘Days on antibiotics’. Although, as in LaRosa et al. (16), ‘Day of life’ and ‘Days on antibiotics’ were analyzed as continuous variables, for simplicity of plotting in this figure they were discretized."

They also say right before Figure 3:

"For plotting purposes, we discretized days on antibiotics into four categories."

...so I assumed the ANCOM analysis was performed on continuous variables in these cases and only plotted in discretized form afterwards?

I'll try what you suggested and discretize the data I have. Thanks!

andrewsanchez · April 13, 2021, 5:46pm

Hi, @Dot ,

Under the hood, QIIME 2 uses the scikit-bio implementation of ANCOM. It might be helpful to take a look at their docs for more details about the algorithm. ANCOM needs you to tell it which groups you want to compare. If you have a continuous variable, it probably makes sense to create groups for ease of comparison.

My understanding is that discretizing a continuous variable in the context of an ANCOM analysis does not alter the way ANCOM sees that data; it is still going to simply compare each feature with respect to the actual values of that continuous variable for each feature. Discretizing the variable is just a way of providing metadata. So discretizing your data here really does just affect the plotting, for the purposes of easily and meaningfully comparing arbitrary groupings of that continuous variable.

I hope that helps! Let me know if anything is still unclear.

Dot · April 26, 2021, 6:25pm

This was very helpful, thanks for all your help!
