Adonis and ANCOM with rare features

hsapers · December 15, 2020, 5:59pm

Hello - apologies in advance for the length of this question, I think things are all related so I thought it was best if I posted this together. I'm working with an experimental system that has low biomass, ASVs that are predominantly unclassified at the order level or higher (many unclassified at the Phylum level), and a severely left-tail feature distribution (most ASVs are only observed a few times).

My experimental design fits a stratified factorial within subjects model: I have 5 different deployments (at this time I am not doing a longitudinal analyses and treating these instead as 5 separate experiments), testing to see if flow rate and choice of physical substrate influence community structure. In each of the 5 experiments I have either fast or slow flow conditions and in each of those conditions I have bead or crush substrate. In each condition there are 1 - 3 replicates (the design is unbalanced in that there are different numbers of replicates in each group do to constraints of the system). I also have an initial test system: again due to constraints of the system, the sample device in the slow system is made from plastic and the sample device in the fast system is made from glass. In three of the deployments I also set up plastic and glass devices in the fast system to evaluate if the material of the sampling device influenced the community.

1:glass vs plastic - difference between PERMANOVA and univariate Adonis

edit - solved - found an errant space in the PERMANOVA command

For each deployment I did a pair-wise test between the glass and plastic containers. I thought using qiime diversity beta-group-significance would be the most straightforward. I ran the code below, but after 16 hours I finally terminated it. I then tried using qiime diversity adonis and it worked fine. I can't figure out why qiime diversity beta-group-significance didn't work as I thought they were different implementations of the same test:

PERMANOVA

qiime diversity beta-group-significance \
  --i-distance-matrix braycurtis_distance.qza \
  --m-metadata-file metadata_103020.tsv \
  --m-metadata-column container \
  --o-visualization braycurtis_container_pernova.qzv

ADONIS

qiime diversity adonis \
  --i-distance-matrix braycurtis_distance.qza \
  --m-metadata-file metadata_103020.tsv \
  --o-visualization manifold_crush_braycurtis_container_adonis.qzv \
  --p-formula container

2 ADONIS formula
The univariate adonis (I think that's what it would be?) indicated that there was not a significant difference between the Bray-Curtis dissimilarities of community compositions between container material. Great. Moving on I wanted to use Adonis to assess the effect each level of both factors (container=fast or slow; fill=bead or crush) including interaction had and if there was a significant difference in any of the six groups. I had a couple of questions about how qiime diversity adonis calculates the total number of samples in each group and if running additional stratified test was valid.

2a: number of samples calculation
Looking at the original Anderson 2001 paper in the section 'Calculating the statistic' the total number of observations are calculated by:

"Let A designate factor 1 with a levels (treatments or groups) and B designate factor 2 with b levels, with n replicates in each of the ab combinations of the two factors. The total number of observations is N = abn."

If my experimental design is unbalanced such that the number of replicate in each of the groups is different wiht N = (an)+(bn'), is Adonis still valid and does the implementation in qiime diversity adonis allow for this in calculating the total number of observations?

2b) Using a factorial design I could establish that there was no significant interaction effect between factors. I would now like to look at difference between levels of a factor with the factor themselves stratified: ie: is there a significant effect of fill (bead vs crush) in each of the two containers (fast and slow) and vice versa. Is it valid to break down a factorial design like this once it's established that interaction between factors is not significant?

3 permdisp
If Adonis is valid, then it looks like there there was a significant difference between fill and container and there was not a significant interaction in each deployment. I know that Adonis can find a significant difference if there is a significant difference in dispersion in addition to a difference in the means. So permdisp can be used to see if there is a significant difference in dispersion between groups of factors. I ran the follow that produced the following error:

qiime diversity beta-group-significance \
  --i-distance-matrix braycurtis_distance.qza \
  --m-metadata-file metadata_103020.tsv \
  --m-metadata-column fill \
  --o-visualization braycurtis_fill_permdisp.qzv \
  --p-method permdisp

Plugin error from diversity: too many indices for array: array is 1-dimensional, but 2 were indexed Debug info has been saved to /export/data1/tmp/qiime2-q2cli-err-wq7d8kxv.log

Traceback (most recent call last): File "/export/data1/sw/tag_conda/envs/qiime2-2020.2/lib/python3.6/site-packages/q2cli/commands.py", line 328, in __call__ results = action(**arguments) File "<decorator-gen-408>", line 2, in beta_group_significance File "/export/data1/sw/tag_conda/envs/qiime2-2020.2/lib/python3.6/site-packages/qiime2/sdk/action.py", line 245, in bound_callable output_types, provenance) File "/export/data1/sw/tag_conda/envs/qiime2-2020.2/lib/python3.6/site-packages/qiime2/sdk/action.py", line 452, in _callable_executor_ ret_val = self._callable(output_dir=temp_dir, **view_args) File "/export/data1/sw/tag_conda/envs/qiime2-2020.2/lib/python3.6/site-packages/q2_diversity/_beta/_visualizer.py", line 153, in beta_group_significa$ permutations=permutations) File "/export/data1/sw/tag_conda/envs/qiime2-2020.2/lib/python3.6/site-packages/skbio/stats/distance/_permdisp.py", line 233, in permdisp permutations) File "/export/data1/sw/tag_conda/envs/qiime2-2020.2/lib/python3.6/site-packages/skbio/stats/distance/_base.py", line 1085, in _run_monte_carlo_stats stat = test_stat_function(grouping) File "/export/data1/sw/tag_conda/envs/qiime2-2020.2/lib/python3.6/site-packages/skbio/stats/distance/_permdisp.py", line 247, in _compute_groups centroids = samples.groupby('grouping').aggregate(_config_med) File "/export/data1/sw/tag_conda/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/groupby/generic.py", line 971, in aggregate result = self._aggregate_multiple_funcs([func], _axis=self.axis) File "/export/data1/sw/tag_conda/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/base.py", line 526, in _aggregate_multiple_funcs new_res = colg.aggregate(arg) File "/export/data1/sw/tag_conda/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/groupby/generic.py", line 246, in aggregate ret = self._aggregate_multiple_funcs(func) File "/export/data1/sw/tag_conda/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/groupby/generic.py", line 319, in _aggregate_multiple_fun$ results[base.OutputKey(label=name, position=idx)] = obj.aggregate(func) File "/export/data1/sw/tag_conda/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/groupby/generic.py", line 261, in aggregate func, *args, engine=engine, engine_kwargs=engine_kwargs, **kwargs File "/export/data1/sw/tag_conda/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 1083, in _python_agg_general result, counts = self.grouper.agg_series(obj, f) File "/export/data1/sw/tag_conda/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/groupby/ops.py", line 644, in agg_series return self._aggregate_series_fast(obj, func) File "/export/data1/sw/tag_conda/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/groupby/ops.py", line 669, in _aggregate_series_fast result, counts = grouper.get_result() File "pandas/_libs/reduction.pyx", line 256, in pandas._libs.reduction.SeriesGrouper.get_result File "pandas/_libs/reduction.pyx", line 74, in pandas._libs.reduction._BaseGrouper._apply_to_group File "/export/data1/sw/tag_conda/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 1060, in <lambda> f = lambda x: func(x, *args, **kwargs) File "/export/data1/sw/tag_conda/envs/qiime2-2020.2/lib/python3.6/site-packages/skbio/stats/distance/_permdisp.py", line 264, in _config_med X = x.values[:, :-1] IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed

I didn't find much in the forum for this error.

4: ANCOM
I moved forward with ANCOM to identify features that were differentially expressed between the levels of the factors that Adonis indicated had significantly different Bray-Curtis dissimilarities. I've run into a couple problems here and I think it's mostly due to the frequency distribution of my features. I'm not expecting many features to change between conditions (unless dada2 was off and my singleton and low abundance ASVs are noise, but I thought Data2 and the error models were such that low abundance ASVs could be treated as less error prone than OTUs).

When I first ran ANCOM without filtering low abundance ASVs (since I think these are real and driving the differences Adnois found) I ended up with results I was skeptical of - long lists of hundreds of ASVs that I could reject H_0 for. Since Adonis is operating on the dissimilarity matrix and ANCOM is operating on the featureTable[frequency] I know they may have different resolutions. ie ANCOM performs a CLR that requires a pseudocount to be added, and since my low abundance ASVs were likely observed at a similar magnitude as the added pseudocount, I decided filter. Filtering to include only ASVs that accounted for 0.02% of the observed features (173) and increasing that to only ASVs observed at least 10 times both failed to identify any significantly different ASVs (consistent with my hypothesis that it's the low abundance ASVs driving the difference between conditions). When I increased the filter to at least 2 observations I ended up with some results that looked as I would expect (1 - 5 ASVs identified) and other results that look like the below (where anything with W=0 or higher was identified, and a cut off of 0 seems suspect. So I had three questions I wasn't sure how to answer:

4a) W=0 seems like a suspect cut-off and that my low abundance ASVs may be difficult to resolve from a pseudocount. In the volcano plot - I thought that CLR vs W was plotted for all features in the comparison, but in at least one of my plots there are clearly only a few of features plotted.

4b) If I filter the feature table for ANCOM, should I also filter the feature table to the same level before calculating the Bray-Curtis dissimilarity used for Adonis? Or just state that ANCOM, because of the need to add a pseudocount, will have a lower resolution and may not identify all ASVs that led to the significant difference in BC dissimilarity via Adnois?

4c) Can I adjust the value of the pseudocount being added in ANCOM, or is there a better method for low abundance ASV discrimination that I should be using?

Thank you for all of the forum support!

jwdebelius · December 15, 2020, 9:17pm

Hi @hsapers,

Okay, I'm going to try and tackle this. ...I may tag in more people, but let's see how far we can get?

Does this mean that this issue is fixed? Sorry... just for clarification?

*2 ADONIS formula

I think this depends on how unbalanced unbalanced is. You get weird behavior with severely unbalanced designs (emperically and hand-wavily starting about 2:1). So, if you've got 10 in one group and 12 in the other, I would worry less than if you have 10 in one group and 2 in the other.

Two approaches that I take (whether they're right or wrong). First, I will sometimes run an adjusted model. So, like

fill + container
container + fill

The order matters in Adonis, so you need to switch the variable so the adjustment works correctly since you capture the variation int he data in the order specified by the equation. From the vegan documentation:

Function adonis is directly based on the algorithm of Anderson (2001) and performs a
sequential test of terms.
(I'm about 95% sure that QIIME wraps adonis and not adonis2.)

The second option is to stratify your data to get four distance matrices: two for fill and two for container, and then run adonis on the stratified data. But, I think that if you don't have an interaction that the adjusted approach should work.

One more thought/note here: in adonis, you might be able to get the data nested by adding a term as a slash. So, fill + container/replicate will at least account for the fact that you've not nested replicates. It's not perfect, but given your design, it might be worthwhile considering.

3. Permdisp

Your error is a python error around the metadata and less intuitive. I think you may have two columns called "fill" in your metadata. If this isn't the case, could you make a new post?

4. ANCOM

It depends on what you consider noise here. AFAIK, dada2 is slightly more prone to PCR-senitivity than unoise or deblur, but Im trying to remember the citation on that. I should be related to the error modeling. So, it's possible the singletons are noise.

The second piece is that you ran Bray-Curtis which focuses on abundant features. This tends to be less sensitive to low abundance features, so I'd expect the difference to be driven by your more abundant features.

Both of these are true... but I'm going to go in reverse order. The pseudocount of 1 tends to have a larger effect on rare features. I'm all for pre-fitlering because of power. If you have a singleton, there isn't a distribution around it and it's hard to resolve the relationship... this is going to be true for whatever differential abundance test you use.

A W of 0 means that the ratio between the two features was not significantly different for any statistical tests applied. Often, this is a problem of power: your comparisons aren't getting over the FDR threshhold. Im struggling a little bit to visualize the experiment, but at a given deployment, it sounds like you've got 2 flow rates x 2 containers x (n replicates) where n is between 1 and 3. At htis size, you've got about... 12 samples total? With some replicates so the number of features you've got is going to be higher. That's going to lead to larger p-values in a distribution based test because your sample size is smaller plus a high FDR penalty. ANCOM tends to be quite conservative and requires a relatively large sample size for differential abundance, so this may be some of the source of the problem.

I tend to treat the analyses as linked but separate and describe them differently in methods. (Especially because I also apply different normalizations.) If you're not sure, you can always rarefy the filtered data, calculate Bray Curtis and run a mantel test. My suspicion is that you'll have an R^2 > 0.9, suggesting your distance matrices are highly correlated.

You could also try Aitchinson, which is compositionally aware and so will be more closely correlated with the ANCOM results. You can find that as a part of qiime diveristy beta; that will include the psuedocount in the data.

You can adjust the psuedocount in the compositionally step, but Im not sure if it lets you add a decimal value; check the documentation? You may want to check out ANCOM 2 and the associated publications, particularly if you're an R user. At some point, hopefully we'll get it into :qiime2:, but alas, not yet.

Best,
Justine

hsapers · December 16, 2020, 4:14pm

Thanks again @jwdebelius for a great explanation -

1 - yup this is solved, I had an extra space after one of the flags that didn't raise an error - just kept things running indefinitely. Actually, the PERMANOVA output alerted me to a potential error in the metadata - and a potentially useful output from qiime diversity adonis : it is possible to extract the number in each group or to produce a table of the sample_ids for each group?

2a Would using permdisp address over sampling in one group? I'm going to put a pin in the 'permdisp' error while I comb through my metadata - but this sounds like a potential way to evaluate unbalanced experimental design?

2b Actually I think I'm more confused now. I went through the Anderson 2001 paper and actually drew out the matrices corresponding to each term from the partitioning sums of squares equation. Are the following interpretations correct:

fill + container calculate effect of levels of the factor fill regardless of the level in factor container (ie across all levels of container), and then out of the residual calculate the effect of the levels of the factor container (completely independent of the factor fill). container + fill is the inverse and why order matters here.
fill * container calculate the effect of levels of the factor fill regardless of the level in factor container, calculate the effect of the level of the factor container regardless of the level in factor fill, - and then calculate the interaction by subtracting each of these main effects and the residuals from the total.
fill strata=container calculate the effect of each level of fill within each level of container. container strata=fill is the inverse. This how I interpreted your reply here:

The second option is to stratify your data to get four distance matrices: two for fill and two for container, and then run adonis on the stratified data. But, I think that if you don’t have an interaction that the adjusted approach should work.

I'm not sure how 1. (and it's inverse) would give me the same information as 3. - I'll need to think about that for a while.

I'm also having a bit a difficulty conceptualizing how the equation in Anderson 2001 is implemented in adonis. According to the R formulae page

A*B is interpreted as A+B+A:B

where A is the main effect of factor A and B is the main effect of factor B and A:B is the interaction term.

Wouldn't that make A*B different than B*C - and I can't reconcile that with the equations on page 39-40 of the Anderson paper. Or in this case is the interpreted equation (A+B+A:B) indicating each of the calculated effects - the main effect of A, the main effect of B, and the interaction and not how each of those effects are calculated?

I'm not clear on how the slash in interpreted here:
fill + container/replicate is this the same as using fill + container for each replicate individually rather than pooling all replicates from the same condition? Wouldn't there only be one sample in each condition then?

3 I think there is something wrong with my meta data - it will take me a couple of days to comb through it. I'll make a new post if the error persists.

4 Yes - my sample size is quite low - it's a very difficult environmental system to access and it just isn't possible to increase sample size other than to keep going back and use different time points as replicates. The problem is the source community isn't stable enough to assume these are replicates. Unless I consider that a between sample comparison? If I treated each time point like a different subject in a gut microbiome experiment - it's the same source community, but variation over time leads to slight differences - perhaps analogous on how there are slight differences in the starting gut microbiomes of different subjects?

This is the basic set up (with slight variations in design between deployments because something always breaks in the field):

Slow: 2x glass substrate; 2x crush substrate (all containers plastic)
Fast: 2x glass substrate (glass container); 2x crush substrate (glass container); 2x crush substrate (plastic-lined glass container)

After indicating that there is not a significance difference between 2x crush substrate (glass container) and 2x crush substrate (plastic-lined glass container) in the fast condition, these were pooled into the same conditions such that:

Slow: 2x glass substrate; 2x crush substrate (container material irrelevant)
Fast: 2x glass substrate; 4x crush substrate (container material irrelevant)

It sounds like - for there to be a pair-wise comparison between features in different conditions than when filtering: min number of features must be > 2 and min samples must be >= min number of replicates in each condition.

I am going to try Aitchinson / Aldex and compare. Are you aware of any literature that compares ANCOM and SIMPER?

I looked at the ANCOM docs and I couldn't find a flag to specify the pseudocount - but it does look like CLR isn't the only transformation option. I might try √ and see if that helps resolve things if a pseudocount isn't added, I'm just not sure how not using a ratio would influence the interpretation of the statistic.

I was looking at the wrong docs - from add-pseudocount there is a flag to specify the value (default 1) looks like any integer value.

I'm going to see how this looks after I resolve the metadata and implement the new filtering.

Thanks!

jwdebelius · December 16, 2020, 11:13pm

Hi @hsapers,

I'm going to disrupt some of the order and jump around.

Metadata is one of the hardest parts of any microbiome project. I swear, I have multiple projects where I've spent more time and effort formatting metadata than I have actually doing analysis. Keemei may help some, it's a good first check. If you've already validated there, I would suggest moving into R or python with pandas to do a more in-depth check. I think metadata validation might be a good discussion as a separate topic.

hsapers:

Yes - my sample size is quite low - it’s a very difficult environmental system to access and it just isn’t possible to increase sample size other than to keep going back and use different time points as replicates. The problem is the source community isn’t stable enough to assume these are replicates. Unless I consider that a between sample comparison? If I treated each time point like a different subject in a gut microbiome experiment - it’s the same source community, but variation over time leads to slight differences - perhaps analogous on how there are slight differences in the starting gut microbiomes of different subjects?

This is the basic set up (with slight variations in design between deployments because something always breaks in the field):

Slow: 2x glass substrate; 2x crush substrate (all containers plastic)
Fast: 2x glass substrate (glass container); 2x crush substrate (glass container); 2x crush substrate (plastic-lined glass container)

After indicating that there is not a significance difference between 2x crush substrate (glass container) and 2x crush substrate (plastic-lined glass container) in the fast condition, these were pooled into the same conditions such that:

Slow: 2x glass substrate; 2x crush substrate (container material irrelevant)
Fast: 2x glass substrate; 4x crush substrate (container material irrelevant)

Thank you for the description! It really helps to visualize the design. Although I think one caveat to consider in your combination if you used a single time point is that your permutative p-value is bounded by the number of possible permutations and there are 4 ways to permute 4 samples, giving you 24 possible permutations, for a minimum p-value of 0.04.

As I conceptualize the design and try to find an analogy that's maybe easier to find a model for, analyzes that work with nested animal designs might help you. So, for rodents, we have to account for a cage effect because mice are coprophagic little bastards (and rats are coprophagic sweethearts). So, as you have 2-4 independent environmental systems in each group, and within each system, you have 5 nested timepoints. So, you need modeling that lets you account for repeated measurements in your system or some way to nest the data.

Permdisp has some issues with severely unbalanced designs - like 3 - 5 times or more samples in one group than the other; but it will help. Although, having seen your design, I think the unbalanced aspect is less of a problem than your number of samples.

Yes, that's correct.

I think if you want to get the deep implementation, it can be helpful to go read the code? From my practical perspective i tend to inter the equation as (A + B + A:B) (so, A + B + interaction) and move on from there. That's somewhat aligned with what you see in Figure 7 (page 40), at least based on my reading.

In R, a / indicates a nested variable. So, you can nest blocks based on your treatment. I may have mis interpreted your replicate - I was assuming technical replicates from the same site/time point verses multiple parallel systems the same timepoint. (So, I assume the replicate here was dependent verses independent.) If you have multiple nested factors (like mice in a cage), nesting can help address some of the variance there. So, this may be less relevant given more information about your design.

Coming back to 4...

At your current sample size, or my understanding of your current sample size, changing the pseudocount and transformation isn't going to solve your core problem, which is that compared to your multiple correction penalty, your sample size is too small. I can give you lots of studies where they can't detect individual features as differences (and plenty more where I dont trust the features they found because of their sample size and methodology).

If you want to take your time point into difference abundance, you might look at this recent thread about building nested models using LMEs (accounting for your temporal variation). My recent experience with this is that they also tend to be relatively low power and you may not detect anything significantly different.

I think, in general, microbiome differential abundance has swung more conservative.

Best,
Justine

hsapers · December 18, 2020, 4:52pm

@jwdebelius - thank you for this!

Keemei is great - and I think I've been spending more time on meta data curation than anything else so far... I had forgotten to pull out a couple of confounding samples - not sure if this would lead to the permdisp error - but it may have interfered with something down stream.

While both fill and container were designed to be independent variables, I wanted to make sure there was no interaction (early data suggested flow rate may influence the influence of fill) I think I want to use fill*container/deployment to assess the interaction term and main effects for each time point and then to look at each separately (fill/container)/deployment and (container/fill)/deployment. Taking a close look at LMEs - thanks for that link.

I need to look over adjusted p-values again, I was getting some pretty low values (e.g. 0.001), but you're right - they should be no lower than 0.04 - tracing this now

thanks again and happy holidays!

system · January 18, 2021, 10:52pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.