adonis, betadiver, and betadisp

(Devon O'rourke) #1

An earlier post by @CarlyRae first introduced me to the fact that when examining beta diversity I need to consider not only the results of a multi or univariate anova, but also examining if the within-group distances to a centroid were equivalent among my groups. I’m wondering how to best interpret the situation where everything turns out to be significant. That is, if the results of both PERMANOVA and PERMDISP suggest significant differences, where do most people go next? I’ve run these tests both in QIIME and separately with vegan in an R environment; while QIIME’s visualizations show barplots, Carly’s suggested ordinating these data.

In that spirit, I was futzing around in R this morning and came up with this:

The points represent samples, the colors represents the groups of samples they belong to. This led me to a few questions:

  1. Pardon my complete ignorance, but could I have generated this ordination in QIIME anyway? I wasn’t sure if the input in running the vegan betadisp() function generates an output that is similar to what is created with qiime diversity pcoa + qiime emperor. My confusion here is that the input for calculating dispersions in Vegan is a distance matrix, just like the input for the PCoA plot in QIIME. However, the input for the plot I made was the output of the betadisp() function, whereas the input for QIIME’s PERMDISP is a distance matrix. Perhaps I need to deconstruct the resulting .qzv artifact following PERMDISP and feed that into the pcoa function?

  2. Regarding the above plot, I was worried that the proportion of variance explained by the two eigenvectors was quite low, and wondered how researchers use this information in their interpretations of the figure itself. If the values were something along the lines of 55% and 25% respectively, how would that change your interpretation compared to my existing values of 6% and 5%?

  3. Given the above plot, it strikes me that my samples are more associated by the month they were collected than the site they were collected, though both appear to matter. This was indicated as much in the Adonis test. What would be extremely helpful to me is gaining some insights about how researchers pool these collective bits of information into a final assessment. Given that I have heterogeneous dispersions across groups and am violating an assumption of the Adonis test, how do the resulting barplots (from QIIME’s viz output) or this ordination assist in making an interpretation of how grouping factors explain diversity? Is it sufficient to inform readers that these data provide evidence for differences between groups, but that those differences can be explained by both inter-group variation in distances to centroids, and by differences between groups? I’m guessing there are always limitations to what one can infer - I’d greatly appreciate others experiences with what they were confident in affirming in their work when both PERMANOVA and PERMDIST were significant.

Last bit: I saw @jwdebelius was thinking about this topic in a recent post@Nicholas_Bokulich- does the permdisp function work for your QIIME Adonis plugin as well as beta diversity?


(Justine) #2

HI @devonorourke,

Thanks for tagging me in this discussion! This is an area I’m getting into and getting excited about; the referenced post was awesome.

I think you’re looking for two separate functionalities: statistical test and visualization. I think they’re closely linked, but slightly different things.

As far as I understand, permadisp is the statistical test that measures whether the dispersion of a group of samples in ordination space is different between groups (essentially a measure of variance). I usually use the scikit-bio implementation, but the result is essentially the same. The permadisp function is statisitcal test that gives a psuedo-F and permutative p-value for whether or not there’s dispersion between the two groups or more groups.

Visualization is a second issue and is maybe more subjective, but that’s the plot. QIIME (again AFAIK, Im not a developer and do most of my own analysis in python because I like control) doesn’t have a way to visually show the centroids for clusters. Personally, I think I’d rather see the centroid as a cloud with a variation limit rather than the center point (like transparent, or something).

I think the steps to getting there for an initial prototype might be the following:

  1. Calculate a median position along each PC using something like beta rarefaction
  2. Somehow merge the median PCoA with the main PCoA using something like bi-plots
  3. Coerce the shiny new biplots into emperor
  4. ???
  5. Profit

My list of steps is probably a vast oversimplification and I’m missing something obvious, but I think that would be my first pass to try it. Probably in the underlying python architecture. Although @yoshiki can probably give limitations of emperor better than I can.

I don’t have the bandwidth right now to play around this. Maybe over the summer when everyone here goes off on vacation :airplane: :palm_tree:, Ive got a confusing number of hours of daylight, and theoretically more free time. (Although that feels like a dangerous promise.)


(Nicholas Bokulich) #3

Just want to chime in to answer some specific questions tossed my way:

The ordination yes, the centroids no.

It is very low. A small % of variation is explained by the first 2 PCs! Not necessarily a problem, just that season effects (while the effects evident in that plot) are not earth-shatteringly impactful. Makes sense — while you get species turnover by season, many of the same species will be present across proximal seasons.

Yes. Both permdisp and betadisper accept a single “grouping” factor so you just need to test each factor individually if you are comparing to, e.g., a multi-factor adonis test. I think the original paper on the permdisp method explains why multi-factor permdisp is not possible but cannot recall the reason right now…

1 Like
(Devon O'rourke) #4

Great, thanks @jwdebelius and @Nicholas_Bokulich for the replies!

I’m still left wondering what to make of results where permdisp and permanova are both significant. My understanding is that:

  1. With PERMANOVA I’m the null that there are no differences in the group centroids. If I get a statistically significant output, then I’m led to believe that there are differences between these groups. These differences could be because the community composition of the species in groups are distinct(ish).
  2. The centroid differences could also be different because of inter-group variance, so we run PERMDISP to account for this. If we get a significant result here, then we’ve identified a situation where the dispersion (variance) within a group is distinct from (an)other group(s).

What I’m hoping to help clarify is what to then infer. Is this kind of like an interaction effect in an ANOVA, in that your main effects are conditionally dependent? In the case of interpreting the beta diversity output, is it fair to say that if PERMDISP and PERMANOVA are both significant, then you’re finding that there are group differences, but you can’t conclude why those differences exist?

Would there be an appropriate post hoc test to run to further tease apart what group/factor might be driving the significant PERMDISP result? When I run the Anova test in R directly with betadisper I can see what the average distance to the median is, and it’s hard to convince myself that these differences are super meaningful…

For example, if I examine the dispersion output for both Site and Month:

Average distance to median for SITE:
    EN     HB 
0.6117 0.6341 

Average distance to median:
  July      June September 
0.6110    0.5827    0.6179 

What I find is that yes, there are apparently statistically significant differences, but those are pretty small, and that’s what you see in the plot I posted in this thread initially. And it follows a trend I’d expect; you get less dispersion in June because there are fewer insects out in the environment then (so the variation should be less).

@CarlyRae suggested in her post that a TukeyHSD test might work, and if I run an HSD test for both Site and Month, then I find that (1) there are differences between Month (in this case, it’s June:July and June:September that are significant) as well as with Sites (but because there are only two sites, there aren’t multiple pairwise differences to be investigated). My takeaway from this is that June is really what is driving the dispersion difference - is that sensible enough?

For what it’s worth, in reading through Marti Anderson’s recent paper outlining both of these concepts, it seems like ordinating both the main and interaction effects is advised, but I wasn’t clear how adding centroid distances was useful.

Thanks again!

(Carly Muletz Wolz) #5

If PERMDISP and PERMANOVA are both significant and your sample size is not balanced then you can’t tell if the PERMANOVA is significant because of centroid or dispersion. But, if your sample size is balanced then a significant PERMDISP and PERMANOVA indicate the both the centroids are different for at least one pairwise comparison and your dispersion is different for at least one pairwise comparison. See “PERMANOVA, ANOSIM, and the Mantel test in the face of heterogeneous dispersions: What null hypothesis are you testing?” MARTI J. ANDERSON AND DANIEL C. I. WALSH.

Your plot looks like you have a balanced design so in your case you might have an easier answer. Yes, both are truly significant. I have conducted analyses with unbalanced designs with significant PERMANOVAs and PERMDISPs, so I always plot with 95% confidences ellipses. If the confidence ellipses don’t overlap then that indicates that the centroids can be considered statistically different. If they show minimal overlap, I also generally consider the centroids to differ, but this is more subjective.
Here is some R code for that.

jacc <- phyloseq::distance(ratData, "jaccard", binary = T)
jacc.ord <- ordinate(ratData, method = "PCoA", jacc)

p_jacc <- plot_ordination(ratData, jacc.ord, color = "Sample_type", shape = "Sample_type")
p_jacc + geom_point(size = 4)  +theme_bw() +theme_classic()+ 
  theme(text = element_text(size = 20))+ stat_ellipse(aes(group = Sample_type)) +
  labs("Sample type") + scale_color_manual(values = c("chocolate4", "darksalmon"), name = "Sample type") + scale_shape_manual(values = c(17, 19), name = "Sample type")

Or you could plot from PERMDISP (different dataset)

groups_nest <- df_nest[["Nest_location"]]
jacc_dispnest <-betadisper(jaccard_nest, groups_nest, type=c("median"))

(Carly Muletz Wolz) #6

Sorry, I forgot to mention that vegan package folks added this warning in the last couple years to betadisper function in R see ?betadisper


Stewart Schultz noticed that the permutation test for type="centroid" had the wrong type I error and was anti-conservative. As such, the default for type has been changed to "median" , which uses the spatial median as the group centroid. Tests suggests that the permutation test for this type of analysis gives the correct error rates.

Using type = median is more appropriate now. I haven’t read up on this yet though. But, I took their warning.

1 Like
(Devon O'rourke) #7

Thanks for the illustrations and response @CarlyRae!
In future work creating plots of poop, you might consider the color palette named in an xkcd survey (yes, you bet there’s a baby poop green color…)

Just to confirm what you were saying about overlapping confidence intervals:

  • With the first plot (beta_Ratsamples) if group centroids overlap based on 95% confidence intervals: when they (mostly) fail to overlap the group centroids are likely different.
  • In the case of the second plot (jacc_dispnest) you’re plotting the convex hull, not the ellipse, correct? I’m guessing by code that you’re using the base R plot() call tied in with vegan, which generates the hulls by default. Is there any reason why you prefer to do it this way? I’m guessing you could also have amended the plot to include both the hull and ellipse data with:
plot(jacc_dispnest, hull = T, ellipse = T) 

What I’m curious about is whether you like using the same overlapping characteristics as a visual check for both the dispersions and group centroids.