Comparing beta diversity between samples (not groups)

It’s clear to me that qiime diversity beta-group-significance allows me to evaluate whether or not there are differences within a specified metadata factor among a set of distances.

For instance, if I wanted to test whether or not the community composition changes from one Site to another, I could do such a thing provided I have multiple samples at each Site.

But what if I didn’t have more than one sample per Site? QIIME reminds me that I can’t run a test when the metadata file has just one sample per group…

All values in the grouping vector are unique. This method cannot operate on a grouping vector with only unique values (e.g., there are no 'within' distances because each group of objects contains only a single object).

… because without multiple samples within the same group, how do you measure the variance?

I thought perhaps QIIME’s longitudinal tutorial on pairwise distance comparisons might be a solution, but again the issue is that I don’t have multiple samples within a group, as QIIME gently reminds me:

Need at least two groups in stats.kruskal()

As @thermokarst clarified in a previous thread, these tests are looking at comparisons across groups, so apparently this isn’t what I need. Within the same post it was suggested that a comparison across samples might be ANCOM…

Okay, so I lied a little in the title here.
My goal is actually twofold.

  1. I do want to compare whether distances are significantly different across my samples (even though I only have one sample at each Site).
  2. I also want to measure whether or not these differences are significant depending on what distance metric I use. Specifically I’m interested in running whatever tests are valid on a pair of unique distance vectors: one using a Bray-Curtis measure, and another using the Jaccard measure.

I’m worried that ANCOM is built for relative abundance datasets specifically, so any difference I see between my Bray vs. Jaccard distances is because of ANCOM’s assumptions, not because of the values themselves.

There has to be a simpler way of just comparing distances across samples that I’m cluelessly missing, right?

Thanks for any thoughts you can offer!

Hi @devonorourke,

I want to address your goals. As you’ve described them, I don’t think they’re possible.

There is no statistical test that I can come up with that will let you compare categorical data when you only have one sample per category. The test - regardless of the type of test - relies on a distribution of data. And, the only way to characterise this distribution is to have multiple samples. (You can look at continuous variables because they’re essentially sampling a distribution by themselves, and so unique values are fine.) This isn’t a QIIME-specific limitation, this is a stats thing.

I also worry that you might be missing a concept with statistical tests.

  • PERMANOVA, Mantel, and bioenv look at differences associated with distances (beta diversity).
  • Kruskal-Wallis looks at alpha-diversity-related differences.
  • ANCOM works on feature-based differences.

So, ANCOM can help you determine if there are individual features that describe the differences between your two groups. ANCOM cannot help you determine if there’s a difference in the distances themselves. Does that make sense?

Luckily, though, you may have another solution. If you think there might be features of the sites driving the differences, you could try a couple of things:

  • Run your favorite geographical distance metric and then use a Mantel test to compare it with the continuous variable. (Although, TBH, you might need to take this into Python or R; I’m approaching this from a more theoretical perspective ATM.) This will tell you whether closer samples look more similar. It can be used with any pair of distance metrics you want to try.
  • Pick individual features of your metadata and then compare the data on that basis. For instance, are sites with higher elevation similar?
  • Use a broader classification of site (i.e. country, etc).
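For the first bullet, a minimal Mantel sketch in plain Python/NumPy (outside QIIME; the coordinates and distances here are made up for illustration) looks roughly like this: correlate the upper triangles of the two matrices, then build a null distribution by permuting one matrix’s rows and columns together.

```python
import numpy as np
from scipy.stats import spearmanr

def mantel(dm1, dm2, permutations=999, seed=0):
    """Spearman Mantel test on two square distance matrices,
    with a permutation-based p-value."""
    rng = np.random.default_rng(seed)
    n = dm1.shape[0]
    iu = np.triu_indices(n, k=1)              # each pair counted once
    observed = spearmanr(dm1[iu], dm2[iu]).correlation
    hits = 0
    for _ in range(permutations):
        perm = rng.permutation(n)
        shuffled = dm2[np.ix_(perm, perm)]    # permute rows & columns together
        r = spearmanr(dm1[iu], shuffled[iu]).correlation
        if abs(r) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (permutations + 1)

# toy check: geographic distance vs. a matrix that is a scaled copy of it
coords = np.array([0.0, 1.0, 2.0, 5.0])
geo = np.abs(coords[:, None] - coords[None, :])
r, p = mantel(geo, 0.1 * geo, permutations=99)
print(r, p)   # r is exactly 1.0; p depends on the permutations
```

In practice you’d feed in real distance matrices (e.g. Bray-Curtis vs. a geographic distance matrix); scikit-bio and QIIME ship tested implementations of this same idea.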

Hope that helps!



Hi @jwdebelius,

To be clear, the situation I actually have is not as simple as the one I wrote about, but I wanted to give an extreme case in hopes that I could better understand how the assumptions of the test operate. Whether I look up the formula on Wikipedia or in some classic text like McCune and Grace's Analysis of Ecological Communities, somehow I can never figure out what is valid.

Your responses make those assumptions clearer, so thank you very much. It also points to the value of this forum and of users like yourself who take the time to provide detailed responses in approachable language. Again, thank you.

I take it this means that, as with Permanova and Kruskal Wallis, ANCOM still needs a distribution of samples within (and across) groups, making my trivial example of comparing one sample per Site inadmissible?

Right. If one thing I want to do is evaluate whether the distance matrices themselves are different, running a Mantel test makes sense, and that's what I ended up doing last night. Unfortunately, my goal is slightly more complicated than what I described in this post. I'm also considering factors like the denoising/clustering algorithm (e.g., vsearch vs. DADA2), minimum read filtering, rarefying, etc. But I do now have 630 pairwise comparisons to make use of...

which leads me to another question:

The default correlation is Spearman, which provides a p-value and rho. For ecological data like this, I'm wondering what is more commonly reported when comparing correlation statistics on OTU tables. I'm guessing QIIME uses this as the default because we rarely expect the elements of the OTU matrix to be normally distributed, so the rank-based nature of Spearman is preferred? But perhaps there are good arguments for using Pearson's method that I am yet again misunderstanding...
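As a sanity check on my own intuition (toy numbers only, not my data): Spearman only sees rank order, so a monotone but non-linear relationship still scores a perfect rho, while Pearson gets penalized for the non-linearity.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = x ** 4                                  # monotone, but strongly non-linear

pearson_r = pearsonr(x, y)[0]               # well below 1: assumes linearity
spearman_rho = spearmanr(x, y).correlation  # exactly 1: the ranks agree
print(pearson_r, spearman_rho)
```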

Ultimately this is the route I'm likely going down, but as you're describing the nature of these tradeoffs it makes me wonder if I should revise my definition of sampling unit... My real dataset consists of:

  1. About 15 Sites where samples were collected.
  2. Samples collected at each location across (up to) 30 different Weeks. One batch of samples collected per week.
  3. One batch consists of ~ 10 pieces of bat guano at some roost. Because these specific samples were collected from a colony of prodigious poopers, you can't determine which bat produced which pellet, but you do know that the bats are all the same species.

What I've been doing thus far is aggregating all samples collected at a common Site + Week, but once I do that, I no longer have multiple samples to compare variance within a group, making these PERMANOVA tests impossible.

I wonder if I need to go back and evaluate if that was a poor decision; I aggregated data because I was mostly interested in how community composition of bat diets shift across space and time, and thought that there would be too much noise in individual samples within a given Site + Week to get any real signal.

Thanks again for your insights!

Hey @devonorourke,

It’s always more complicated, isn’t it? At least in your interaction example, there are some options!

Yeah, it means that ANCOM won’t work on a single observation per site. But, also, ANCOM doesn’t look at distance matrices. ANCOM looks at features; that relates to what you see in a weighted metric, but it isn’t a test of distances.

I’m maybe not fully understanding the goal here, but it seems complex.

This seems like a complicated analysis for benchmarking. Have you checked out some of the existing literature on the topic? I know that a lot of people have looked at comparisons between different denoising/ASV-picking platforms (I saw an independent comparison not that long ago, but can’t find the paper… :frowning_woman:). Each of these problems is almost an independent question and analysis. My best advice (and the advice I generally follow) is to pick something reasonable and stick with it. I don’t know if it’s available somewhere, but @yoshiki wrote a really nice review of distance matrices a couple of years ago that he might be able to share, which may address some of your questions as well.

I feel like Spearman makes more sense for your Mantel correlation statistic, because you’re making fewer a priori assumptions about your data (although, in my experience, distances are often asymptotically normal). The big caveat here is that you need to be using a permutation test, because distances aren’t independent. I wouldn’t recommend running this directly on your OTU table, though. A distance matrix (or even a between-replicate distance) is going to be far more informative than trying to build a correlation from the OTU table.

You could potentially run a PERMANOVA for site, and then one for week. Or you could look at dynamics by site, as long as you then have similar characteristics for multiple data-aggregation tests, like mixed models.

On the one hand, I agree that you’re going to have a lot of noise. On the other, you’re going to struggle to quantify the noise without multiple measurements at each timepoint. Microbiome data is, unfortunately, inherently noisy. Because of its sparseness, it gets noisier in cross-sectional studies rather than less. (As an example: I explained 1% of the variation in my data with 500 samples last week and was ecstatic, because I felt like I had strong explanatory power. I work with a bunch of epidemiologists who were horrified by this.) However, multiple measurements mean that at least you’re better understanding that variation and noise from sample to sample, and that even if it’s a noisy distribution, you have a distribution to compare.
And, who knows, maybe that variation (real instability or noise, it can be hard to tell) might be a signal of its own?

So, yeah, I think if it were me, and I had a freezer full of guano samples that I could re-sequence, I would go back and re-do all the samples individually. It’s probably the best solution in the long run. Sorry if that’s not the answer you were hoping for. :confused:


I do not think the Mantel test is appropriate here, given your complex design. PERMANOVA may be better: you would not be asking “does my distance matrix look different when I use a different denoising/clustering method”, but rather “does my distance matrix partition based on denoising/clustering method”, which is probably a more straightforward and appropriate answer to your overall question: does denoising impact beta diversity?

The N=1 per site per time point does complicate things, but effectively site becomes a measure of “individual”, e.g., in a multi-way PERMANOVA. If your question is “are these sites different?”, you are sunk (unless you have multiple time points per site). But since your question is “are these methods different?”, you can run a multi-way PERMANOVA with the formula site + week + method (or drop the week term if there is no significant temporal variation) for the purposes of benchmarking.

As @jwdebelius pointed out, there is no method that will allow you to determine whether individual samples are significantly different, unless you sample them multiple times. You can aggregate your temporal samples (or bin them into groups of sequential weeks) to test for differences between sites. But you cannot say “site X and week Y” is significantly different from other sites/weeks if you have N=1 at that site/week. Unless temporal variation is high, I recommend aggregating (dropping the week term)/binning to test for site-specific differences.
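A multi-way PERMANOVA like this can be run through q2-diversity’s adonis action. A sketch (the file names here are placeholders, and the formula terms must match column names in your own metadata file):

```shell
# adonis-style multi-way PERMANOVA on a distance matrix;
# 'site', 'week', and 'method' are assumed metadata columns
qiime diversity adonis \
  --i-distance-matrix bray-curtis-dm.qza \
  --m-metadata-file metadata.tsv \
  --p-formula "site+week+method" \
  --o-visualization adonis-results.qzv
```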


Really appreciate comments from both @Nicholas_Bokulich and @jwdebelius.
The silly part here is that I do have multiple samples, but I didn’t understand whether certain tests were appropriate/admissible when I had only a single sample for some factor. Thanks for clearing it all up.