Emperor for... dummies?

Francisco · February 13, 2020, 8:16pm

Hi! I'm having a hard time trying to understand the emperor plots.

So... is there any documentation you would suggest for learning how to read them?

colinbrislawn · February 13, 2020, 10:06pm

Hi!

Great question!

Yeah, the official explanation is pretty technical. Like in this paper...

QIIME calculates the beta diversities between each pairs of input samples, forming a distance matrix. The distance matrix then can be visualized with methods such as Principal Coordinate Analysis (PCoA) (Mardia, Kent, & Bibby, 1979) and hierarchical clustering (Tryon, 1939), both of which have been widely used for data visualization for decades. PCoA transforms the original multidimensional matrix to a new set of orthogonal axes that explain the maximum amount of inertia in the dataset

Yikes!

Here's how I would say this:

In a PCoA plot, each point is one sample.
If points are close together, then those samples have similar microbial communities, according to whichever beta diversity metric you chose.

Then you might start to see clusters of similar samples in your data set and connect that to a metadata factor like Location or HostSpecies.
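
To make the two steps concrete, here is a minimal sketch in Python with NumPy: compute a beta diversity distance between every pair of samples, then run a classical PCoA on the resulting distance matrix. The feature table is made up for illustration, Bray-Curtis is only one of several metrics you could pick, and the helper names bray_curtis and pcoa are hypothetical. QIIME 2 does all of this for you, and this sketch is not its actual implementation.

```python
import numpy as np

def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two count vectors (0 = identical, 1 = nothing shared)."""
    return np.abs(u - v).sum() / (u + v).sum()

def pcoa(dist):
    """Classical PCoA on a square distance matrix.

    Returns (coords, explained): one row of coordinates per sample, and the
    share of variation carried by each axis (computed over the positive
    eigenvalues only, which is a simplification made for this sketch).
    """
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-center the squared distances (Gower's transformation).
    gower = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gower)
    order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1e-10                     # drop zero and negative axes
    coords = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    explained = eigvals[keep] / eigvals[keep].sum()
    return coords, explained

# Hypothetical feature table: 4 samples (rows) x 3 taxa (columns).
counts = np.array([
    [10, 0, 5],
    [9, 1, 5],
    [0, 12, 3],
    [1, 11, 3],
], dtype=float)

n = len(counts)
dist = np.array([[bray_curtis(counts[i], counts[j]) for j in range(n)]
                 for i in range(n)])
coords, explained = pcoa(dist)

print(np.round(dist, 3))           # pairwise distances between samples
print(np.round(coords[:, :2], 3))  # positions on the first two axes
print(np.round(explained, 3))      # share of variation per axis
```

The first two columns of coords are the positions you would plot on the first two axes. In this toy table, samples 0 and 1 land close together, as do samples 2 and 3, because their counts are nearly the same. Note that no metadata goes into the calculation at all.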

Let's say I make this PCoA plot of my 8 samples:

OK, so those four samples are very, very close! And what does that mean?

I should explain my study design:

SampleID	Location	HostSpecies
1
2
3
4
5
6
7
8

Let's label samples based on Location (, ). This way, we can see if Location is changing microbial community structure.

So... that's something. Looks like the top half of our PCoA is , and the bottom is more . Cool! Location has an effect on microbial community structure.

Now let's label this same PCoA by HostSpecies:

:nature-paper:

Remember, PCoA rearranges your samples so that similar samples are close together, and different samples are far apart.

So here, we can make two conclusions:

and are different from each other
are very similar to each other, while each is different from

Importantly, PCoA is not looking at the metadata factors when it rearranges the samples, so any clustering we see here is coming only from the microbial community structure itself, as measured by the distance matrix.

So now that we see this relationship between microbes and HostSpecies, how do we test it statistically?
With qiime diversity beta-group-significance in QIIME 2, or vegan::adonis() in R.
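
Both of those tools can run a PERMANOVA, which asks whether distances between groups are larger than distances within groups, and gets a p-value by shuffling the group labels. Below is a rough pure-Python sketch of that idea, not the code either tool actually runs. The function names pseudo_f and permanova, the group labels "A" and "B", and the toy distance matrix are all hypothetical.

```python
import random
from itertools import combinations

def pseudo_f(dist, groups):
    """PERMANOVA pseudo-F statistic from a square distance matrix and one label per sample."""
    n = len(groups)
    labels = sorted(set(groups))
    # Total sum of squares: all pairwise squared distances, divided by N.
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    # Within-group sum of squares: the same thing, computed inside each group.
    ss_within = 0.0
    for label in labels:
        members = [i for i in range(n) if groups[i] == label]
        ss_within += sum(dist[i][j] ** 2
                         for i, j in combinations(members, 2)) / len(members)
    ss_among = ss_total - ss_within
    a = len(labels)
    return (ss_among / (a - 1)) / (ss_within / (n - a))

def permanova(dist, groups, permutations=999, seed=0):
    """Return (pseudo-F, p-value), shuffling the labels to build the null distribution."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)

# Toy data: 8 samples in two hypothetical groups; samples within a group
# are close (0.1) and samples from different groups are far apart (0.8).
groups = ["A"] * 4 + ["B"] * 4
dist = [[0.0 if i == j else (0.1 if groups[i] == groups[j] else 0.8)
         for j in range(8)] for i in range(8)]

f_stat, p_value = permanova(dist, groups)
print(f_stat, p_value)
```

A large pseudo-F with a small p-value suggests the grouping explains more of the distances than random labels would. With only a handful of samples per group there are few distinct ways to shuffle the labels, which limits how small the p-value can get.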

Let's try an example. Take Figure 2 from this paper. Panel C is a PCoA plot:

c, Between-community (beta) diversity among in n = 23,828 biologically independent samples: principal coordinates analysis (PCoA) of unweighted UniFrac distance, PC1 versus PC2 and PC1 versus PC3, coloured by EMPO levels 2 and 3. Clustering of samples could be explained largely by environment.

Does that help?

Colin

P.S. In my example, I have 8 samples, which is only two reps per group. This is bad, and I feel bad. PCoA works great for millions of samples, so I should get more than 8!
P.P.S. This article is great: https://occamstypewriter.org/boboh/2012/01/17/pca_and_pcoa_explained/

Francisco · February 13, 2020, 11:02pm

Thanks a lot!
I'll check the papers

P.S. Thanks for the extended explanation, it made it easier to understand the basics of the plots (like an "ELI5" from reddit).
