Using PCoA versus PCA

Emily_Yu · April 4, 2023, 8:15pm

Hello!

I am a little confused about the difference between a PCoA and PCA. I ask because I created a PCA plot using a PCoA distance matrix and was told that that was incorrect.

jwdebelius · April 5, 2023, 2:14pm

Hi @Emily_Yu,

Welcome to the :qiime2: forum!

I think the classic source for this is Numerical Ecology by Legendre and Legendre. I think it's chapter 9, but I'm not sure off the top of my head. I just went hunting for GustaMe, which is another excellent resource, but it doesn't seem to be online right now.

There are two terms that we use which are very similar:

Principal Components Analysis - PCA
Principal Coordinates Analysis - PCoA

From a general perspective, you can think of PCA as a special case of PCoA, although the steps are a little different.

In Principal Components Analysis, we:

Start with a table of features (or transformed table)
Calculate the Euclidean distance between the samples, based on their feature values (for those paying attention, Aitchison = Euclidean distance on CLR-transformed data)
Use an eigenvalue based ordination to project the distances into lower dimension space (this is a lot of linear algebra that's slightly over my head, TBH)
Use the known transform to place the features into the same space (optional)
Show off your shiny new PCA and evaluate similarity
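
As a rough illustration of the PCA steps above, here is a minimal NumPy sketch. It goes through a singular value decomposition of the centered table instead of an explicit distance calculation, which is how PCA is usually implemented. The function name `pca_scores` and the toy table are made up for this example and are not part of QIIME 2.

```python
import numpy as np

def pca_scores(table, n_components=2):
    """Project samples (rows) onto the leading principal components."""
    X = np.asarray(table, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each feature (column)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # sample coordinates
    loadings = Vt[:n_components].T                   # feature directions (biplot arrows)
    explained = s ** 2 / np.sum(s ** 2)              # proportion of variance per axis
    return scores, loadings, explained[:n_components]

# Toy table: 4 samples (rows) x 3 features (columns), made-up numbers
table = np.array([[2.0, 0.0, 1.0],
                  [1.0, 1.0, 0.0],
                  [0.0, 3.0, 1.0],
                  [1.0, 2.0, 2.0]])

scores, loadings, explained = pca_scores(table)
print(scores)
print(explained)
```

The `loadings` are what make the "place the features into the same space" step possible: because the transform is linear and known, features can be drawn in the same plot as the samples.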

In Principal Coordinates Analysis, we:

Start with a distance matrix. These can be any distances you want (and occasionally a dissimilarity). Bray-Curtis, unweighted UniFrac, Aitchison, Jensen-Shannon, it doesn't care. Got a metric that compares two samples on the basis of their favorite musical genre?
Use an eigenvalue based ordination to project the distances into low dimensional space
Show off your shiny new PCoA and evaluate similarity
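
The PCoA steps above can be sketched in a few lines of NumPy. This is an illustrative version of classical PCoA (double-centering the squared distances, then an eigendecomposition), not the code that QIIME 2 or scikit-bio actually runs. The function name `pcoa_sketch` and the 3-4-5 toy distance matrix are assumptions made for the example.

```python
import numpy as np

def pcoa_sketch(dist, n_components=2):
    """Classical PCoA on a square, symmetric distance matrix."""
    D = np.asarray(dist, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)     # eigh: B is symmetric
    order = np.argsort(eigvals)[::-1]        # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Clip tiny or negative eigenvalues before taking the square root
    scale = np.sqrt(np.clip(eigvals[:n_components], 0.0, None))
    coords = eigvecs[:, :n_components] * scale
    return coords, eigvals

# Toy distance matrix: three samples forming a 3-4-5 right triangle
D = np.array([[0.0, 3.0, 4.0],
              [3.0, 0.0, 5.0],
              [4.0, 5.0, 0.0]])

coords, eigvals = pcoa_sketch(D)
print(coords)
print(eigvals)
```

Because these toy distances are Euclidean, the two-dimensional coordinates reproduce the input distances exactly. Note that only the distance matrix goes in, so there is no feature table left to map back into the plot.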

There are a few points of comparison

	PCA	PCoA
Input	feature table	distance matrix
Distance used	Euclidean only	any distance you want
Clusters based on metadata	No	No
Uses eigenvalues	Yes	Yes
Can automagically map features into a biplot	Yes	No

Best,
Justine

jwdebelius · April 10, 2023, 1:22pm

As an update, GustaMe is back online and a great reference!

VincentVasquez · April 26, 2023, 7:08am

Principal Component Analysis (PCA) is a multivariate statistical technique used to identify patterns in high-dimensional data by reducing the dimensionality of the data while retaining as much variation as possible. PCA seeks to find a set of new uncorrelated variables (called principal components) that explain the maximum amount of variation in the original data. The principal components are calculated using the covariance matrix of the data.

On the other hand, Principal Coordinates Analysis (PCoA), also known as Metric Multidimensional Scaling (MDS), is a technique used to visualize the pairwise distances between objects in a dataset. PCoA uses a distance matrix (e.g., Euclidean distance, Bray-Curtis distance, Jaccard distance, etc.) to calculate a set of new uncorrelated variables (called principal coordinates) that explain the maximum amount of variation in the distance matrix. The principal coordinates are analogous to the principal components in PCA.
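
The "analogous" relationship can be checked numerically. The sketch below assumes Euclidean distances between samples and a small made-up table; under that assumption the principal coordinates match the principal components up to the sign of each axis. It uses plain NumPy and is not tied to any particular package's implementation.

```python
import numpy as np

# Toy table: 4 samples (rows) x 3 features (columns), made-up numbers
table = np.array([[2.0, 0.0, 1.0],
                  [1.0, 1.0, 0.0],
                  [0.0, 3.0, 1.0],
                  [1.0, 2.0, 2.0]])
Xc = table - table.mean(axis=0)

# PCA: SVD of the centered table
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
pca_coords = U[:, :2] * s[:2]

# PCoA: eigendecomposition of the double-centered squared Euclidean distances
diff = Xc[:, None, :] - Xc[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=-1))
n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D ** 2) @ J
eigvals, eigvecs = np.linalg.eigh(B)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
pcoa_coords = eigvecs[:, :2] * np.sqrt(eigvals[:2])

# Axes may be flipped, so compare absolute values
same_up_to_sign = np.allclose(np.abs(pca_coords), np.abs(pcoa_coords))
same_eigenvalues = np.allclose(eigvals[:2], s[:2] ** 2)
print(same_up_to_sign, same_eigenvalues)
```

With any other distance (Bray-Curtis, UniFrac, and so on) the two ordinations are no longer expected to agree.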

jianshu93 · May 22, 2025, 7:33am

A few additional comments on the responses above.

1. PCA never calculates any distance. It is based on singular value decomposition.

2. The distance used in PCoA must be a metric distance (Metric space - Wikipedia). Bray-Curtis is not a metric distance, so it cannot be used in PCoA/MDS, but it can be used in NMDS (non-metric). Jaccard and unnormalized UniFrac distances are metric distances. Normalized UniFrac distances (where the total branch length differs for each pair of samples, because the tree is trimmed for each pair and only taxa present in either sample are kept), weighted or not, are not metric distances. Other UniFrac variants, such as variance-adjusted UniFrac and generalized UniFrac, are also not metric distances, so they cannot be used in PCoA but can be used in NMDS.
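
The statement that Bray-Curtis is not a metric can be checked with a small counterexample. The three toy samples below are made up for illustration, and `bray_curtis` is a hand-written helper using the standard formula (sum of absolute differences divided by the sum of all counts), not a library call.

```python
import numpy as np

def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative count vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return np.abs(u - v).sum() / (u + v).sum()

# Three toy samples over two features
a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])
c = np.array([1.0, 1.0])

d_ab = bray_curtis(a, b)
d_ac = bray_curtis(a, c)
d_bc = bray_curtis(b, c)

# The triangle inequality would require d_ab <= d_ac + d_bc
violates_triangle = d_ab > d_ac + d_bc
print(d_ab, d_ac, d_bc, violates_triangle)
```

Here the direct distance between `a` and `b` is 1, while going through `c` costs only 1/3 + 1/3, so the triangle inequality fails.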

ebolyen · May 22, 2025, 5:05pm

You can fudge it a bit though, which is why we allow Bray-Curtis and other nearly-metric distances through. You might see negative eigenvalues, but so long as they are small, they are typically not understood to be an issue.

For more details, you can see the Notes section under scikit-bio's PCoA implementation.

jianshu93 · May 23, 2025, 2:01am

By "nearly metric", I assume you mean that the first two rules are met but not the triangle inequality. But the triangle inequality is the key property of a metric. I would always report the distribution of eigenvalues for any non-metric distance. Mathematically, the key step in PCoA relies on the distance being a metric: if D (the distance matrix) comes from a metric Euclidean space, the triangle inequality guarantees that B (the centered matrix that PCoA decomposes) is positive semidefinite (all eigenvalues ≥ 0). Anything could happen if the rule is violated. Currently, though, most results I have seen on non-metric distances do not report this.
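
One way to report the eigenvalue distribution is sketched below. The function name `eigenvalue_report` and the "negative share" summary are assumptions made for this example, not an established convention. The toy matrix holds the Bray-Curtis dissimilarities among the made-up samples (1, 0), (0, 1) and (1, 1).

```python
import numpy as np

def eigenvalue_report(dist, tol=1e-12):
    """Summarize the eigenvalues of the double-centered matrix B used by PCoA."""
    D = np.asarray(dist, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    eigvals = np.sort(np.linalg.eigvalsh(B))[::-1]   # largest first
    negative = eigvals[eigvals < -tol]               # ignore round-off around zero
    return {
        "eigenvalues": eigvals,
        "n_negative": int(negative.size),
        # Share of the total absolute eigenvalue mass that is negative
        "negative_share": float(np.abs(negative).sum() / np.abs(eigvals).sum()),
    }

# Bray-Curtis dissimilarities among toy samples (1, 0), (0, 1), (1, 1)
D = np.array([[0.0, 1.0, 1.0 / 3.0],
              [1.0, 0.0, 1.0 / 3.0],
              [1.0 / 3.0, 1.0 / 3.0, 0.0]])

report = eigenvalue_report(D)
print(report)
```

For this matrix one eigenvalue is negative, so the distances cannot be represented exactly in a Euclidean space. Reporting the count and relative size of negative eigenvalues lets readers judge whether they are small enough to ignore.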