Using PCoA versus PCA

Hello!

I am a little confused about the difference between a PCoA and PCA. I ask because I created a PCA plot using a PCoA distance matrix and was told that that was incorrect.

Hi @Emily_Yu,

Welcome to the :qiime2: forum!

I think the classic source for this is Numerical Ecology by Legendre and Legendre. I think it's chapter 9, but I'm not sure off the top of my head. I just went hunting for GustaMe, which is another excellent resource, but it doesn't seem to be online right now.

There are two terms that we use which are very similar:

  • Principal Components Analysis - PCA
  • Principal Coordinates Analysis - PCoA

From a general perspective, you can think of PCA as a special case of PCoA, although the steps are a little different.

In Principal Components Analysis, we:

  1. Start with a table of features (or transformed table)
  2. Calculate the Euclidean distance between the samples (for those paying attention, Aitchison distance = Euclidean distance on CLR-transformed data)
  3. Use an eigenvalue-based ordination to project the distances into a lower-dimensional space (this is a lot of linear algebra that's slightly over my head, TBH)
  4. Use the known transform to place the features into the same space (optional)
  5. Show off your shiny new PCA and evaluate similarity
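
If it helps to see those steps as code, here is a minimal NumPy sketch. The toy table and the variable names are made up for illustration, and this is not what any particular QIIME 2 plugin runs internally. It centers the table, uses an SVD for the eigenvalue-based step, and checks that the Euclidean distances between samples survive the rotation when all axes are kept.

```python
import numpy as np

# Hypothetical toy feature table: 5 samples (rows) x 3 features (columns).
table = np.array([
    [2.0, 0.5, 1.0],
    [1.5, 1.0, 0.5],
    [3.0, 0.2, 2.0],
    [0.5, 2.0, 0.3],
    [1.0, 1.5, 0.8],
])

# Step 1: start from the (optionally transformed) table; here we only
# center each feature on its mean.
centered = table - table.mean(axis=0)

# Step 3: eigenvalue-based ordination, done here with an SVD.
u, s, vt = np.linalg.svd(centered, full_matrices=False)
sample_scores = u * s              # sample coordinates on each axis
explained = s**2 / np.sum(s**2)    # proportion of variance per axis

# Step 4: the same transform gives feature loadings, which is what
# makes a biplot possible.
feature_loadings = vt.T


def pairwise_euclidean(x):
    """Euclidean distance between every pair of rows in x."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff**2).sum(axis=-1))


# Step 2, implicitly: PCA preserves Euclidean distances between samples
# when every axis is kept.
preserved = np.allclose(
    pairwise_euclidean(centered), pairwise_euclidean(sample_scores)
)
print("Euclidean distances preserved:", preserved)
print("Proportion explained:", explained.round(3))
```

Plotting the first two columns of `sample_scores` gives the usual PCA scatter plot.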

In Principal Coordinates Analysis, we:

  1. Start with a distance matrix. These can be any distances you want (and occasionally a dissimilarity). Bray Curtis, unweighted UniFrac, Aitchison, Jensen-Shannon, it doesn't care. Got a metric that compares two samples on the basis of their favorite musical genre :musical_note: ?
  2. Use an eigenvalue-based ordination to project the distances into a low-dimensional space
  3. Show off your shiny new PCoA and evaluate similarity
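
The PCoA steps can be sketched the same way. This is a bare-bones NumPy version of classical PCoA (Gower double-centering followed by an eigen-decomposition), not the implementation a real pipeline uses. The `pcoa` and `bray_curtis` helpers and the toy counts are hypothetical, written only for this example.

```python
import numpy as np


def bray_curtis(x):
    """Bray-Curtis dissimilarity between every pair of rows in x."""
    num = np.abs(x[:, None, :] - x[None, :, :]).sum(axis=-1)
    den = (x[:, None, :] + x[None, :, :]).sum(axis=-1)
    return num / den


def pcoa(dist):
    """Classical PCoA on a square, symmetric distance matrix."""
    n = dist.shape[0]
    # Gower-center the squared distances.
    centering = np.eye(n) - np.ones((n, n)) / n
    b = centering @ (-0.5 * dist**2) @ centering
    # eigh returns eigenvalues in ascending order, so sort descending.
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Keep axes with positive eigenvalues; non-Euclidean dissimilarities
    # can produce negative ones, which this sketch simply drops.
    keep = eigvals > 1e-10
    coords = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    return coords, eigvals


# Step 1: start with a distance matrix (hypothetical counts, 4 samples).
counts = np.array([
    [10.0, 0.0, 5.0, 1.0],
    [8.0, 2.0, 4.0, 0.0],
    [0.0, 12.0, 1.0, 6.0],
    [1.0, 9.0, 0.0, 7.0],
])
dist = bray_curtis(counts)

# Step 2: eigenvalue-based ordination of that distance matrix.
coords, eigvals = pcoa(dist)
print(coords[:, :2])
```

Note that the function only ever sees `dist`, never `counts`, which is why PCoA has no feature loadings to put on a biplot.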

There are a few points of comparison:

| | PCA | PCoA |
|---|---|---|
| Input | feature table | distance matrix |
| Distance used | Euclidean only | any distance you want |
| Clusters based on metadata | No | No |
| Uses eigenvalues | Yes | Yes |
| Can automagically map features into a biplot | Yes | No |

Best,
Justine


As an update, GustaMe is back online and a great reference!


Principal Component Analysis (PCA) is a multivariate statistical technique used to identify patterns in high-dimensional data by reducing the dimensionality of the data while retaining as much variation as possible. PCA seeks to find a set of new uncorrelated variables (called principal components) that explain the maximum amount of variation in the original data. The principal components are calculated using the covariance matrix of the data.

On the other hand, Principal Coordinates Analysis (PCoA), also known as Metric Multidimensional Scaling (MDS), is a technique used to visualize the pairwise distances between objects in a dataset. PCoA uses a distance matrix (e.g., Euclidean distance, Bray-Curtis distance, Jaccard distance, etc.) to calculate a set of new uncorrelated variables (called principal coordinates) that explain the maximum amount of variation in the distance matrix. The principal coordinates are analogous to the principal components in PCA.
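
To make the analogy concrete, here is a small NumPy check, offered as a sketch on made-up random data rather than a definitive implementation. It runs PCA through the covariance matrix and PCoA through a Euclidean distance matrix, then compares the two sets of coordinates. With Euclidean distances they should match up to the sign of each axis, which is the sense in which PCA can be seen as a special case of PCoA.

```python
import numpy as np

# Hypothetical data: 6 samples x 4 features.
rng = np.random.default_rng(0)
table = rng.normal(size=(6, 4))
centered = table - table.mean(axis=0)

# PCA: eigen-decomposition of the feature covariance matrix.
cov = np.cov(centered, rowvar=False)
pca_vals, pca_vecs = np.linalg.eigh(cov)
order = np.argsort(pca_vals)[::-1]          # largest variance first
pca_scores = centered @ pca_vecs[:, order]

# PCoA: Gower-center the squared Euclidean distances, then eigen-decompose.
diff = table[:, None, :] - table[None, :, :]
dist = np.sqrt((diff**2).sum(axis=-1))
n = dist.shape[0]
centering = np.eye(n) - np.ones((n, n)) / n
b = centering @ (-0.5 * dist**2) @ centering
pcoa_vals, pcoa_vecs = np.linalg.eigh(b)
top = np.argsort(pcoa_vals)[::-1][:4]       # the 4 non-trivial axes
pcoa_scores = pcoa_vecs[:, top] * np.sqrt(pcoa_vals[top])

# The axes agree up to a sign flip, so compare absolute values.
same = np.allclose(np.abs(pca_scores), np.abs(pcoa_scores))
print("PCA and Euclidean PCoA coordinates match up to sign:", same)
```

Swap the Euclidean distance for Bray-Curtis or UniFrac and the PCoA half still works, while the PCA half has no equivalent.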
