Hello!
I am a little confused about the difference between a PCoA and PCA. I ask because I created a PCA plot using a PCoA distance matrix and was told that that was incorrect.
Hi @Emily_Yu,
Welcome to the :qiime2: forum!
I think the classic source for this is Numerical Ecology by Legendre and Legendre. I think it's chapter 9, but I'm not sure off the top of my head. I just went hunting for GustaMe, which is another excellent resource, but it doesn't seem to be online right now.
There are two terms that we use which are very similar:
From a general perspective, you can think of PCA as a special case of PCoA, although the steps are a little different.
In Principal Components Analysis, we:
In Principal Coordinates Analysis, we:
There are a few points of comparison:
| | PCA | PCoA |
|---|---|---|
| Input | feature table | distance matrix |
| Distance used | Euclidean only | any distance you want |
| Clusters based on metadata | No | No |
| Uses eigenvalues | Yes | Yes |
| Can automagically map features into a biplot | Yes | No |
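
To make the "special case" point concrete, here is a minimal sketch showing that PCoA run on Euclidean distances gives the same sample coordinates as PCA, up to the sign of each axis. It uses a random toy table and plain NumPy; the names `X`, `pca_scores`, and `pcoa_scores` are illustrative assumptions, and this is not meant to reflect how QIIME 2 computes either ordination internally.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))  # toy feature table: 6 samples x 4 features

# PCA: center each feature, then take sample scores from the SVD.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
pca_scores = U * s  # sample coordinates on the principal components

# PCoA: start from the pairwise Euclidean distances only.
diff = X[:, None, :] - X[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=-1))

n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
B = -0.5 * J @ (D ** 2) @ J           # double-centered squared distances
eigvals, eigvecs = np.linalg.eigh(B)
order = np.argsort(eigvals)[::-1]     # sort axes by decreasing eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = Xc.shape[1]                       # number of non-trivial axes here
pcoa_scores = eigvecs[:, :k] * np.sqrt(eigvals[:k])

# Each axis is only defined up to a sign flip, so compare absolute values.
same = np.allclose(np.abs(pca_scores), np.abs(pcoa_scores), atol=1e-8)
print("PCA and Euclidean PCoA agree:", same)
```

Swapping the Euclidean distances for something like Bray-Curtis is exactly where the two methods stop agreeing, which is why a PCoA plot needs to be labelled as PCoA.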
Best,
Justine
As an update, GustaMe is back online and a great reference!
Principal Component Analysis (PCA) is a multivariate statistical technique used to identify patterns in high-dimensional data by reducing the dimensionality of the data while retaining as much variation as possible. PCA seeks to find a set of new uncorrelated variables (called principal components) that explain the maximum amount of variation in the original data. The principal components are calculated from the eigenvectors of the covariance matrix of the data.
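As a rough illustration of that description, the sketch below computes PCA by eigendecomposing the covariance matrix of a centered table. The function name `pca`, the `n_components` parameter, and the random input are assumptions made for the example, not part of any particular library.

```python
import numpy as np

def pca(X, n_components=2):
    """Illustrative PCA via eigendecomposition of the covariance matrix."""
    Xc = X - X.mean(axis=0)                 # center each feature
    cov = np.cov(Xc, rowvar=False)          # features x features covariance
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: covariance is symmetric
    order = np.argsort(eigvals)[::-1]       # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs[:, :n_components]  # project samples onto the axes
    explained = eigvals[:n_components] / eigvals.sum()
    return scores, explained

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 5))  # toy data: 10 samples x 5 features
scores, explained = pca(X)
print(scores.shape)
print(explained)
```

The `explained` values are the fractions of total variance carried by each retained component, which is what usually ends up on the axis labels of a PCA plot.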
On the other hand, Principal Coordinates Analysis (PCoA), also known as classical (metric) multidimensional scaling (MDS), is a technique used to visualize the pairwise distances between objects in a dataset. PCoA uses a distance matrix (e.g., Euclidean, Bray-Curtis, or Jaccard distance) to calculate a set of new uncorrelated variables (called principal coordinates) that explain the maximum amount of variation in the distance matrix. The principal coordinates are analogous to the principal components in PCA.
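A comparable sketch for PCoA, assuming a small made-up count table and a hand-written Bray-Curtis distance: the distance matrix is squared, double-centered, and eigendecomposed. The helper names `bray_curtis` and `pcoa` and the `counts` values are hypothetical, chosen only for illustration.

```python
import numpy as np

def bray_curtis(counts):
    """Pairwise Bray-Curtis dissimilarity between rows (samples)."""
    n = counts.shape[0]
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            num = np.abs(counts[i] - counts[j]).sum()
            den = (counts[i] + counts[j]).sum()
            D[i, j] = num / den
    return D

def pcoa(D, n_axes=2):
    """Illustrative PCoA: double-center squared distances, then eigendecompose."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Non-Euclidean distances can yield negative eigenvalues; clip before sqrt.
    kept = np.clip(eigvals[:n_axes], 0, None)
    coords = eigvecs[:, :n_axes] * np.sqrt(kept)
    proportion = kept / eigvals[eigvals > 0].sum()
    return coords, proportion

# Made-up counts: 5 samples x 4 features.
counts = np.array([
    [10, 0, 5, 2],
    [8, 1, 4, 3],
    [0, 12, 1, 7],
    [1, 10, 0, 9],
    [5, 5, 5, 5],
], dtype=float)

D = bray_curtis(counts)
coords, proportion = pcoa(D)
print(coords)
print(proportion)
```

Note that the function never sees the feature table, only `D`. That is why PCoA accepts any distance, and also why it cannot place features on a biplot without extra work.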