Hi @Emily_Yu,
Welcome to the :qiime2: forum!
I think the classic source for this is Numerical Ecology by Legendre and Legendre. I think it's chapter 9, but I'm not sure off the top of my head. I just went hunting for GUSTA ME, which is another excellent resource, but it doesn't seem to be online right now.
There are two terms that we use which are very similar:
- Principal Components Analysis - PCA
- Principal Coordinates Analysis - PCoA
From a general perspective, you can think of PCA as a special case of PCoA, although the steps are a little different.
In Principal Components Analysis, we:
- Start with a table of features (or transformed table)
- Calculate the Euclidean distance between samples, based on their features (for those paying attention, Aitchison = Euclidean distance on CLR-transformed data)
- Use an eigenvalue-based ordination to project the distances into lower-dimensional space (this is a lot of linear algebra that's slightly over my head, TBH)
- Use the known transform to place the features into the same space (optional)
- Show off your shiny new PCA and evaluate similarity
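
If it helps to see those steps concretely, here is a minimal sketch in Python with NumPy. The `pca` helper and the toy table are made up for illustration, and it uses an SVD of the centered table, which should give the same axes as an eigendecomposition of the covariance matrix rather than being a different method.

```python
import numpy as np

def pca(table, n_components=2):
    """PCA of a samples x features table via SVD (illustrative sketch)."""
    X = np.asarray(table, dtype=float)
    # Center each feature; PCA works on deviations from the feature means.
    Xc = X - X.mean(axis=0)
    # SVD of the centered table: Xc = U @ diag(s) @ Vt
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Sample coordinates on the first principal axes.
    scores = U[:, :n_components] * s[:n_components]
    # Feature loadings: the known transform that places features in the same space.
    loadings = Vt[:n_components].T
    # Eigenvalues of the covariance matrix and the share of variance per axis.
    eigvals = s ** 2 / (X.shape[0] - 1)
    explained = eigvals[:n_components] / eigvals.sum()
    return scores, loadings, explained

# Made-up table: 5 samples (rows) x 3 features (columns).
toy_table = np.array([
    [2.0, 0.0, 1.0],
    [3.0, 1.0, 0.0],
    [0.0, 4.0, 2.0],
    [1.0, 5.0, 3.0],
    [2.0, 2.0, 2.0],
])
scores, loadings, explained = pca(toy_table)
print(scores.shape, loadings.shape)
print(explained)
```

The `loadings` are what make a biplot possible: the same matrix that projects the samples also tells you where each feature points in the ordination.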
In Principal Coordinates Analysis, we:
- Start with a distance matrix. These can be any distances you want (and occasionally a dissimilarity). Bray-Curtis, unweighted UniFrac, Aitchison, Jensen-Shannon, it doesn't care. Got a metric that compares two samples on the basis of their favorite musical genre?
- Use an eigenvalue-based ordination to project the distances into low-dimensional space
- Show off your shiny new PCoA and evaluate similarity
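
Here is a rough sketch of those PCoA steps in plain NumPy, assuming the classical approach of double-centering the squared distances and then taking an eigendecomposition. The `bray_curtis` and `pcoa` helpers and the toy counts are hypothetical names and data for illustration; in practice you would likely use an existing implementation instead.

```python
import numpy as np

def bray_curtis(table):
    """Pairwise Bray-Curtis dissimilarity between rows (samples)."""
    X = np.asarray(table, dtype=float)
    n = X.shape[0]
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = np.abs(X[i] - X[j]).sum() / (X[i] + X[j]).sum()
    return D

def pcoa(D, n_components=2):
    """Classical PCoA of a square distance matrix (illustrative sketch)."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Double-center the squared distances.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    # eigh returns eigenvalues in ascending order, so sort them descending.
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Non-Euclidean dissimilarities can yield negative eigenvalues; clip them
    # to zero before taking the square root for the coordinates.
    kept = np.clip(eigvals[:n_components], 0.0, None)
    coords = eigvecs[:, :n_components] * np.sqrt(kept)
    return coords, eigvals

# Made-up counts: 5 samples (rows) x 3 features (columns).
toy_counts = np.array([
    [10, 0, 5],
    [12, 1, 3],
    [0, 20, 8],
    [1, 18, 9],
    [6, 6, 6],
])
D = bray_curtis(toy_counts)
coords, eigvals = pcoa(D)
print(coords)
```

Note that `pcoa` never sees the feature table, only `D`. That is why any distance works, and also why there are no feature loadings to put into a biplot for free.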
There are a few points of comparison:
| | PCA | PCoA |
|---|---|---|
| Input | feature table | distance matrix |
| Distance used | Euclidean only | any distance you want |
| Clusters based on metadata | No | No |
| Uses eigenvalues | Yes | Yes |
| Can automagically map features into a biplot | Yes | No |
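
To check the idea that PCA behaves like a special case of PCoA, here is a small numerical sketch: run PCA on a made-up table, run a classical PCoA on the Euclidean distances from the same table, and compare the sample coordinates. The random data and variable names are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))  # made-up table: 6 samples x 4 features

# PCA: center the features, then take the SVD.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
pca_scores = U[:, :2] * s[:2]

# PCoA: Euclidean distances between samples, double-centering, eigendecomposition.
diff = X[:, None, :] - X[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=-1))
n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D ** 2) @ J
eigvals, eigvecs = np.linalg.eigh(B)
order = np.argsort(eigvals)[::-1][:2]
pcoa_coords = eigvecs[:, order] * np.sqrt(eigvals[order])

# Each axis is only defined up to a sign flip, so compare absolute values.
same = np.allclose(np.abs(pca_scores), np.abs(pcoa_coords))
print(same)
```

If the two ordinations agree, `same` is `True`. Swapping `D` for a non-Euclidean distance breaks that match, which is the practical difference between the two methods.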
Best,
Justine