Hello everyone,
I have been creating PCoA biplots, and they have been very useful for identifying the features that contribute most to separating the samples in a PCoA plot. Shown below is an example of a PCoA biplot created with QIIME 2 and the dokdo package I wrote:
I think I understand well enough how to interpret a PCoA biplot: data points represent samples, and arrows represent features, with the length of each arrow indicating the amount of loading (importance) with respect to the given axes.
I also know how to generate a PCoA biplot in QIIME 2. This post gives a good overview, but basically you need to use the pcoa-biplot method, which accepts the Artifact types PCoAResults and FeatureTable[RelativeFrequency] as input.
However, I have some questions about the way a PCoA biplot is generated in QIIME 2 that I could not find answers to in this forum.
Q1. Why does pcoa-biplot require FeatureTable[RelativeFrequency] instead of FeatureTable[Frequency]?
One might ask why we need to provide a feature table to begin with, but I think I got this part: unlike PCA, PCoA is based on a distance (or dissimilarity) matrix between the samples, so all the feature data are lost along the way, and that's why we need to provide a feature table separately. But why use relative frequency? I did find this in the doc:
Project features into a principal coordinates matrix. The features used should be the features used to compute the distance matrix. It is recommended that these variables be normalized in cases of dimensionally heterogeneous physical variables.
However, I still couldn't understand why it's relevant. Any clarification (e.g. of what "dimensionally heterogeneous physical variables" means here) would be appreciated.
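To make concrete what I mean by the feature data being lost, here is a minimal NumPy sketch of classical PCoA as I understand it. The function name `classical_pcoa` and the toy table are made up for illustration, I use Euclidean distance only to keep it short, and this is not taken from the QIIME 2 source. The point is that the ordination only ever sees the distance matrix, never the feature table.

```python
import numpy as np

def classical_pcoa(dist):
    """Classical PCoA from a square sample-by-sample distance matrix.

    Hypothetical helper for illustration; not the QIIME 2 implementation.
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Gower double-centering of the squared distances
    centering = np.eye(n) - np.ones((n, n)) / n
    gower = -0.5 * centering @ (d ** 2) @ centering
    # Eigendecomposition, sorted from largest to smallest eigenvalue
    eigvals, eigvecs = np.linalg.eigh(gower)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Keep only axes with a clearly positive eigenvalue
    keep = eigvals > 1e-10 * eigvals[0]
    eigvals, eigvecs = eigvals[keep], eigvecs[:, keep]
    # Sample coordinates: eigenvectors scaled by sqrt(eigenvalue)
    coords = eigvecs * np.sqrt(eigvals)
    return coords, eigvals

# Toy feature table: 4 samples (rows) x 3 features (columns)
counts = np.array([[10, 0, 5],
                   [8, 2, 1],
                   [0, 9, 6],
                   [1, 10, 4]], dtype=float)

# Euclidean distances between samples (an assumption; any distance could go here)
diff = counts[:, None, :] - counts[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=2))

# Only `dist` is passed in, so the feature identities are gone at this point
coords, eigvals = classical_pcoa(dist)
```

So after this step there is nothing left in `coords` that says which feature drove which axis, which is why I assume the feature table has to be supplied again.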
Q2. How does pcoa-biplot correctly project features into an existing PCoA space?
This may be obvious to many, but I just can't wrap my head around it. By creating a PCoAResults artifact you make an N-dimensional space with defined axes, and your samples are projected into this space. So far so good. But then pcoa-biplot projects the features (i.e. relative frequencies) into the same space using the same axes and draws arrows from the origin to the features. How does pcoa-biplot correctly orient (?) itself when projecting the features into the space of samples, and why does it work?! Any guidance on this would be deeply appreciated.
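To make the question concrete, here is my best guess at one way such a projection could be done: take the covariance between each feature and each standardized PCoA axis, then rescale by the eigenvalues. This is only a sketch under my own assumptions. The names `classical_pcoa` and `project_features` are hypothetical, and I do not know whether pcoa-biplot actually does this.

```python
import numpy as np

def classical_pcoa(dist):
    """Classical PCoA from a distance matrix (hypothetical helper)."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gower = -0.5 * centering @ (d ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gower)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1e-10 * eigvals[0]
    eigvals, eigvecs = eigvals[keep], eigvecs[:, keep]
    return eigvecs * np.sqrt(eigvals), eigvals

def project_features(features, coords, eigvals):
    """Project features onto existing PCoA axes (my guess, not QIIME 2 code)."""
    n = coords.shape[0]
    # Standardize each PCoA axis to mean 0 and sample standard deviation 1
    standardized = (coords - coords.mean(axis=0)) / coords.std(axis=0, ddof=1)
    # Covariance between every feature and every standardized axis
    centered = features - features.mean(axis=0)
    cov = centered.T @ standardized / (n - 1)
    # Rescale each axis by 1 / sqrt(eigenvalue)
    return np.sqrt(n - 1) * cov / np.sqrt(eigvals)

# Toy table: 4 samples x 3 features, converted to relative frequencies
counts = np.array([[10, 0, 5],
                   [8, 2, 1],
                   [0, 9, 6],
                   [1, 10, 4]], dtype=float)
rel = counts / counts.sum(axis=1, keepdims=True)

# Euclidean distance is an assumption chosen so the result can be checked
diff = rel[:, None, :] - rel[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=2))

coords, eigvals = classical_pcoa(dist)
arrows = project_features(rel, coords, eigvals)  # one row per feature
```

With Euclidean distances the rows of `arrows` should match the PCA loadings up to sign, which is the only case where I can convince myself the orientation is right. What I can't see is why the same recipe would still be meaningful for Bray-Curtis or UniFrac distances.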
Q3. Does pcoa-biplot compute a distance matrix between the features before they are projected into a PCoA space?
Q4. What kind of ordination method does pcoa-biplot use for projecting the features into a PCoA space?
Please let me know if any of my questions are unclear. Appreciate your help!