Questions about PCoA biplots

Hello everyone,

I have been creating PCoA biplots, and they have been very useful for identifying the features that contribute most to separating the samples in a PCoA plot. Shown below is an example of a PCoA biplot created with QIIME 2 and the dokdo package I wrote:

I think I understand how to interpret a PCoA biplot well enough. That is, data points represent samples and arrows represent features, with the length of each arrow indicating the feature's loading (importance) with respect to the given axes.

I also know how to generate a PCoA biplot in QIIME 2. This post gives a good overview, but basically you need to use the pcoa-biplot method, which accepts the Artifact types PCoAResults and FeatureTable[RelativeFrequency] as input.

However, I have some questions about the way a PCoA biplot is generated in QIIME 2 that I could not find answers to in this forum.

Q1. Why does pcoa-biplot require FeatureTable[RelativeFrequency] instead of FeatureTable[Frequency]?

One might ask why we need to provide a feature table to begin with, but I think I understand this part: unlike PCA, PCoA is based on a distance (or dissimilarity) matrix between the samples, so all the feature data are lost along the way, and that's why we need to provide a feature table separately. But why use relative frequency? I did find this in the doc:

Project features into a principal coordinates matrix. The features used should be the features used to compute the distance matrix. It is recommended that these variables be normalized in cases of dimensionally heterogeneous physical variables.

However, I still couldn't understand why it's relevant. Any clarification (e.g. of what "dimensionally heterogeneous physical variables" means) would be appreciated.

Q2. How does pcoa-biplot correctly project features into an existing PCoA space?

This may be obvious to many, but I just can't wrap my head around it. By creating a PCoAResults artifact, you make an N-dimensional space with defined axes, and your samples are projected into this space. So far so good. But then pcoa-biplot projects the features (i.e. relative frequencies) into the same space using the same axes and draws arrows from the origin to the features. How does pcoa-biplot orient itself correctly when projecting the features into the space of the samples, and why does it work? Any guidance on this would be deeply appreciated.

Q3. Does pcoa-biplot compute a distance matrix between the features before they are projected into a PCoA space?

Q4. What kind of ordination method does pcoa-biplot use for projecting the features into a PCoA space?

Please let me know if any of my questions are unclear. Appreciate your help!


Q1. Why does pcoa-biplot require FeatureTable[RelativeFrequency] instead of FeatureTable[Frequency]?

The answer to this question lies in the suggestion that Legendre and Legendre give in their Numerical Ecology book. If you are interested, the underlying functionality is implemented here.
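
For some intuition on the normalization point, here is a minimal sketch in plain NumPy (not QIIME 2 code; the counts are made up for illustration). Raw counts scale with sequencing depth, so two samples with the same composition can look very different until each sample is divided by its total:

```python
import numpy as np

# Hypothetical counts: 2 samples (rows) x 3 features (columns).
# The second sample has the same composition, sequenced 10x deeper.
counts = np.array([[10.0, 30.0, 60.0],
                   [100.0, 300.0, 600.0]])

# Divide each row by its total so every sample sums to 1.
rel = counts / counts.sum(axis=1, keepdims=True)

print(rel)
```

After this step both rows are identical, so the features are on a common scale before they are related to the ordination axes.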

Q2. How does pcoa-biplot correctly project features into an existing PCoA space?

The projection step is done by computing a covariance matrix between the PCoA matrix (samples by principal coordinates) and the feature table (samples by features).
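
To make the covariance idea concrete, here is a minimal NumPy sketch. It is my own illustration, not the QIIME 2 or scikit-bio source: the function name `project_features`, the toy data, and the eigenvalue rescaling are assumptions modeled on the Legendre and Legendre approach mentioned above.

```python
import numpy as np

def project_features(coords, eigvals, table):
    """Hypothetical helper: project features onto existing PCoA axes.

    coords:  (n_samples, n_axes) sample coordinates from PCoA
    eigvals: (n_axes,) positive eigenvalues, one per axis
    table:   (n_samples, n_features) feature table, rows in the same
             sample order as coords
    Returns an (n_features, n_axes) array of arrow endpoints.
    """
    coords = np.asarray(coords, dtype=float)
    eigvals = np.asarray(eigvals, dtype=float)
    table = np.asarray(table, dtype=float)
    n = coords.shape[0]

    # Standardize each axis: zero mean, unit sample variance.
    z = (coords - coords.mean(axis=0)) / coords.std(axis=0, ddof=1)
    # Center each feature (harmless here, since z is already centered).
    yc = table - table.mean(axis=0)
    # Covariance between every feature and every standardized axis.
    s_pc = yc.T @ z / (n - 1)
    # Assumed rescaling by the eigenvalues of the ordination.
    return np.sqrt(n - 1) * s_pc / np.sqrt(eigvals)

# Toy data: 3 samples on 2 orthogonal, centered axes.
coords = np.array([[-1.0, 1.0], [0.0, -2.0], [1.0, 1.0]])
eigvals = np.array([2.0, 6.0])  # sum of squares of each axis
# Feature A increases along axis 1; feature B varies along axis 2.
table = np.array([[0.1, 0.3], [0.2, 0.0], [0.3, 0.3]])

arrows = project_features(coords, eigvals, table)
print(arrows)
```

Each row of `arrows` is one feature's arrow tip. No new axes are computed: the sample coordinates fix the space, and each feature lands wherever its covariance with those axes puts it. In the toy data, feature A points along axis 1 only and feature B along axis 2 only.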

Q3. Does pcoa-biplot compute a distance matrix between the features before they are projected into a PCoA space?

It does not.

Q4. What kind of ordination method does pcoa-biplot use for projecting the features into a PCoA space?

There's no ordination method used as part of pcoa-biplot. The ordination is as provided by the user (usually PCoA). This method won't rank or re-arrange principal axes.


Hopefully these answers are helpful. In general, I would say we now have better ways to compute and estimate these biplots (for example, DEICODE). That said, pcoa-biplot is one of the few methods I have found that works when you want to see the relationship between features and a given distance matrix (for example, UniFrac).

