Interpretation of % of variance explained in Gemelli rPCA (summing to 100%?)

Hello everyone!

I recently did an rPCA using Gemelli. I followed the documentation regarding the low-rank assumption, which suggests choosing a rank between 2 and 10. I used a rank of 3 because it is the default value. I am confused about how to interpret the proportion of variance explained in the resulting (bi)plot.

When I look at my PCs, the variance explained sums up to exactly 100%. Coming from PCoA, I am used to the first few axes explaining a much smaller proportion of the total variance (e.g., PCo1 11%, PCo2 5%) and having a long tail of unexplained variance.

Does the fact that my rPCA axes sum to 100% mean that:

  1. I have somehow captured 100% of the original biological variance in my dataset? (I don't think so :rofl:)
  2. This is a property of the matrix completion algorithm, where the percentage shown is relative only to the reconstructed matrix (the approximation) defined by the rank I chose?

If it is the latter, how would you report this in a manuscript? Is it misleading to say something like "PC1 explained 60% of the variance" without clarifying that this is 60% of the low-rank approximation? It feels like lying! :melting_face:
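To make option 2 concrete, here is a minimal NumPy sketch of what I suspect is happening. It is not Gemelli's actual code: it uses a plain SVD on a made-up dense matrix (no rclr transform, no matrix completion), and it assumes the percentages are normalized over the retained components only, which I have not verified in the Gemelli source. The names `prop_full` and `prop_truncated` are just mine for illustration.

```python
import numpy as np

# Toy samples-by-features matrix (hypothetical data, not a real feature table)
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))
X = X - X.mean(axis=0)  # center each feature

# Singular values of the full matrix, in descending order
s = np.linalg.svd(X, compute_uv=False)
rank = 3  # the rank I chose for the low-rank approximation

# Proportion explained relative to the total variance (PCoA/PCA-style reporting)
prop_full = s**2 / np.sum(s**2)

# Proportion explained relative to the rank-3 approximation only
# (assumption: this mirrors how the rPCA percentages are normalized)
prop_truncated = s[:rank]**2 / np.sum(s[:rank]**2)

print("relative to full matrix:  ", prop_full[:rank], "sum =", prop_full[:rank].sum())
print("relative to rank-3 approx:", prop_truncated, "sum =", prop_truncated.sum())
```

In this toy case the first three proportions sum to less than 1 when normalized by the full matrix, but to exactly 1 when normalized by the rank-3 approximation, even though the axes themselves are identical.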

Thanks in advance for your help!

Best,

Sergio


@cmartino, are you available to advise on this one? Thanks for any help you can provide!


Hi @salias,

Of relevance, a previous thread that somewhat addresses this question is here.

Hope it helps. I often struggle with giving an ELI5 on this issue myself. It makes sense in my head, but I can’t articulate it well enough in simple terms.


Hi @Mehrbod_Estaki

Thank you for the link! I totally missed that post when I searched the forum before asking.

Best,

Sergio
