How are PCoA axes estimated?

smreyes · June 4, 2020, 4:32pm

Can anyone explain how PCoA axes are estimated?

The reason I ask is because I used QIIME2 to calculate beta-diversity metrics. I've exported weighted UniFrac results into R to customize the PCoA (plot colors, etc). I obtained % variation explained for PCoA axes from the emperor plot that was produced in QIIME2. Now, I've been asked to stratify my data, resulting in a subset of the data plotted on 6 different PCoAs. Is there a way to estimate the PCoA axes of the 6 plots based on the existing ordination file?

Thanks!

jwdebelius · June 4, 2020, 10:45pm

Hi @smreyes,

If you're into hardcore linear algebra, some of the math behind PCoA can be found on Wikipedia and in some notes by Pierre Legendre.
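For readers who prefer code to algebra, here is a minimal NumPy sketch of classical PCoA: square the distances, double-center them, eigendecompose, and scale the eigenvectors. The function name `pcoa` and the tolerance are illustrative choices, not QIIME 2's implementation. One assumption worth flagging: the percentage explained is computed here over the positive eigenvalues only, and packages differ in how they treat the negative eigenvalues that non-Euclidean distances can produce, so the numbers may not match Emperor exactly.

```python
import numpy as np


def pcoa(dist):
    """Classical PCoA of a symmetric distance matrix (illustrative sketch).

    Returns (coords, proportion): sample coordinates on each axis and the
    fraction of variation attributed to each axis.
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]

    # Gower double-centering of -0.5 * D^2.
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering

    # eigh is for symmetric matrices; sort eigenvalues largest first.
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Assumption: keep only clearly positive eigenvalues.
    keep = eigvals > 1e-10 * np.abs(eigvals).max()
    eigvals, eigvecs = eigvals[keep], eigvecs[:, keep]

    coords = eigvecs * np.sqrt(eigvals)   # scale each axis by sqrt(eigenvalue)
    proportion = eigvals / eigvals.sum()  # "percent explained" per axis
    return coords, proportion


if __name__ == "__main__":
    # Four points on a 3 x 4 rectangle, turned into Euclidean distances.
    pts = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    coords, proportion = pcoa(dist)
    print(coords.shape, proportion)
```

The percentage shown on each axis is just that axis's eigenvalue divided by the sum of the retained eigenvalues, which is why it depends on every sample that went into the distance matrix.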

But I think this is a bigger challenge. The joy/frustration of PCoA is that it needs to be recalculated whenever your data set changes. I like to think about it as a map. If I'm showing cities (and their distances) within the continental US, I only need a map of the US. (Maybe I just need a map of California.) But if I want to add in some distances to some cities in , suddenly, the map is going to need to change to visualize that, and cities that appeared further apart (more dissimilar) in my -only map may suddenly appear a lot more similar!

However, this doesn't really solve your variation problem, because you're still struggling to get back to that question of the total percentage explained by the PCoA axes. (Because your new percentage explained will be based on that data.)
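A small sketch of why the percentages cannot be carried over to a subset: rerunning the ordination on part of the distance matrix gives different axes and different proportions. The toy coordinates, the helper name `proportion_explained`, and the positive-eigenvalue convention are all assumptions made for illustration.

```python
import numpy as np


def proportion_explained(dist):
    """Per-axis proportion of variation from a classical PCoA (sketch)."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering
    eigvals = np.sort(np.linalg.eigvalsh(b))[::-1]
    # Assumption: only clearly positive eigenvalues count as axes.
    eigvals = eigvals[eigvals > 1e-10 * np.abs(eigvals).max()]
    return eigvals / eigvals.sum()


# Toy data: two samples spread along x, four samples spread along y.
pts = np.array([[-3.0, 0.0], [3.0, 0.0],
                [0.0, -1.0], [0.0, 1.0], [0.0, -2.0], [0.0, 2.0]])
full_dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)

# Subset the distance matrix to the last four samples (rows and columns).
subset = np.array([2, 3, 4, 5])
subset_dist = full_dist[np.ix_(subset, subset)]

full_prop = proportion_explained(full_dist)
subset_prop = proportion_explained(subset_dist)
print("all samples:", full_prop)
print("subset only:", subset_prop)
```

In this toy case the full ordination needs two axes, while the subset lies on a single line, so its first axis accounts for all of the subset's variation. The subset's axis 1 is not the full data set's axis 1.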

Best,
Justine

smreyes · June 6, 2020, 7:25am

Thanks, Justine. I appreciate your reply.

I get that PCoA has to be re-calculated when the data set changes. However, effectively my goal is more like: I have a map of California and I want to highlight different counties based on factors X and Y. Essentially, I want to facet wrap the PCoA so I have multiple pictures of "California", each with different counties highlighted. For that reason, it makes sense to me to keep the PCoA axes the same (because we are still looking at the same picture of California, just with different things highlighted). Would you agree? Thoughts?

jwdebelius · June 8, 2020, 3:47pm

Hi @smreyes,

I spent some time thinking about this. My initial response to re-tiling is that it's not always a great idea; I've seen some good papers that use a PCoA with data missing to justify something that I believe, had they displayed the data more honestly, would not have supported that scientific argument. So, I think if you're working to highlight specific subsets, it's worth thinking about how you show that on the whole in a way that communicates it comes from the whole. I don't know if you could use something like the fact that all the PCs have associated coordinates, and sort of highlight based on that coordinate system. (Technically, the coordinate system is artificial, but if you want to plot subsets or tiles, it might be useful.) I think, though, again, you have to be careful with your display and interpretation.

Best,
Justine

smreyes · June 8, 2020, 5:11pm

Thanks again for your reply, @jwdebelius. The justification for breaking up the plot is exactly that, to be more honest with the data. In our case, not because of missingness, but rather to show that our main effect of interest was not confounded by our study design (randomized crossover trial with randomization stratified by a suspected confounding variable). I concluded that maybe the best way to highlight the data as honestly as I can is to have all data plotted on each plot but gray out the points that don't correspond to that panel. That way there's no reason to question the axes, and no need to recalculate.
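A rough sketch of that gray-out layout, written in Python with matplotlib rather than R; the layering idea is the same in any plotting library. The function names `panel_masks` and `plot_highlight_panels`, the colors, and the assumption that `coords` holds the existing full-data ordination (first two columns are the plotted axes) are all hypothetical. Nothing is recalculated: every panel reuses the same coordinates and shares the same axes.

```python
import numpy as np


def panel_masks(groups):
    """Map each stratum label to a boolean mask over all samples."""
    groups = np.asarray(groups)
    return {level: groups == level for level in np.unique(groups)}


def plot_highlight_panels(coords, groups, axis_labels=("PC1", "PC2")):
    """One panel per stratum: all samples in gray, that stratum in color."""
    import matplotlib.pyplot as plt  # imported here so the helper above has no plotting dependency

    coords = np.asarray(coords, dtype=float)
    masks = panel_masks(groups)
    fig, axes = plt.subplots(1, len(masks), sharex=True, sharey=True,
                             squeeze=False, figsize=(4 * len(masks), 4))
    for ax, (level, mask) in zip(axes[0], masks.items()):
        # Background layer: every sample outside this stratum, grayed out.
        ax.scatter(coords[~mask, 0], coords[~mask, 1], color="lightgray", s=15)
        # Foreground layer: the samples belonging to this panel.
        ax.scatter(coords[mask, 0], coords[mask, 1], color="tab:blue", s=15)
        ax.set_title(str(level))
        ax.set_xlabel(axis_labels[0])
    axes[0, 0].set_ylabel(axis_labels[1])
    return fig
```

Passing the original axis labels (including the percentages from the full ordination) as `axis_labels` stays valid here, because each panel displays the full ordination and only the highlighting changes.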

jwdebelius · June 8, 2020, 5:26pm

Hi @smreyes,

Thank you for explaining your plot; I may have misinterpreted. I think that representation is probably really helpful, as long as you've got some representation of all the points there.

Best,
Justine