Density plots along PCoA plot axes?

Sam_Degregori · April 19, 2025, 12:04am

Hi all,

I am interested in plotting density plots denoting relative abundance of given taxa along the X-axis of a PCoA plot. I've attached a pic below. The plot comes from this Hadza hunter-gatherer paper I am sure a lot of you recognize. However, I cannot find their GitHub/code for the analyses.

Does anyone know of a notebook for this, or have starting suggestions? I am seeing that 'ggside' is good for this in R, and I understand I will basically have to (melt?) the PC1s or PC2s against relative abundance values for each sample.

Any suggestions would be great!

colinbrislawn · April 21, 2025, 2:17pm

Hello Sam,

That's a beautiful figure!

I make big, multipanel figures in R using the Tidyverse family of packages.
My current favorite is ggpubr from @kassambara

ggpubr supports x-y scatter plots with matching boxplots for each axis, like this:

library(ggpubr)  # also attaches ggplot2, which provides theme_bw()

ggscatterhist(
 iris, x = "Sepal.Length", y = "Sepal.Width",
 color = "Species", size = 3, alpha = 0.6,
 palette = c("#00AFBB", "#E7B800", "#FC4E07"),
 margin.plot = "boxplot",
 ggtheme = theme_bw()
)

That's only after all the data is in R. I don't know of an 'easy' way to do this.
It's cool because it's hard.

Sam_Degregori · April 21, 2025, 8:55pm

Hey Colin,

This is super helpful. TY. Yeah, this might be a cool qiime2 plugin idea I might convince someone in the lab to do. But for now, R it is. Thanks for the tip!

jwdebelius · April 22, 2025, 1:49pm

I have a (messy) GitHub repo that will do a marginal plot in Python, but currently, the marginals have to be the same display as the data set. Happy to link if of interest. (I think ggplot is so much harder than Matplotlib to get right, for so many reasons I won't enumerate here.)

The one thing I'd consider, based on experience there, is using kernel smoothing. I tend to average over 5% of the samples to get my curves when I do one, versus lining up the exact position.
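A minimal sketch of that kind of rolling average in Python, assuming the samples are sorted by their PC1 coordinate and smoothed with a centered window covering about 5% of them. The function name `smooth_along_axis`, the `frac` parameter, and the simulated data are all hypothetical; this is not the code from the repo mentioned above.

```python
import numpy as np
import pandas as pd


def smooth_along_axis(pc, abundance, frac=0.05):
    """Sort samples by ordination coordinate and apply a centered rolling mean.

    `frac` is the (assumed) fraction of samples averaged in each window.
    """
    df = pd.DataFrame({"pc": pc, "abundance": abundance}).sort_values("pc")
    window = max(1, int(round(frac * len(df))))
    # min_periods=1 keeps values at the edges instead of returning NaN there
    df["smoothed"] = (
        df["abundance"].rolling(window, center=True, min_periods=1).mean()
    )
    return df.reset_index(drop=True)


# Simulated example: 200 samples whose abundance loosely tracks PC1
rng = np.random.default_rng(0)
pc1 = rng.normal(size=200)
rel_abund = np.clip(0.3 + 0.1 * pc1 + rng.normal(scale=0.05, size=200), 0, 1)

curve = smooth_along_axis(pc1, rel_abund)
```

Plotting `curve["pc"]` against `curve["smoothed"]` then gives a curve that lines up with the PC1 axis but is far less spiky than the raw per-sample values.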

The other trick will be to have the axes scaled independently and either do each on its own axis, or to do the normalization on its own and then plot on a common scale. Shared y axes aren't going to work because the absolute abundance is going to be different? So, like, I'd probably normalize to a max value, then scale for plotting. (Although I have no idea if this is possible in ggplot; see above note about the decided lack of specific control.)
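As a rough illustration of the "normalize to a max value" option, here is a small pandas sketch. It assumes one column per taxon, with rows already ordered along the PC axis; the taxon names, the values, and the helper `scale_to_max` are made up for the example.

```python
import pandas as pd


def scale_to_max(curves):
    """Divide each taxon's curve by its own maximum so every curve peaks at 1."""
    peaks = curves.max(axis=0)
    # Taxa that are absent everywhere would divide by zero; leave them at 0
    peaks = peaks.where(peaks > 0, 1.0)
    return curves / peaks


# Hypothetical curves: rows are positions along PC1, columns are taxa
curves = pd.DataFrame({
    "taxon_a": [0.10, 0.30, 0.40, 0.20],     # abundant taxon
    "taxon_b": [0.001, 0.004, 0.002, 0.0],   # rare taxon
    "absent": [0.0, 0.0, 0.0, 0.0],
})

scaled = scale_to_max(curves)
```

After scaling, the rare and the abundant taxon can share one y axis, at the cost of the y axis no longer showing actual relative abundance.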

Best,
Justine

John_Quensen · April 23, 2025, 5:56pm

I just published my method of importing QIIME2 results into R and phyloseq at April 2025 - John Quensen. You can see how I export QIIME2 artifacts (feature table, tree, representative sequences) in some of the tutorials, e.g. Processing 16S Sequences with QIIME2 and DADA2 - John Quensen.
I tell students once you have all results in R, if you can think it you can do it. My approach for making the marginal density plots by family along the first PCoA axis would be to get total counts for each family for each sample, order the samples according to their projection on the first axis and make the density plots with ggplot2.
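The same steps can be sketched in Python with pandas, for anyone not working in R. This is only an illustration under assumptions: the feature table, the feature-to-family map, and the PC1 coordinates below are tiny invented inputs, and the variable names are hypothetical.

```python
import pandas as pd

# Hypothetical inputs: a samples x features count table, a feature -> family
# map, and each sample's coordinate on the first PCoA axis
counts = pd.DataFrame(
    {"f1": [10, 0, 5], "f2": [5, 5, 5], "f3": [0, 20, 10]},
    index=["s1", "s2", "s3"],
)
family = pd.Series({"f1": "FamA", "f2": "FamA", "f3": "FamB"})
pc1 = pd.Series({"s1": 0.4, "s2": -0.3, "s3": 0.1})

# Total counts for each family in each sample
family_counts = counts.T.groupby(family).sum().T

# Convert to relative abundance within each sample
rel = family_counts.div(family_counts.sum(axis=1), axis=0)

# Order the samples by their projection on the first axis
ordered = rel.loc[pc1.sort_values().index]
```

Each column of `ordered` is then one family's abundance profile along PC1, ready to be smoothed and plotted.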

Sam_Degregori · April 23, 2025, 6:33pm

@jwdebelius would love a link!
Yeah, I can see how this can be tricky... I might be confused, but isn't it all about just sharing X values for the density plot? Say, if you are plotting along PC1, you want your x axis to line up with PC1 values? (Although similarly you might need to scale them, as you are saying.)
And then the y axis for the density plot is completely independent of PC1.
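The shared-x idea on its own can be sketched in Matplotlib like this, assuming simulated PC coordinates and a single taxon. The data and variable names are invented, and this is not the layout code from anyone's repo in this thread.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Simulated ordination coordinates and one taxon's relative abundance
rng = np.random.default_rng(1)
pc1 = rng.normal(size=100)
pc2 = rng.normal(size=100)
abund = np.clip(0.2 + 0.1 * pc1 + rng.normal(scale=0.03, size=100), 0, 1)

# Two stacked panels that share the x axis; the y axes stay independent
fig, (ax_top, ax_main) = plt.subplots(
    2, 1, sharex=True, figsize=(5, 6),
    gridspec_kw={"height_ratios": [1, 3]},
)

# Marginal panel: abundance drawn in PC1 order (no smoothing applied here)
order = np.argsort(pc1)
ax_top.fill_between(pc1[order], abund[order], alpha=0.5)
ax_top.set_ylabel("rel. abundance")

# Main panel: the ordination itself
ax_main.scatter(pc1, pc2, s=15)
ax_main.set_xlabel("PC1")
ax_main.set_ylabel("PC2")
```

With `sharex=True` the marginal panel always spans the same PC1 range as the scatter, while its y axis is free to be scaled however the abundance data needs.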

jwdebelius · April 23, 2025, 6:45pm

Hi @Sam_Degregori,

The plots are here:

github.com/jwdebelius/eurydice

eurydice/plot/plot_ordination.py

main

import numpy as np
import pandas as pd
import skbio

from matplotlib import rcParams
import matplotlib.colors as mpc
import matplotlib.pyplot as plt
import seaborn as sn

rcParams['pdf.fonttype'] = 42
rcParams['ps.fonttype'] = 42

hide_axes = dict(left=False, 
                 right=False, 
                 labelleft=False, 
                 labelright=False, 
                 length=0, labelsize=0)

axis_in = dict(left=True, 
               right=True,

This file has been truncated.

It's the sweatpants and old t-shirt of code repositories, so unfortunately, I don't have any examples I can explicitly show you that I'm finding easily. The code I have written is based on having the same margin on PC1 and PC2; you could adapt that into separate functions, though, if you wanted.

The thing about the shared x is semi spacing and semi not. If you have two samples with the same value for PC1, how do you combine them? Do you show them side-by-side (shifting the position relative to PC1)? What if you have an outlier that's far away? The rolling average kind of lets you get away from the spikiness of the curve while still letting you see... something. It becomes an issue if there isn't an association with the position you're plotting and you have extreme highs and lows. But, if the value is associated, it can be a nice pattern?
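For the tied-samples case specifically, one possible answer (just an assumption for illustration, and not necessarily what the linked code does) is to average the abundance of samples that land on exactly the same PC1 value before drawing the curve. The toy values below are invented.

```python
import pandas as pd

# Toy data: the second and third samples share the same PC1 coordinate
df = pd.DataFrame({
    "pc1": [-0.2, 0.1, 0.1, 0.5],
    "abundance": [0.10, 0.30, 0.50, 0.20],
})

# Collapse ties to a single point per PC1 value by taking the mean
collapsed = df.groupby("pc1", as_index=False)["abundance"].mean()
```

This keeps each position on PC1 exact but hides the spread between tied samples, and it does nothing for a distant outlier, which still ends up as an isolated point on the curve.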

Best,
Justine

Sam_Degregori · April 23, 2025, 8:37pm

ahh now I see the issue... hmm yeah there can't be a perfect solution here without some fidgeting of the data.
I'll try your code and see what I can come up with. Ty!