PCoA plots with confidence ellipsoids

I need to render 2D PCoA plots with confidence ellipsoids. After going through help forum posts here I found others looking to do the same and roadmap plans from 2017 to add this functionality--it seemed that this had been added. I generated a jackknifed Emperor plot using "qiime diversity beta-rarefaction", which I had hoped would have confidence ellipsoids, but it did not, so I exported the data from this visualization.qzv file using "qiime tools export" and then tried to render the 2D plot using qiime1's "make_2d_plots.py". I got an error message about the script not being able to guess the format for the file "rarefaction-iteration-correlation.tsv'".
Earlier I was able to get the "make_2d_plots.py" script create 2D plots from a non-jackknifed PCoA using "qiime diversity pcoa" and exporting the data to an "ordination.txt" file.
Is there some further preprocessing necessary to do on the jackknifed exported qiime2 output before qiime1 can make a plot?

I have seen 2D PCoA plots with confidence ellipsoids from papers that used qiime2 (see attached image). Am I missing something?

.

@John_Blazier,
Good question — skip to the bottom of my post to see the answer! QIIME 2 does not really have a way to make 2D plots with ellipsoids (yet).

I think the issue may be that beta-rarefaction is exporting the data in a goofy format — or qiime1 being goofy. Maybe Try exporting the PCoA coordinates output by core-metrics? If that does not work you may need to massage the data a little bit to make qiime1 happy (sorry can’t help you there).

While the authors may have used QIIME 2 for generating the PCoA coordinates, that plot was not made using QIIME 2!

When I want a 2D plot with ellipsoids (which I often do), I do this in python or R, which is probably what the authors of that paper did. Let me give you some R code, since qiime2R makes it easy peasy to load QIIME 2 files in R. See this tutorial:

But you don’t just want a PCoA, you want the ellipses — ggplot2 makes this so easy it is just silly! Just add stat_ellipse() to the end of the example shown in that tutorial, so your command should look like this:

pco$data$Vectors %>%
  rename("#SampleID"=SampleID) %>% #rename to match the metadata table
  left_join(metadata) %>%
  left_join(shannon$data %>% rownames_to_column("#SampleID")) %>%
  ggplot(aes(x=PC1, y=PC2, shape=Subject, color=BodySite, size=shannon)) +
  geom_point() +
  xlab(paste("PC1: ", round(100*pco$data$ProportionExplained[1]), "%")) +
  ylab(paste("PC2: ", round(100*pco$data$ProportionExplained[2]), "%")) +
  theme_bw() +
  ggtitle("Unweighted UniFrac")+
  stat_ellipse()

Please give that a spin and please share the output you see!

4 Likes

Thank you, this looks very helpful!

The code in the tutorial you linked runs fine–all the data imports and looks right. When I try to draw the plot with the code you posted here I get this error:

Joining, by = “#SampleID
Joining, by = “#SampleID
Error: Can’t join on ‘#SampleID’ x ‘#SampleID’ because of incompatible types (character / numeric)

Unfortunately my sample IDs are all four digit numbers. I can’t tell whether this is a problem or whether they are mis-classified in either the imported metadata or imported PCoA data. I’m a Python guy and not an R guy, so my guess is that these sample IDs were imported as int (integer data type) in one data objects but imported as str (string data type) in another. When I look at the imported metadata the sample IDs are labeled as “dbl” data type–that’s an integer, I think. The imported PCoA data is huge and I don’t know where to look for the sample IDs. I am guessing that the sample ids are in there somewhere, as characters not integers…is that possible?
Any ideas on how to resolve this error?

The code I posted was just copied from the tutorial, with the stat_ellipse() line added to the end, so I am not sure why it is not working!

Same here! :snake: So maybe you would prefer to look at this instead: https://matplotlib.org/devdocs/gallery/statistics/confidence_ellipse.html

Hi Nicholas,
That R code worked great as soon as I replaced the numeric sample IDs with alphanumeric ones. Thank you so much.

2 Likes

Nicholas, have you ever tried to calculate a p-value for the separate clustering of groups in one of these 2D PCoA scatter plots? I have several of these scatterplots with 2-4 groups, now with nice confidence ellipsoids. Some of the groups have almost/barely overlapping confidence ellipsoids, and my collaborators want a p-value so they can decisively say whether the groups cluster separately. They referred me to a paper that does this but that does not state anything about methods (this description is from results section, describing the 5 sample unweighted unifrac PCoA I posted above, “Fig. 4”):
“The plot also demonstrated that the male samples from TX clustered separately when compared to the males from Massachusetts (p = 0.001), except one outlier from MA within the cluster of TX males.”

Any thoughts would be greatly appreciated,

Chris

@John_Blazier,
The proper way to do this would be with a PERMANOVA test. See:
qiime diversity beta-group-significance
qiime diversity adonis

Thanks, Nicholas. That’s kind of what I thought. It seemed like they were probably conflating two different analyses.

1 Like

Hello John,
Do you mind sharing how you put the % for each axis? I finished my plot but need to put on the percentage for each axis.

Thanks

Hi @Bing,

That’s all in the code I posted above:

Good luck!

Thanks. This really helps a lot.

An off-topic reply has been split into a new topic: ggplot: coloring pcoa plot

Please keep replies on-topic in the future.