Interpreting PC in beta-diversity

Julie_Jeon · January 31, 2019, 12:47am

Hi all,

I am studying the gut microbiome and using the QIIME pipeline.
When I get the results from beta diversity, I don't know how to interpret or understand the meaning of the PCs.
Can anyone explain it for me? (3 dimensions)
For example, PC1 (25%), PC2 (18%), and PC3 (10%).
So does the % of a PC mean how representative the data is? Or how reliable?

Also, when we analyze beta diversity,
for example, if I have four groups (Group A, B, C, and D)
and I want to compare beta diversity among the groups (A vs B, A vs C, B vs D, ...),
in the statistical results
I see something like this:

All within | All between | Nonparametric p-value (Bonferroni-corrected)
Group A vs Group B | Group C vs Group C | 0.05

Then, how should we describe this?
That the difference between A and B is different from the difference within C?
Can we interpret it as a meaningful result?

Sorry for the basic questions...
Thank you so much in advance !!

jwdebelius · January 31, 2019, 1:43pm

Hi @Julie_Jeon,

Beta diversity can be confusing! It takes a while to get comfortable with it.

The percentage next to each PC is the share of the total variation in your data that is captured when we compress the data along that PC. So, separation along your first PC explains about 25% of the overall variation in your data, separation along the second about 18%, and so on. It describes how much of the pattern that axis shows, not how reliable the data is. As your sample size gets bigger, you have more PCs and each PC explains less (counterintuitive and frustrating, but also life).
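If it helps to see where those percentages come from, here is a minimal sketch of classical PCoA in plain NumPy. It is not QIIME's code: the function name `pcoa_proportion_explained` and the toy samples are made up for illustration, and negative eigenvalues (which can appear with non-Euclidean metrics) are simply dropped, which is a simplification.

```python
import numpy as np


def pcoa_proportion_explained(dist):
    """Return the share of total variation captured by each PCoA axis.

    `dist` is a square, symmetric matrix of pairwise distances.
    (Hypothetical helper for illustration, not part of QIIME.)
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Gower double-centering of the squared distances
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering
    # Eigenvalues of the centered matrix, largest first
    eigvals = np.linalg.eigvalsh(b)[::-1]
    # Simplification: keep only clearly positive eigenvalues
    eigvals = eigvals[eigvals > 1e-9 * eigvals.max()]
    # Each axis's percentage is its eigenvalue over the sum of all of them
    return eigvals / eigvals.sum()


# Toy data: 6 made-up samples described by 3 made-up features
rng = np.random.default_rng(0)
points = rng.normal(size=(6, 3)) * np.array([5.0, 2.0, 1.0])
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

proportions = pcoa_proportion_explained(dist)
for i, p in enumerate(proportions[:3], start=1):
    print(f"PC{i}: {p:.1%}")
```

The percentages always add up to 100% across all axes, which is why adding more samples (and therefore more axes) tends to leave less for each individual PC.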

As for your comparisons across the 4 groups: I want to start with 2 groups and then generalize from there. With beta diversity, we're testing distances between communities. Imagine my general hypothesis is that subdivisions have similar houses. So, I have a label for housing type, and then I compare the geographic distance between the different types of buildings. (Let's say houses and castles, for funsies.) The first distance you'd measure is the distance from one house (house₁) to another (house₂). Maybe they're five miles apart. Then, we're going to measure the distance from castle₁ to castle₂. Maybe they're 100 miles apart. Finally, we'll measure the distances from houses to castles. In this case, we're going to get 4 distances: house₁ to castle₁, house₁ to castle₂, house₂ to castle₁, and house₂ to castle₂.

Okay, so now there are a handful of hypotheses we can test.

1. Houses are closer together (more geographically similar) than castles.
2. The geographic distance among houses and among castles is smaller than the distance between houses and castles.

The answer to each of these questions can tell you different things about your community (although both will give you a significantly different result). But the question you're probably asking is 2. So then, you want to look at whether the within-house and within-castle distances are smaller than the distances between houses and castles, because that would imply that there's probably some interesting geography separating the two types of buildings. And so, when you read the output and you see

house vs house and house vs castle, the distance you're comparing is the distance among all the houses vs the distance between houses and castles. It helps you figure out which of your groups are driving the difference, so that you can approach future analysis steps, like taxonomic comparisons, from the right perspective.
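To make the within vs between comparison concrete, here is a rough sketch on made-up coordinates. It is not the procedure behind the table in the question: the helper names, the toy houses and castles, and the simple label-shuffling test on the difference in mean distance are all assumptions for illustration.

```python
import numpy as np


def within_between(dist, labels, group_a, group_b):
    """Split distances into within-group_a and between-group pairs."""
    labels = np.asarray(labels)
    a = np.flatnonzero(labels == group_a)
    b = np.flatnonzero(labels == group_b)
    # Within: each unordered pair of group_a samples once (upper triangle)
    within = dist[np.ix_(a, a)][np.triu_indices(len(a), k=1)]
    # Between: every group_a sample against every group_b sample
    between = dist[np.ix_(a, b)].ravel()
    return within, between


def permutation_pvalue(dist, labels, group_a, group_b, n_perm=999, seed=0):
    """One-sided label-permutation test: is between larger than within?"""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    within, between = within_between(dist, labels, group_a, group_b)
    observed = between.mean() - within.mean()
    hits = 0
    for _ in range(n_perm):
        # Shuffle the labels while leaving the distances untouched
        shuffled = rng.permutation(labels)
        w, b = within_between(dist, shuffled, group_a, group_b)
        if b.mean() - w.mean() >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy map: 5 houses clustered in one place, 5 castles far away (made up)
rng = np.random.default_rng(1)
houses = rng.normal(loc=[0.0, 0.0], scale=2.0, size=(5, 2))
castles = rng.normal(loc=[100.0, 0.0], scale=2.0, size=(5, 2))
points = np.vstack([houses, castles])
labels = ["house"] * 5 + ["castle"] * 5
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

observed, p_value = permutation_pvalue(dist, labels, "house", "castle")
print(f"between - within = {observed:.1f}, p = {p_value:.3f}")
```

With four groups you would repeat this for each pair you care about and then correct the p-values for multiple comparisons (Bonferroni, as in the output quoted above, is one option).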

And... if it turns out that case 1 above is true, there's some cool ecological theory emerging there too, so that's potentially of interest as well!

(This explanation obviously comes with the caveats of (a) substitute your preferred clinical/environmental variable and distance metric and (b) emojis are fun.)

Good luck with your analysis, and hopefully this helps!

best,
Justine

Julie_Jeon · January 31, 2019, 11:17pm

Hi Justine!
You are awesome!!
Thank you so much for your kind explanation.
It helps me a lot
Thanks!!

Best,

Julie