PCA vs PCoA - which is the appropriate one for microbiome data

RosePaul · September 11, 2018, 2:56am

Hi,

I have a question about ordination methods. Which is the preferred method for microbial 16S data: PCA or PCoA? What is the advantage of one over the other? Any help in this regard is greatly appreciated.

Thanks,
Reeba

colinbrislawn · September 11, 2018, 5:07pm

Hello Reeba,

Great question! I know one of the statisticians who work on QIIME 2 could give a very detailed answer, but I want to provide a simple answer to get you started:

I recommend a PCoA ordination of Weighted UniFrac distances between samples.

This tutorial shows you how to make one, and you can view an example PCoA plot here.

PCA and PCoA are really similar. In fact, PCA is equivalent to a PCoA that uses Euclidean distances! So we could say:

Types of ordination:
- MDS
- CCA
- PCoA
  - PCoA of Jaccard distances
  - PCoA of Bray-Curtis dissimilarities
  - PCoA of Euclidean distances (this is also called PCA)
  - PCoA of UniFrac distances
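
The equivalence between PCA and a PCoA of Euclidean distances can be checked numerically. The following is a minimal sketch, not QIIME 2 code: the table is random made-up data, and `pca_scores` and `pcoa_scores` are hypothetical helper names written only for this illustration.

```python
import numpy as np

def pca_scores(X, k=2):
    """Sample coordinates from PCA, via SVD of the column-centered table."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * S[:k]

def pcoa_scores(D, k=2):
    """Classical PCoA: double-center the squared distances, then eigendecompose."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # Gower-centered matrix
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:k]        # largest eigenvalues first
    return vecs[:, order] * np.sqrt(np.clip(vals[order], 0, None))

rng = np.random.default_rng(0)
X = rng.random((6, 4))                        # made-up table: 6 samples x 4 taxa

# Pairwise Euclidean distances between samples
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))

pca = pca_scores(X)
pcoa = pcoa_scores(D)

# Each axis is only defined up to a sign flip, so compare absolute values
same_up_to_sign = np.allclose(np.abs(pca), np.abs(pcoa), atol=1e-8)
print(same_up_to_sign)
```

Swapping `D` for a Jaccard, Bray-Curtis, or UniFrac distance matrix gives the other PCoA variants in the list; only the Euclidean case coincides with PCA.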

Let me know if that helps!

Colin

P.S. If you want to learn more about ordinations, this page had lots of information!

ebolyen · September 11, 2018, 5:14pm

@mortonjt also just added Aitchison distance in this last (2018.8) release! It's a Euclidean distance computed after a centered log-ratio transformation, which is able to handle compositionality (this is one of the reasons PCA isn't terribly common in microbiome analysis).
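
To make "Euclidean distance after a centered log-ratio transformation" concrete, here is a rough sketch. It is not the QIIME 2 implementation: `clr` and `aitchison_distance` are hypothetical helpers, the counts are made up, and the `pseudocount` parameter is an assumption added here so that zeros do not break the logarithm.

```python
import numpy as np

def clr(x):
    """Centered log-ratio: log of each part minus the mean of the logs."""
    log_x = np.log(x)
    return log_x - log_x.mean()

def aitchison_distance(x, y, pseudocount=1.0):
    # pseudocount is an assumption for this sketch; it avoids log(0)
    x = np.asarray(x, dtype=float) + pseudocount
    y = np.asarray(y, dtype=float) + pseudocount
    return float(np.linalg.norm(clr(x) - clr(y)))

# Same community composition, sequenced at 10x different depth (made-up counts)
shallow = [10, 20, 5, 65]
deep = [100, 200, 50, 650]

raw_euclidean = float(np.linalg.norm(np.subtract(deep, shallow)))
clr_euclidean = aitchison_distance(shallow, deep, pseudocount=0.0)
print(raw_euclidean, clr_euclidean)
```

On raw counts, the Euclidean distance mostly reflects sequencing depth, while the distance after the transformation is essentially zero because the proportions are identical. With a nonzero pseudocount this invariance is only approximate.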

RosePaul · September 12, 2018, 5:02pm

Thanks for the reply. I came across articles saying Euclidean distance is not good for microbial community data, but didn't find any explanation of why it is not good. Also, since PCA is a PCoA of Euclidean distances, is there any condition where it is appropriate to use?

colinbrislawn · September 12, 2018, 5:32pm

Good question.

I want to take a moment to distinguish the ordination method from the distance metric.

Ordination methods:
- PCoA
- NMDS
- CCA

Distance metrics:
- UniFrac
- Jaccard
- Euclidean

I think you probably know that already, but I just wanted to post that for future users to see.

I'm not sure Euclidean is bad, but other metrics are arguably clearer or more biologically relevant. Let's take Jaccard and UniFrac as examples.

Jaccard distances are simple:

percentage of taxa not found in both samples

So if 30% of taxa are in both samples, this means 70% are only found in one sample, and the Jaccard distance is 0.7. Very easy!
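
The arithmetic above can be sketched with plain Python sets. The taxon names are made up, and `jaccard_distance` is a hypothetical helper for this illustration only.

```python
def jaccard_distance(sample_a, sample_b):
    """Fraction of observed taxa that are found in only one of the two samples."""
    a, b = set(sample_a), set(sample_b)
    union = a | b
    if not union:
        return 0.0                     # two empty samples: treat as identical
    return len(a ^ b) / len(union)     # a ^ b = taxa in exactly one sample

# 10 taxa overall, 3 of them (t4, t5, t6) found in both samples
a = {"t1", "t2", "t3", "t4", "t5", "t6"}
b = {"t4", "t5", "t6", "t7", "t8", "t9", "t10"}

d = jaccard_distance(a, b)
print(d)
```

Note that only presence or absence matters here; abundances are ignored.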

UniFrac distances are equally easy, and add phylogenetic information:

percentage of phylogenetic branch length not found in both samples

This makes UniFrac a tremendously powerful method for measuring the difference between samples because it incorporates the underlying phylogenetic tree of the taxa.
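
As a toy illustration of the branch-length idea, the sketch below computes the unweighted variant on a hand-written four-tip tree. Everything here is an assumption made for the example: the tree, the branch lengths, and the `unweighted_unifrac` helper, which is not how QIIME 2 computes it. The Weighted UniFrac recommended earlier additionally uses abundances and is not shown.

```python
def unweighted_unifrac(branches, sample_a, sample_b):
    """Branch length found in exactly one sample / branch length found in either."""
    a, b = set(sample_a), set(sample_b)
    unique = total = 0.0
    for length, tips in branches:
        in_a = bool(tips & a)          # branch leads to a taxon present in a
        in_b = bool(tips & b)
        if in_a or in_b:
            total += length
        if in_a != in_b:
            unique += length
    return unique / total if total else 0.0

# Toy tree ((A:1,B:1):1,(C:1,D:1):1); each branch is (length, tips below it)
branches = [
    (1.0, {"A"}), (1.0, {"B"}), (1.0, {"A", "B"}),
    (1.0, {"C"}), (1.0, {"D"}), (1.0, {"C", "D"}),
]

close_relatives = unweighted_unifrac(branches, {"A"}, {"B"})
distant_relatives = unweighted_unifrac(branches, {"A"}, {"C"})
print(close_relatives, distant_relatives)
```

Jaccard would score both pairs as 1.0 because no taxa are shared. In this sketch the pair of sister taxa comes out closer than the pair from different clades, which is the extra information the tree contributes.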

Now let's look at Euclidean distance:

the square root of
    the sum of
        the squares of
            the differences in each taxon's abundance between the two samples

How could that possibly be useful!?!
Who was crazy enough to invent the Euclidean distance?
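
For a feel of how Euclidean distance behaves on abundance data, here is a small sketch with made-up counts. It takes the distance between two samples to be the square root of the summed squared per-taxon differences; `euclidean` is a hypothetical helper for this illustration.

```python
import math

def euclidean(x, y):
    """Square root of the summed squared per-taxon abundance differences."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

# Made-up counts for three taxa in three samples
site1 = [0, 4, 8]
site2 = [0, 1, 0]
site3 = [1, 0, 0]

d12 = euclidean(site1, site2)   # site1 and site2 share the second taxon
d23 = euclidean(site2, site3)   # site2 and site3 share no taxa at all
print(d12, d23)
```

In this toy case the two samples with no taxa in common end up closer together than the two that share a taxon, because large count differences dominate the sum. That is one way Euclidean distance on raw counts can be hard to interpret biologically.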

Colin
