Beta diversity plots

fgara · November 2, 2020, 11:43pm

Hi again everyone,
May I ask a few more questions please?

Below I have 3 plots generated using the Bray-Curtis, Unweighted Unifrac, and Weighted Unifrac metrics. The points are colored by the subject each sample was taken from.

I'm trying to understand why Bray-Curtis seems to produce the clearest separation between groups (subjects), followed by Unweighted Unifrac, and lastly Weighted Unifrac.

Actually, is the above conclusion correct?

According to this page by Buttigieg and Ramette:

A successful PCoA will generate a few (2-3) axes with relatively large eigenvalues, capturing above 50% of the variation in the input data, with all other axes having small eigenvalues.

Because the 3 axes produced by Unweighted Unifrac metric here only captured 50.17% of the variation in my data, can it be considered "barely successful" (therefore, not a good choice for this data set)?
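For reference, the percentage shown on each PCoA axis is that axis's eigenvalue divided by the sum of the eigenvalues. Below is a minimal sketch of that calculation, assuming classical PCoA with Gower centering and keeping only positive eigenvalues; the toy distance matrix is made up, the function name is hypothetical, and the way QIIME 2 handles negative eigenvalues may differ from this sketch.

```python
import numpy as np

def pcoa_proportion_explained(dist):
    """Classical PCoA on a square distance matrix.

    Returns the share of variation captured by each axis that has a
    positive eigenvalue, largest first. Dropping non-positive eigenvalues
    is an assumption of this sketch.
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Gower double-centering of the squared distances
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering
    eigvals = np.linalg.eigvalsh(b)[::-1]  # sorted largest first
    positive = eigvals[eigvals > 1e-9 * eigvals.max()]
    return positive / positive.sum()

# Toy data (made up): 12 samples whose spread is dominated by two directions.
rng = np.random.default_rng(0)
points = rng.normal(size=(12, 5)) * np.array([5.0, 3.0, 1.0, 0.5, 0.2])
diff = points[:, None, :] - points[None, :, :]
toy_dist = np.sqrt((diff ** 2).sum(axis=2))

prop = pcoa_proportion_explained(toy_dist)
top3 = prop[:3].sum()
print("per-axis share:", np.round(prop, 3))
print("first three axes together:", round(float(top3), 3))
```

The sum over the first three axes is the number being compared against the 50% rule of thumb quoted above.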

Is it because there are differences in OTU abundance between samples from different subjects that were ignored by Unweighted Unifrac (that's what I understood the metric does - please kindly correct me if I'm wrong)?
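To make the abundance point concrete, here is a small sketch comparing an abundance-based distance (Bray-Curtis) with a presence/absence distance. Jaccard is used here only as a tree-free stand-in for a presence/absence metric; it is not Unweighted Unifrac, which also uses the branch lengths of a phylogenetic tree. The counts are invented for illustration.

```python
import numpy as np

def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two vectors of feature counts."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return np.abs(u - v).sum() / (u + v).sum()

def jaccard_presence_absence(u, v):
    """Jaccard distance computed on presence/absence only."""
    a, b = np.asarray(u) > 0, np.asarray(v) > 0
    return 1.0 - (a & b).sum() / (a | b).sum()

# Two made-up samples: the same three features are present in both,
# but the dominant feature differs.
sample_a = [90, 5, 5, 0]
sample_b = [5, 5, 90, 0]

print("Bray-Curtis:", bray_curtis(sample_a, sample_b))
print("Presence/absence (Jaccard):", jaccard_presence_absence(sample_a, sample_b))
```

In this toy case the abundance-aware distance is large while the presence/absence distance is zero, because both samples contain exactly the same set of features.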

If the above is true, is it then safe to assume that for samples with some differences in abundance between groups, Bray-Curtis would be the best choice? But before I generated these plots, I thought Weighted Unifrac would be a better choice.

Here are the results of qiime diversity core-metrics-phylogenetic on my data:

Bray-Curtis:

Unweighted Unifrac:

Weighted Unifrac:

Thank you very much! I'm so grateful that this forum exists and is frequented by kind, helpful people!

llenzi · November 3, 2020, 10:28am

Hi @fgara,

Let's see if I can help with your questions!

I certainly agree with you that in your dataset the ‘subject’ category produces a clear separation of your data, according to several beta-diversity distance metrics.

The three metrics you are using measure slightly different things, so to me the fact that they separate your data slightly differently may indicate that the phylogenetic relationships among the bacteria in your dataset are not very strong (if they were, I would expect the UniFrac measures to separate the data more clearly).

Still, to say which metric gives the strongest separation, I would use a statistical approach and look at the PERMANOVA/PERMDISP/ANOSIM results (although, judging by your figures, I would expect all of them to give acceptable p-values).
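As an illustration of what such a test does, here is a minimal NumPy sketch of a one-way PERMANOVA (pseudo-F with a label-permutation p-value). It assumes a square distance matrix and one group label per sample; the function names and toy data are hypothetical. For real analyses, an established implementation such as `qiime diversity beta-group-significance` would be the safer choice.

```python
import numpy as np

def pseudo_f(dist, groups):
    """One-way PERMANOVA pseudo-F from a distance matrix and group labels."""
    d2 = np.asarray(dist, dtype=float) ** 2
    groups = np.asarray(groups)
    n = len(groups)
    labels = np.unique(groups)
    # Total sum of squares from all pairwise squared distances
    ss_total = d2[np.triu_indices(n, k=1)].sum() / n
    # Within-group sum of squares, one term per group
    ss_within = 0.0
    for label in labels:
        idx = np.flatnonzero(groups == label)
        sub = d2[np.ix_(idx, idx)]
        ss_within += sub[np.triu_indices(len(idx), k=1)].sum() / len(idx)
    ss_among = ss_total - ss_within
    a = len(labels)
    return (ss_among / (a - 1)) / (ss_within / (n - a))

def permanova(dist, groups, permutations=999, seed=0):
    """Return (pseudo-F, p-value) by shuffling the group labels."""
    rng = np.random.default_rng(seed)
    groups = np.asarray(groups)
    observed = pseudo_f(dist, groups)
    hits = sum(
        pseudo_f(dist, rng.permutation(groups)) >= observed
        for _ in range(permutations)
    )
    return observed, (hits + 1) / (permutations + 1)

# Toy data (made up): two well-separated "subjects", six samples each.
rng = np.random.default_rng(1)
points = np.vstack([rng.normal(0.0, 1.0, size=(6, 3)),
                    rng.normal(4.0, 1.0, size=(6, 3))])
diff = points[:, None, :] - points[None, :, :]
toy_dist = np.sqrt((diff ** 2).sum(axis=2))
subjects = ["A"] * 6 + ["B"] * 6

f_obs, p_value = permanova(toy_dist, subjects)
print("pseudo-F:", round(float(f_obs), 2), "p-value:", p_value)
```

Running the same test on each of the three distance matrices and comparing the pseudo-F values gives a rough way to rank how strongly each metric separates the subjects. A PERMDISP-style check of within-group dispersion is still needed to interpret a significant result.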

Which metric is best to use? I think it depends on your definition of ‘better’ (some may argue that none of the metrics you are using is correct, because they do not account for the compositional nature of the data). To me, a better approach is to ask which metrics actually describe my dataset. If only one metric gives separation, I would use it rather than assume the data are not separated (with a caveat: is that a true result, or am I doing something wrong?).
(Also, thinking while I am writing … if none of the diversity metrics gives a clear separation of your data, is it a problem with the metric, or are you missing important metadata, e.g. analysis batches, the kit used, and so on?)

So, these are just my thoughts; I hope they help.
Cheers

fgara · November 11, 2020, 6:37pm

Hello,

Thank you so much for your kind reply and suggestion!

I will try these - thank you!
