Understanding Bray-Curtis and Jaccard

Sofita_fuentes · June 15, 2021, 5:51pm

Hi, I have a big question about how to interpret these results. I did a beta diversity analysis using the Jaccard and Bray-Curtis metrics. According to the p-value, there is a significant difference between my samples. But the box plot doesn't show a clear difference with the Jaccard distance (I'm reading it based on the discussion "beta diversity explanation (jaccard_distance)"). Am I wrong?
I also don't know how to interpret the Bray-Curtis results together with the Jaccard results.
Thank you for your help!
Best regards, Sofi

colinbrislawn · June 15, 2021, 10:32pm

Hello @Sofi,

Welcome to the forums!

I'm glad you found that excellent post by Mehrbod that describes how the box plots are made.

Before we dive in, can you post the full command you ran?

Sofita_fuentes · June 15, 2021, 10:45pm

Hi @colinbrislawn, thank you very much for your quick response.
The commands that I used are:

qiime diversity core-metrics-phylogenetic \
  --i-phylogeny rooted-tree.qza \
  --i-table table.qza \
  --p-sampling-depth 32000 \
  --m-metadata-file sample-metadata.tsv \
  --output-dir core-metrics-results

qiime diversity beta-group-significance \
  --i-distance-matrix core-metrics-results/bray_curtis_distance_matrix.qza \
  --m-metadata-file sample-metadata.tsv \
  --m-metadata-column type \
  --o-visualization core-metrics-results/bray_curtis_type_significance.qzv

colinbrislawn · June 16, 2021, 12:20am

Thanks!

As discussed in that thread, the PERMANOVA runs first. It tests for differences between groups and gives you a p-value.

Afterwards, the box plots are made. If that p-value is under your alpha threshold, you can look at them to see which groups are most different. I think of the box plots as a post-hoc test.
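The sketch below shows what the PERMANOVA step computes. It is a minimal, self-contained version written in the usual textbook form: Anderson's pseudo-F, plus a p-value from shuffling the group labels. As far as I know, QIIME 2 hands this step to scikit-bio. This is not that implementation, just an illustration. The toy data are made-up 1-D positions with |x - y| as the distance, not a real beta diversity metric, and the group names are only labels.

```python
import random
from itertools import combinations


def sums_of_squares(dm, labels):
    """Return (SS_total, SS_within) computed from a square distance matrix."""
    n = len(labels)
    ss_total = sum(dm[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for group in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == group]
        ss_within += sum(dm[i][j] ** 2 for i, j in combinations(idx, 2)) / len(idx)
    return ss_total, ss_within


def pseudo_f(dm, labels):
    """Among-group variation per df divided by within-group variation per df."""
    n, a = len(labels), len(set(labels))
    ss_total, ss_within = sums_of_squares(dm, labels)
    return ((ss_total - ss_within) / (a - 1)) / (ss_within / (n - a))


def permanova(dm, labels, permutations=999, seed=0):
    """p-value = share of label shuffles with pseudo-F >= observed (counting the observed one)."""
    rng = random.Random(seed)
    observed = pseudo_f(dm, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dm, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Toy data: 1-D positions, distance = |x - y| (hypothetical, not Bray-Curtis or Jaccard).
coords = [0.0, 0.1, 0.2, 0.3, 0.4, 1.0, 1.1, 1.2, 1.3, 1.5]
labels = ["pristine"] * 5 + ["human"] * 5
dm = [[abs(x - y) for y in coords] for x in coords]
f_stat, p_value = permanova(dm, labels)
print(f"pseudo-F = {f_stat:.2f}, p = {p_value:.3f}")
```

The p-value says only how unusual the observed grouping is compared with random relabelings. It says nothing about how large the difference is.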

Yeah, the differences are very small, but I can see a few:
Pristine to pristine: slightly lower mean, some outliers, larger standard deviation
Human to pristine: slightly higher mean, no outliers, smaller standard deviation
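The comparisons above are distributions of pairwise distances, either within a group or between two groups. The sketch below shows one way to pull those distance sets out of a distance matrix and summarize them. It assumes the matrix is available as a plain list of lists with one label per sample. The 4-sample matrix is hypothetical, and this is not how QIIME 2 itself draws the plots.

```python
from itertools import combinations
from statistics import mean, stdev


def grouped_distances(dm, labels):
    """Collect pairwise distances keyed by the sorted pair of group labels."""
    groups = {}
    for i, j in combinations(range(len(labels)), 2):
        key = tuple(sorted((labels[i], labels[j])))
        groups.setdefault(key, []).append(dm[i][j])
    return groups


def summarize(dm, labels):
    """Mean and standard deviation for each within- or between-group comparison."""
    summary = {}
    for key, values in grouped_distances(dm, labels).items():
        spread = stdev(values) if len(values) > 1 else 0.0  # stdev needs 2+ values
        summary[key] = (mean(values), spread)
    return summary


# Hypothetical 4-sample distance matrix: two "pristine" and two "human" samples.
labels = ["pristine", "pristine", "human", "human"]
dm = [
    [0.0, 0.2, 0.7, 0.8],
    [0.2, 0.0, 0.6, 0.9],
    [0.7, 0.6, 0.0, 0.3],
    [0.8, 0.9, 0.3, 0.0],
]
for (g1, g2), (avg, sd) in summarize(dm, labels).items():
    print(f"{g1} to {g2}: mean = {avg:.2f}, sd = {sd:.2f}")
```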

Remember that a stat test can be significant, but the effect size can still be very small.
Paper: Using Effect Size—or Why the P Value Is Not Enough - PMC
Interactive visualization! https://rpsychologist.com/pvalue/
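One common effect size for PERMANOVA is R². It is the share of the total squared-distance variation explained by the grouping; vegan's adonis2 in R reports it, for example. The sketch below computes it from a distance matrix. The toy coordinates are made up, and the distance is |x - y| on a single axis rather than a real beta diversity metric. A significant p-value paired with a small R² means the grouping explains only a small part of the variation.

```python
from itertools import combinations


def permanova_r2(dm, labels):
    """R^2 = 1 - SS_within / SS_total: share of squared-distance variation explained by groups."""
    n = len(labels)
    ss_total = sum(dm[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for group in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == group]
        ss_within += sum(dm[i][j] ** 2 for i, j in combinations(idx, 2)) / len(idx)
    return 1 - ss_within / ss_total


def line_distances(coords):
    """Toy distance matrix: |x - y| between 1-D positions (hypothetical data)."""
    return [[abs(x - y) for y in coords] for x in coords]


labels = ["A"] * 5 + ["B"] * 5
separated = line_distances([0.0, 0.1, 0.2, 0.3, 0.4, 1.0, 1.1, 1.2, 1.3, 1.5])
overlapping = line_distances([0.0, 0.2, 0.4, 0.6, 0.8, 0.1, 0.3, 0.5, 0.7, 0.9])
r2_separated = permanova_r2(separated, labels)
r2_overlapping = permanova_r2(overlapping, labels)
print(f"R^2 separated = {r2_separated:.2f}, overlapping = {r2_overlapping:.2f}")
```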

When you view all your samples in a PCoA plot, do these groups visually overlap? The file core-metrics-results/jaccard_emperor.qzv should contain this graph. I posted about how I read those graphs over here.

Let me know what you find!

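As for reading Bray-Curtis and Jaccard together: they are just two different ways of calculating how different two samples are.
Jaccard distance is the fraction of taxa found in either sample (the union) that are not shared by both samples. Bray-Curtis also takes abundance into account, so taxa with high counts weigh more.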

Jaccard and Bray-Curtis are similar methods, so it makes sense to me that your graphs are similar.
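The sketch below makes the two definitions concrete for a single pair of samples. In practice QIIME 2 computes these for you in core-metrics-phylogenetic. Jaccard looks only at which taxa are present. Bray-Curtis, on counts, is the sum of absolute count differences divided by the total count of both samples. The two count vectors are made up, for four hypothetical taxa.

```python
def jaccard_distance(x, y):
    """Presence/absence only: share of taxa in the union not found in both samples."""
    present_x = {i for i, count in enumerate(x) if count > 0}
    present_y = {i for i, count in enumerate(y) if count > 0}
    union = present_x | present_y
    if not union:
        return 0.0  # two empty samples: treat as identical
    return 1 - len(present_x & present_y) / len(union)


def bray_curtis(x, y):
    """Abundance-weighted: sum of |x_i - y_i| divided by the total count of both samples."""
    total = sum(x) + sum(y)
    if total == 0:
        return 0.0
    return sum(abs(a - b) for a, b in zip(x, y)) / total


# Made-up counts for four hypothetical taxa (same taxon order in both samples).
sample_a = [10, 5, 0, 1]
sample_b = [8, 0, 3, 1]
jac = jaccard_distance(sample_a, sample_b)
bc = bray_curtis(sample_a, sample_b)
print(jac)  # 2 of 4 taxa shared -> 0.5
print(bc)   # 10 / 28
```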

Sofita_fuentes · June 16, 2021, 12:32am

Thank you!!

Yeah, I found that the samples cluster by group.

But I still have the question of how to interpret the results as a whole. I know that Bray-Curtis considers the abundance of the species and Jaccard only their presence/absence. Would it be correct to say that the difference between the communities is mainly due to a difference in the most abundant species? Or does the interpretation go the other way?

colinbrislawn · June 16, 2021, 1:15am

That's right!

You see larger differences with Bray-Curtis dissimilarities than with Jaccard distances, so that interpretation makes sense to me. Keep in mind that you also see differences with Jaccard, which is not biased towards the most abundant features.
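Here is a toy illustration of why a Bray-Curtis difference without a matching Jaccard difference points to the abundant taxa. All counts are made up for five hypothetical taxa, with each sample at the same depth of 1000, as after rarefying. Case (a) shifts the dominant taxon while keeping the same taxa. Case (b) swaps one rare taxon for another.

```python
def jaccard_distance(x, y):
    """Presence/absence only: share of taxa in the union not found in both samples."""
    px = {i for i, c in enumerate(x) if c > 0}
    py = {i for i, c in enumerate(y) if c > 0}
    union = px | py
    return 1 - len(px & py) / len(union) if union else 0.0


def bray_curtis(x, y):
    """Abundance-weighted: sum of |x_i - y_i| over the total count of both samples."""
    total = sum(x) + sum(y)
    return sum(abs(a - b) for a, b in zip(x, y)) / total if total else 0.0


# Made-up counts; taxon 0 dominates. Every sample sums to 1000.
baseline = [900, 50, 30, 20, 0]
shifted = [600, 250, 80, 70, 0]   # (a) same taxa present, dominant taxon much lower
rare_swap = [900, 50, 30, 0, 20]  # (b) one rare taxon lost, another gained

jac_shift = jaccard_distance(baseline, shifted)
bc_shift = bray_curtis(baseline, shifted)
jac_rare = jaccard_distance(baseline, rare_swap)
bc_rare = bray_curtis(baseline, rare_swap)
print(f"(a) dominant shift: Jaccard = {jac_shift:.2f}, Bray-Curtis = {bc_shift:.2f}")
print(f"(b) rare swap:      Jaccard = {jac_rare:.2f}, Bray-Curtis = {bc_rare:.2f}")
```

Case (a) moves only Bray-Curtis. Case (b) moves Jaccard much more than Bray-Curtis, because rare taxa carry little weight in an abundance-based metric.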

Looking at the PCoA, I noticed something else...

While you have 50+ samples, I only see 7 clusters in that PCoA. This makes me worry that a mistake or processing artifact has 'pushed' your samples closer together.

Compare this to a typical PCoA plot, like the one I got from the pd-mouse tutorial.

While the samples there still cluster by donor, they don't overlap nearly as much as the samples in your 7 clusters.

I know that this is not part of your original question, but I wanted to mention it before reviewer 3 does!

Sofita_fuentes · June 16, 2021, 1:48am

Actually, I have 14 samples. You can see only 7 clusters because I have duplicates with the same values, so they overlap.

Thank you! Your answer really helped me.

colinbrislawn · June 16, 2021, 2:26am

Oh OK! I was worried at first, so I'm glad the PCoA makes sense.
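Duplicates with identical feature counts have a distance of 0 and identical rows in the distance matrix, so they land on exactly the same spot in a PCoA. The sketch below is a quick sanity check for this. It groups samples at zero distance and counts how many distinct points remain. It assumes the distance matrix has been exported into a plain list of lists; the sample names and values are hypothetical.

```python
def collapse_identical(dm, names, tol=1e-12):
    """Group samples whose distance to a group's first member is (numerically) zero."""
    clusters = []
    for i in range(len(names)):
        for cluster in clusters:
            if dm[cluster[0]][i] <= tol:
                cluster.append(i)
                break
        else:  # no existing group matched: start a new one
            clusters.append([i])
    return [[names[i] for i in cluster] for cluster in clusters]


# Hypothetical example: two samples, each with a duplicate that has identical counts.
names = ["S1", "S1_rep", "S2", "S2_rep"]
dm = [
    [0.0, 0.0, 0.4, 0.4],
    [0.0, 0.0, 0.4, 0.4],
    [0.4, 0.4, 0.0, 0.0],
    [0.4, 0.4, 0.0, 0.0],
]
clusters = collapse_identical(dm, names)
print(f"{len(names)} samples -> {len(clusters)} distinct points: {clusters}")
```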

I'm glad this helped. Thanks for posting on the forums!