How to interpret significance from beta group diversity boxplots?

Hi, I've looked through threads and still can't seem to figure this out: how do I interpret significance from my beta diversity boxplots?

What I don't understand is why most of the boxplots overlap with each other, yet the median lines do not overlap (which I assume is why they are significant).

This second graph has autosheep and autowharf as non-significantly different for reference.

Thanks!

Hello again, Joe,

What distance did you use for this calculation?

Did you also make a PCoA plot (like this)? I usually use this stat test to measure the differences I see in the PCoA plot.
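
On the overlapping boxes: here is a minimal sketch, assuming the test behind these boxplots is PERMANOVA (the default method of `beta-group-significance`). It is a plain-Python toy version, not QIIME 2's implementation, and the 1-D positions and group names `A`/`B` are made up for illustration. The point is that the statistic compares between-group to within-group distances across all samples, so two groups whose ranges overlap can still give a large pseudo-F.

```python
import itertools
import random


def pseudo_f(dist, groups):
    """PERMANOVA pseudo-F from a square distance matrix and group labels."""
    n = len(groups)
    labels = sorted(set(groups))
    # Total sum of squares from all pairwise distances.
    ss_total = sum(dist[i][j] ** 2
                   for i, j in itertools.combinations(range(n), 2)) / n
    # Within-group sum of squares, one term per group.
    ss_within = 0.0
    for label in labels:
        members = [i for i, g in enumerate(groups) if g == label]
        ss_within += sum(dist[i][j] ** 2
                         for i, j in itertools.combinations(members, 2)) / len(members)
    ss_among = ss_total - ss_within
    return (ss_among / (len(labels) - 1)) / (ss_within / (n - len(labels)))


def permanova(dist, groups, permutations=999, seed=0):
    """Permutation p-value: how often shuffled labels match or beat the observed F."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        # Small tolerance so floating-point ties count as "at least as extreme".
        if pseudo_f(dist, shuffled) >= observed - 1e-9:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical toy data: two groups whose ranges overlap (0.3 to 0.5 is shared).
positions = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5,   # group A
             0.3, 0.4, 0.5, 0.6, 0.7, 0.8]   # group B
groups = ["A"] * 6 + ["B"] * 6
dist = [[abs(p - q) for q in positions] for p in positions]

f_stat, p_value = permanova(dist, groups)
print(f"pseudo-F = {f_stat:.2f}, p = {p_value:.3f}")
```

With these toy numbers the pseudo-F comes out at about 7.7 even though the two groups share part of their range, so overlapping boxes alone do not tell you the test result.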

Hi @colinbrislawn, sorry for the delayed response! I was knocked out for a week with illness.

I used a Bray-Curtis distance matrix for the beta group significance charts, and here is a visualisation of it. One thing to note is I've not yet divided samples by timepoint, which is something I absolutely should have done (and will do when I do visualisations in R).

bray_curtis_emperor.qzv (873.5 KB)

Thank you for sharing that with me Joe! I'm glad you are feeling better too.

I support both splitting up samples by cohort and using R!

I suspect that these samples are pretty different, and Bray-Curtis is making this look more extreme. It's worth remembering that Bray-Curtis is a dissimilarity, not a distance, so it may be a little wonky.

It is not a distance since it does not satisfy the triangle inequality.
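
A small sketch of what that means in practice, using three made-up count vectors (`a`, `b`, `c` are hypothetical, not from this dataset). Going from `a` to `b` directly is "longer" than going via `c`, which a true distance would never allow.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two count vectors."""
    numerator = sum(abs(x - y) for x, y in zip(u, v))
    denominator = sum(x + y for x, y in zip(u, v))
    return numerator / denominator


# Hypothetical samples with two features each.
a = [1, 0]
b = [0, 1]
c = [1, 1]

d_ab = bray_curtis(a, b)  # no shared features
d_ac = bray_curtis(a, c)
d_cb = bray_curtis(c, b)

# Triangle inequality would require d_ab <= d_ac + d_cb.
violates = d_ab > d_ac + d_cb
print(d_ab, d_ac + d_cb, violates)
```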

Switching to a distance like weighted Jaccard or a phylogenetic distance like UniFrac or Weighted UniFrac can make for more meaningful distances, and later more meaningful graphs and stat tests.

If you want to post those new graphs here, I'm happy to take a look! :bar_chart:

Thanks for the response! Here are the other graphs.

Weighted UniFrac

Unweighted UniFrac

Jaccard

Hi @Joeee,
Are these samples from different studies?
This plot and the Bray-Curtis plots have this really distinct 3 point layout. This seems like one of those cases where you have samples that have no features in common.

Here is a thread that talks about this more: Odd PCoA from distance matrix
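
To illustrate the "no features in common" case, here is a minimal sketch with made-up feature sets (sample names and feature IDs are hypothetical). When two samples share nothing, presence/absence Jaccard is pinned at its maximum of 1, so every cross-group pair looks equally far apart and the PCoA collapses the groups into separate corners.

```python
from itertools import combinations


def jaccard_distance(x, y):
    """Presence/absence Jaccard distance between two sets of feature IDs."""
    return 1 - len(x & y) / len(x | y)


# Hypothetical samples: group A and group B share no features at all.
samples = {
    "A_1": {"f1", "f2", "f3"},
    "A_2": {"f2", "f3", "f4"},
    "B_1": {"f7", "f8"},
    "B_2": {"f8", "f9"},
}

distances = {
    (s, t): jaccard_distance(samples[s], samples[t])
    for s, t in combinations(samples, 2)
}

# Pairs that span the two groups versus pairs inside one group.
cross_group = {pair: d for pair, d in distances.items() if pair[0][0] != pair[1][0]}
within_group = {pair: d for pair, d in distances.items() if pair[0][0] == pair[1][0]}

for pair, d in distances.items():
    print(pair, round(d, 3))
```

Here every cross-group distance is exactly 1, while the within-group distances are smaller, which is the pattern that tends to produce those sharply separated clusters.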

Thanks for the response and sorry for my delay (I was away over Christmas). All of these samples are from the same study and I used one script to produce all the graphs ( Computing diversity metrics — QIIME 2 Cancer Microbiome Intervention Tutorial)

I can see what you mean by the three points though. I need to separate my samples by timepoints to get a clearer picture, but there are two main factors (soil and frass) that can influence the microbiome of my study.

Fresh frass sample groups were always non-significant to other fresh frass groups regardless of the soil treatment, so I think that's why it is clustering uniquely and away from the no/autoclaved frass groups in some points of the graph. If I assume it takes time for the microbiome to develop, it could be that the left side of my Jaccard plot would be at timepoint 0 while the right side is at my final timepoint. I couldn't figure out how to account for timepoint in QIIME 2, so my next goal was to do this in R.

Hi @Joeee,
That all makes sense; I would just keep this in mind as you continue your analysis.

QIIME does have functionality for what you are looking for (I think?)!

If you want statistics for your longitudinal analysis, I would check out the q2-longitudinal plugin.

If you want to visualize your PCoA with respect to time, you can give the emperor command a custom axis. This is covered here in the Moving Pictures tutorial.

Hope this helps :smile:

Thanks for the response. I've had a go at using those scripts, but I ran into some issues as my control samples didn't have a marked timepoint. What I've done instead is just colour code my samples manually to see if there are any other explanations.

The first thing to say is great spot earlier, as the yellow side was entirely leaf samples, which seems obvious in hindsight! Aside from that, the controls are green, while the blues are just different timepoints for fresh frass treatments (light blue for 0, dark blue for 14). The same rules apply to autoclaved or no frass too (pink for 0, red for 14). These should be soil samples I think, while the grey (autoclaved or no frass) and brown (fresh) samples are roots collected at day 14. The various same-colour batches are generally explained by the soil type.

So, I suppose redoing the graph without leaf samples is probably necessary for more reliable results. Seeing how they cluster over each other I'd presume none of the leaf samples show significant differences regardless of treatment.

But I think I've grasped what I should be looking out for on these graphs a little more, and will just play around with variables in R as I find it a bit easier to modify things within one script. Feel free to mark this solved as I feel I have a better idea about what I should be looking out for.
