Hi, I've looked through threads and still can't seem to figure this out - how do I interpret significance from my beta diversity boxplots?
What I don't understand is why most of the boxplots overlap with each other, yet the median lines are not overlapping (which I assume is why they are significant).
Hi @colinbrislawn, sorry for the delayed response! I was knocked out for a week with illness.
I used a Bray-Curtis distance matrix for the beta group significance charts, and here is a visualisation of it. One thing to note is I've not yet divided samples by timepoint, which is something I absolutely should have done (and will do when I do visualisations in R).
Thank you for sharing that with me Joe! I'm glad you are feeling better too.
I support both splitting up samples by cohort and using R!
I suspect that these samples are pretty different, and Bray-Curtis is making this look more extreme. It's worth remembering that Bray-Curtis is a dissimilarity, not a distance, so it may be a little wonky.
Switching to a distance like weighted Jaccard, or a phylogenetic distance like unweighted or weighted UniFrac, can make for more meaningful distances, and later more meaningful graphs and stat tests.
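To make the "dissimilarity, not a distance" point concrete, here is a minimal Python sketch. The three abundance vectors are toy values made up for illustration, not data from this study, and `bray_curtis` is a small helper written for this example rather than a QIIME 2 function. It shows a case where Bray-Curtis breaks the triangle inequality that a true distance has to satisfy.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator


# Toy abundance vectors over two features (made up for illustration).
a = [1, 0]
b = [0, 1]
c = [1, 1]

d_ab = bray_curtis(a, b)  # a and b share no features
d_ac = bray_curtis(a, c)
d_cb = bray_curtis(c, b)

# A true distance would satisfy d(a, b) <= d(a, c) + d(c, b).
violates_triangle = d_ab > d_ac + d_cb
print(d_ab, d_ac, d_cb, violates_triangle)
```

Going "through" sample c is shorter than going directly from a to b here, which is the kind of behaviour that can make ordinations of very different samples look odd.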
If you want to post those new graphs here, I'm happy to take a look!
Hi @Joeee,
Are these samples from different studies?
This plot and the Bray-Curtis plots have this really distinct 3 point layout. This seems like one of those cases where you have samples that have no features in common.
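A rough Python sketch of why samples with no features in common can behave this way. The sample names and counts are hypothetical, chosen so that three groups each use their own features, and `bray_curtis` is a helper written for this example. Whenever two samples share nothing, Bray-Curtis is exactly 1 no matter what the counts are, so every between-group comparison hits the same ceiling.

```python
from itertools import combinations


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator


# Hypothetical samples: groups A, B and C each use their own two features.
samples = {
    "A1": [10, 5, 0, 0, 0, 0],
    "A2": [7, 8, 0, 0, 0, 0],
    "B1": [0, 0, 3, 12, 0, 0],
    "B2": [0, 0, 9, 6, 0, 0],
    "C1": [0, 0, 0, 0, 4, 11],
    "C2": [0, 0, 0, 0, 10, 5],
}

within = []   # pairs from the same group
between = []  # pairs from different groups
for (name1, x), (name2, y) in combinations(samples.items(), 2):
    d = bray_curtis(x, y)
    if name1[0] == name2[0]:
        within.append(d)
    else:
        between.append(d)

print("between-group values:", set(between))
print("largest within-group value:", max(within))
```

With every between-group dissimilarity pinned at the maximum, the groups end up equally far from each other, which would be consistent with three tight clusters at the corners of a triangle in the ordination.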
I can see what you mean by the three points though. I need to separate my samples by timepoints to get a clearer picture, but there are two main factors (soil and frass) that can influence the microbiome of my study.
Fresh frass sample groups were always non-significant to other fresh frass groups regardless of the soil treatment, so I think that's why it is clustering uniquely and away from the no/autoclaved frass groups in some points of the graph. If I assume it takes time for the microbiome to develop, it could be that the left side of my Jaccard would be at timepoint 0 while the right side at my final timepoint. I couldn't figure out how to accommodate timepoint in QIIME 2, so my next goal was to do this in R.
Thanks for the response, I've had a go at using those scripts but I ran into some issues as my control samples didn't have a marked timepoint. What I've done instead however is just colour code my samples manually to see if there's any other explanations.
The first thing to say is great spot earlier, as the yellow side was entirely leaf samples, which seems obvious in hindsight! Aside from that, the controls are green while the blues are just different timepoints for fresh frass treatments (light blue for 0, dark blue for 14). The same rules apply with autoclaved or no frass too (pink for 0, red for 14). These should be soil samples I think, while the grey (autoclaved or no frass) and brown (fresh) samples are roots collected at day 14. The separate batches within each same-colour group are generally explained by the soil type.
So, I suppose redoing the graph without leaf samples is probably necessary for more reliable results. Seeing how they cluster over each other I'd presume none of the leaf samples show significant differences regardless of treatment.
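For reference, dropping the leaf samples before redoing the graph comes down to subsetting the distance matrix by a metadata value and then rerunning the ordination on what is left. A minimal Python sketch of the subsetting step follows. The sample IDs, the sample types, and the distance values are all hypothetical placeholders, and `drop_samples` is a helper made up for this example.

```python
def drop_samples(dist, sample_type, unwanted):
    """Return a copy of a nested-dict distance matrix without one sample type."""
    keep = [s for s in dist if sample_type[s] != unwanted]
    return {s: {t: dist[s][t] for t in keep} for s in keep}


# Hypothetical metadata: sample ID -> sample type.
sample_type = {
    "soil_1": "soil",
    "root_1": "root",
    "leaf_1": "leaf",
    "leaf_2": "leaf",
}

# Toy symmetric distance matrix (placeholder values, not real results).
dist = {
    "soil_1": {"soil_1": 0.0, "root_1": 0.6, "leaf_1": 1.0, "leaf_2": 1.0},
    "root_1": {"soil_1": 0.6, "root_1": 0.0, "leaf_1": 0.9, "leaf_2": 0.9},
    "leaf_1": {"soil_1": 1.0, "root_1": 0.9, "leaf_1": 0.0, "leaf_2": 0.2},
    "leaf_2": {"soil_1": 1.0, "root_1": 0.9, "leaf_1": 0.2, "leaf_2": 0.0},
}

filtered = drop_samples(dist, sample_type, "leaf")
print(sorted(filtered))
```

The same idea works for splitting by timepoint: swap the sample-type lookup for a timepoint lookup and keep one timepoint at a time.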
But I think I've grasped what I should be looking out for on these graphs a little more, and will just play around with variables in R as I find it a bit easier to modify things within one script. Feel free to mark this solved as I feel I have a better idea about what I should be looking out for.