Hello all! This is my first time looking at qiime2 data and I ended up with two box plots with significant p-values: a Shannon alpha diversity plot and a Bray-Curtis beta diversity plot. How do I interpret these plots? I am confused by the "distances" term; what does that refer to? From my understanding, the larger the Shannon index, the greater the variety of microbiota detected, and the Bray-Curtis index measures how many microbes the groups have in common?
Hello Libby,
Alpha and beta diversity indices can differ a lot from one another, so I'll explain the two you used.
Shannon entropy (or the Shannon index) comes from information theory. Its value is highest when a sample contains a lot of species and their proportions are equal. How to interpret it depends on the data: you would need to make additional plots to understand what drives the values (e.g. whether the PowerSoil samples are dominated by a few taxa, or are just poor in species in general).
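To make the "many species, equal proportions" point concrete, here is a minimal sketch in plain Python. The function name `shannon_index`, the toy counts, and the choice of log base 2 are my own assumptions for illustration, not what your QIIME 2 run necessarily used.

```python
import math

def shannon_index(counts, base=2):
    """Shannon index H = -sum(p_i * log(p_i)) over taxa with non-zero counts.

    The log base is an assumption here; other bases only rescale the value.
    """
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) / math.log(base) for p in proportions)

# Hypothetical toy samples (made-up counts, not real data)
even_sample = [25, 25, 25, 25]    # four taxa, equal proportions
skewed_sample = [97, 1, 1, 1]     # four taxa, dominated by one
poor_sample = [50, 50]            # only two taxa, equal proportions

h_even = shannon_index(even_sample)
h_skewed = shannon_index(skewed_sample)
h_poor = shannon_index(poor_sample)

print(h_even, h_skewed, h_poor)
```

The even four-taxon sample scores highest. Both the dominated sample and the species-poor sample score lower, which is why the index alone cannot tell you which of the two situations you are in.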
Distance is a mathematical term. For example, suppose we have 2 samples, each described by a vector of counts for 2 species, E. coli and Salmonella: [2, 5] and [3, 4].
We can calculate the Euclidean distance between these two samples. The per-species differences are 3 - 2 = 1 for E. coli and 4 - 5 = -1 for Salmonella.
We can generalize to n dimensions: Distance(p, q) = \sqrt{\sum_{i=1}^{n} (q_i - p_i)^2}
So for our 2D case (2 bacteria considered simultaneously) it will be: \sqrt{1^2 + (-1)^2} = \sqrt{2}. This is the basic principle.
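The same calculation as a small Python sketch, using the two example count vectors from above. The function and variable names are just illustrative.

```python
import math

def euclidean_distance(p, q):
    """Square root of the summed squared per-species differences."""
    return math.sqrt(sum((q_i - p_i) ** 2 for p_i, q_i in zip(p, q)))

# Counts of [E. coli, Salmonella] in each sample
sample_1 = [2, 5]
sample_2 = [3, 4]

distance = euclidean_distance(sample_1, sample_2)
print(distance)  # equals sqrt(2), about 1.414
```

Adding more species just makes the vectors longer; the function stays the same.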
Bray-Curtis dissimilarity is a distance measure that quantifies the DIFFERENCE between samples: the higher it is, the more different the samples are. It ranges from 0 (identical composition) to 1 (no taxa in common), and it takes abundances into account, not just which microbes are shared. On the plot you can see the distances to every cleanroom sample from both refrigerator and cleanroom samples. You can see that cleanroom samples are as different from one another as they are from freezer samples. See the statistical measures in the table below.
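For completeness, here is a minimal sketch of Bray-Curtis on the same two example samples, using the standard formula: sum of absolute per-species differences divided by the total counts of both samples. The function name and the extra toy samples are my own, for illustration only.

```python
def bray_curtis(p, q):
    """Bray-Curtis dissimilarity: sum|p_i - q_i| / sum(p_i + q_i)."""
    numerator = sum(abs(p_i - q_i) for p_i, q_i in zip(p, q))
    denominator = sum(p_i + q_i for p_i, q_i in zip(p, q))
    return numerator / denominator

# Counts of [E. coli, Salmonella] in each sample
sample_1 = [2, 5]
sample_2 = [3, 4]

bc_similar = bray_curtis(sample_1, sample_2)    # (1 + 1) / 14
bc_identical = bray_curtis(sample_1, sample_1)  # same composition
bc_disjoint = bray_curtis([2, 0], [0, 5])       # no species in common

print(bc_similar, bc_identical, bc_disjoint)
```

Each box in a beta diversity box plot is a collection of such pairwise values, one per pair of samples being compared.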