Alpha Diversity Boxplot Interpretation

asuphotofringe · April 18, 2018, 5:53pm

Hello,

I have a question regarding interpretation of alpha diversity boxplots (Shannon, Bray-Curtis, etc.). How are the whisker and box lengths determined? Does the box represent the upper and lower quartiles, with the whiskers showing the highest and lowest observations? If that is the case, do points/circles outside of the whiskers represent outliers in the data? I can't seem to find a clear answer specific to QIIME anywhere.

Thank you in advance!

Mehrbod_Estaki · April 19, 2018, 2:37am

Hi @asuphotofringe,

That's a great question! It's certainly important to have some additional information about how the plots are created. I just wanted to add a couple of questions to this as well. The boxes certainly represent interquartile ranges, but it would be important to know whether they are drawn with the outliers included or ignored. This would also be important to know for the stats output. The whiskers can have a few different meanings depending on which method is used. I personally prefer to have these as 95% CI, but I know a lot of tools use a different formula for whisker calculation. For example, there is a discussion of what they are by default in R's boxplot(). I rarely see the whiskers represent the full range (highest and lowest observations), though. Either way, it would be good to document this, or to add an option for choosing what the whiskers represent.

edit: I just remembered the Kruskal-Wallis test is rank-based, so extreme values only affect the output through their rank, not their magnitude.
edit2: Looks like there is an ongoing list for upgrading these boxplots. Might be worth adding it there.
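
To illustrate the point about the Kruskal-Wallis test being rank-based, here is a minimal sketch in plain Python. The function name kruskal_h and the Shannon values are made up for illustration, and this is not QIIME's own code. It computes the H statistic without a tie correction, so it assumes all values are distinct.

```python
def kruskal_h(*groups):
    """Kruskal-Wallis H statistic (no tie correction; assumes distinct values)."""
    # Pool all observations and assign ranks 1..N in ascending order.
    pooled = sorted(v for g in groups for v in g)
    rank = {v: i + 1 for i, v in enumerate(pooled)}
    n = len(pooled)
    # Sum of (rank sum squared / group size) over the groups.
    total = sum(sum(rank[v] for v in g) ** 2 / len(g) for g in groups)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


# Made-up Shannon values for two groups.
group_a = [3.1, 3.4, 3.6, 3.8]
group_b = [3.9, 4.0, 4.2, 4.5]
# Same as group_b, but the largest value is replaced by an extreme one.
group_b_extreme = [3.9, 4.0, 4.2, 45.0]

h_original = kruskal_h(group_a, group_b)
h_extreme = kruskal_h(group_a, group_b_extreme)
print(h_original == h_extreme)  # the ranks are identical in both cases
```

The extreme value keeps the same rank (the largest), so the statistic does not change. An outlier that changes the ordering of the samples would still change the result.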

tomasz · April 19, 2018, 9:09pm

@Mehrbod_Estaki your answer is accurate!
The boxplot whiskers are based on the IQR with a factor of 1.5 (that is, they reach up to 1.5 × IQR beyond the quartiles), and outliers beyond the whiskers are plotted as circle symbols, if present.

I opened an issue on GitHub to request this information within the visualization.
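
For anyone who wants to see the 1.5 × IQR rule worked through, here is a rough sketch in plain Python. It is not the code QIIME uses. The function name tukey_boxplot_stats and the sample values are hypothetical, and the quartile method (linear interpolation via statistics.quantiles with method="inclusive") is an assumption, since different tools compute quartiles slightly differently.

```python
from statistics import quantiles


def tukey_boxplot_stats(values, factor=1.5):
    """Box, whisker, and outlier values under the 1.5 x IQR convention."""
    data = sorted(values)
    # Quartiles by linear interpolation between data points (an assumption).
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - factor * iqr
    upper_fence = q3 + factor * iqr
    # Whiskers stop at the most extreme observations inside the fences.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        # Anything beyond the fences would be drawn as a separate point.
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }


# Made-up Shannon values with one unusually high sample.
shannon = [3.1, 3.4, 3.5, 3.6, 3.8, 3.9, 4.0, 6.5]
stats = tukey_boxplot_stats(shannon)
print(stats["whisker_low"], stats["whisker_high"], stats["outliers"])
```

With these values the whiskers end at 3.1 and 4.0, and 6.5 falls outside the upper fence, so it would be drawn as an individual point rather than stretching the whisker.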
