Two clusters based on Faith phylogenetic diversity


I am comparing Faith's phylogenetic diversity across two groups of stool samples from 2-year-old children. Within each group's distribution of Faith PD values, I notice that there appear to be two clusters (see screenshot of boxplot attached). I've compared the participants in each cluster but haven't found that they differ on any other variable measured in the study. I'm pretty new to microbiome analyses, and I am not sure how to further evaluate the reason for the clusters or how to address them when reporting results. Does anyone have suggestions for how to investigate or what to say about the clusters? Relatedly, does anyone have a sense of what a reasonable range for Faith PD values would be among stool samples from 2-year-old human children?

Thank you for any help you can provide!

Hi, Fran!

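First of all, because of the apparent "clusters", I think a kernel density plot would be better than a boxplot: it shows the shape of the distribution directly, whereas a boxplot only summarizes it with quartiles.
If the clusters do not correlate with any of the other data, I'm afraid they can't really be addressed in any reasonable way.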
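
To illustrate what a kernel density estimate shows that a boxplot does not, here is a minimal sketch in plain Python. The Faith PD values and the bandwidth are made up for illustration, and the helper names `gaussian_kde` and `count_modes` are hypothetical; in practice a plotting library would normally draw the curve for you.

```python
import math

def gaussian_kde(values, grid, bandwidth):
    """Return the Gaussian kernel density estimate of `values` at each grid point."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
        for x in grid
    ]

def count_modes(density):
    """Count local maxima (peaks) in a list of density values."""
    return sum(
        1
        for i in range(1, len(density) - 1)
        if density[i] > density[i - 1] and density[i] >= density[i + 1]
    )

# Made-up Faith PD values for one group (illustration only, not real data).
pd_values = [7.2, 7.8, 8.1, 8.4, 8.9, 9.3, 15.1, 15.6, 16.0, 16.3, 16.8, 17.4]

bandwidth = 1.0  # assumed smoothing width; the apparent number of peaks depends on it
lo = min(pd_values) - 3 * bandwidth
hi = max(pd_values) + 3 * bandwidth
step = (hi - lo) / 199
grid = [lo + i * step for i in range(200)]

density = gaussian_kde(pd_values, grid, bandwidth)
print("peaks in the density estimate:", count_modes(density))
```

Keep in mind that the number of peaks is sensitive to the bandwidth: a very small value turns every sample into its own peak, and a very large one merges everything into a single bump, so it is worth trying a few values before reading much into the shape.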

Faith PD depends on a multitude of variables (such as the phylogenetic tree construction algorithm, sequencing/rarefaction depth, etc.). Because so many factors are involved, there is no single reasonable range for individual samples. What you can do, though, is compare your results with other published microbiome studies of children.
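
To make the dependence on depth concrete, here is a toy sketch of how Faith PD is computed: it sums the branch lengths of the tree that connect the observed taxa. The tree, taxon names, and branch lengths are all made up, and the sketch assumes the convention that the path up to the root is included (implementations differ on this).

```python
# Toy rooted tree: child -> (parent, branch length). All names and lengths are made up.
TREE = {
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "n1": ("n2", 0.5),
    "C": ("n2", 2.0),
    "n2": ("root", 0.5),
    "D": ("root", 3.0),
}

def faith_pd(observed_tips, tree=TREE):
    """Sum the lengths of all branches on the paths from observed tips to the root."""
    branches = set()
    for tip in observed_tips:
        node = tip
        while node in tree:        # stop once we reach the root
            branches.add(node)     # a branch is identified by its child node
            node = tree[node][0]   # move up to the parent
    return sum(tree[node][1] for node in branches)

# Hypothetical scenario: at a shallow depth the rare taxa C and D are not detected.
deep = {"A", "B", "C", "D"}
shallow = {"A", "B"}

print("Faith PD, all taxa detected:", faith_pd(deep))
print("Faith PD, rare taxa missed: ", faith_pd(shallow))
```

The same sample gets a lower value when rare taxa are missed, and a different tree would change the branch lengths being summed, which is why absolute Faith PD values are hard to compare across studies processed differently.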