Help disentangling bimodal process in PCoA plot

Hello, there appears to be some kind of bimodal process in this data set: the “Region P” (blue square) samples form two groups, and the confidence ellipsoid obscures this. I could just remove the ellipsoid, but I’d rather do something more informative, such as splitting the Region P samples into two groups. However, just looking at the distance matrix (also attached as .tsv), I can’t tell which samples belong to which group.

Has anyone encountered anything similar, where they discovered groupings within their sample groups that merit further investigation? I would at a minimum like to identify the samples in both clusters in order to see if they have anything in common (e.g. date of DNA isolation). Thanks!

allMaleIxodesDKP_unweighted_unifrac_matrix_distance-matrix.tsv (36.4 KB)

Hi @John_Blazier,
This type of clustering, if not biologically expected, often suggests a batch effect or some other technical issue (e.g. comparing different regions of 16S, or different clustering methods). It is certainly worth investigating further and accounting for.

Sure, there are lots of different ways of identifying these samples. The easy QIIME 2 solution is to visualize your PCoA plot using q2-emperor: clicking a dot displays its sample name. You could also change the coloring scheme to reflect collection date (or any other column from your metadata file) instead of Region.
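If you would rather pull the two groups straight out of the distance matrix, one option outside of QIIME 2 is to run hierarchical clustering on just the Region P samples and cut the tree into two clusters. The sketch below is only an illustration and makes several assumptions: it uses average linkage from SciPy (a swapped-in technique, not something Emperor does), the function name `split_into_two_groups` and the sample IDs `P1` to `P6` are hypothetical, and the toy matrix stands in for your real data. For the attached file, I would expect something like `pd.read_csv(path, sep="\t", index_col=0)` to load it, assuming it is a square tab-separated matrix with sample IDs in the first row and column.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def split_into_two_groups(dm, sample_ids):
    """Label each sample in sample_ids as cluster 1 or 2 (hypothetical helper).

    dm is a square, symmetric distance matrix as a DataFrame whose
    index and columns are sample IDs.
    """
    ids = list(sample_ids)
    # Keep only the samples of interest (e.g. the Region P samples).
    sub = dm.loc[ids, ids].to_numpy(dtype=float)
    # Convert the square matrix to the condensed form that linkage expects.
    condensed = squareform(sub, checks=False)
    # Average-linkage hierarchical clustering on the precomputed distances.
    tree = linkage(condensed, method="average")
    # Cut the tree so that at most two clusters remain.
    labels = fcluster(tree, t=2, criterion="maxclust")
    return pd.Series(labels, index=ids, name="cluster")


# Toy distance matrix: two tight groups (0.2 within, 0.8 between).
toy_ids = ["P1", "P2", "P3", "P4", "P5", "P6"]  # hypothetical sample IDs
toy = np.full((6, 6), 0.8)
toy[:3, :3] = 0.2
toy[3:, 3:] = 0.2
np.fill_diagonal(toy, 0.0)
toy_dm = pd.DataFrame(toy, index=toy_ids, columns=toy_ids)

clusters = split_into_two_groups(toy_dm, toy_ids)
print(clusters)
```

The resulting labels could then be added as a new column in your metadata file and used to color the Emperor plot, or compared against columns such as date of DNA isolation. Keep in mind that the cut will always produce a split, so it is worth checking that it matches what you see in the PCoA before reading much into it.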
