Creating KEGG-based distance matrix


I wanted to raise a general discussion: how appropriate is it, in your opinion, to use a KEGG-based feature table to create a distance matrix and then run the downstream analyses that are accepted for the usual composition-based distance matrices (e.g. PERMANOVA, adonis...)?

Obviously some distance measures, such as UniFrac, won't fit because they need a phylogenetic tree, but more general measures such as Jaccard or Bray-Curtis (BC) sound like they should be fine. Still, I rarely see analyses like this in the literature, and most KEGG-related work is per-feature analysis (e.g. MaAsLin2).

This is a great question!

Two quick thoughts:

Databases will lead to database bias

  • Using annotations (taxonomy or KEGG) will introduce database bias into the features
  • Summarizing by taxonomy or KEGG pathway will reduce resolution to that of the (limited/biased) annotations
  • (Perhaps people don't make KEGG pathway PCoAs for the same reason they don't make species-level taxonomy PCoAs) EDIT: I have been informed that some people do use summarized data like this. And I use summarized data for bar plots, so maybe this is fine...

A distance matrix is a distance matrix

  • So if you trust the pathways enough to report on them, why NOT put them into a PCoA plot? 🤷
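
To make the "a distance matrix is a distance matrix" point concrete, here is a minimal pure-Python sketch. It builds a Bray-Curtis matrix from a KO count table and runs a PERMANOVA-style permutation test on it. Everything here is an assumption for illustration: the KO counts, sample names, and groups are invented, and the helper functions (`bray_curtis`, `pseudo_f`, `permanova`) are hypothetical stand-ins, not the adonis or QIIME 2 implementations. For real data you would probably use an established package instead.

```python
import random

# Hypothetical KO (KEGG Orthology) count table: sample -> {KO id: count}.
# All values are invented for illustration only.
ko_table = {
    "S1": {"K00001": 10, "K00002": 0, "K00003": 5},
    "S2": {"K00001": 8, "K00002": 1, "K00003": 6},
    "S3": {"K00001": 9, "K00002": 0, "K00003": 4},
    "S4": {"K00001": 0, "K00002": 12, "K00003": 1},
    "S5": {"K00001": 1, "K00002": 10, "K00003": 0},
    "S6": {"K00001": 0, "K00002": 11, "K00003": 2},
}
# Hypothetical grouping variable (e.g. treatment vs. control).
groups = {"S1": "A", "S2": "A", "S3": "A", "S4": "B", "S5": "B", "S6": "B"}


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two count profiles given as dicts."""
    keys = set(u) | set(v)
    num = sum(abs(u.get(k, 0) - v.get(k, 0)) for k in keys)
    den = sum(u.get(k, 0) + v.get(k, 0) for k in keys)
    return num / den if den else 0.0


def distance_matrix(table, samples):
    """Square matrix of pairwise Bray-Curtis dissimilarities."""
    return [[bray_curtis(table[a], table[b]) for b in samples] for a in samples]


def pseudo_f(dm, labels):
    """Pseudo-F statistic computed from squared pairwise distances."""
    n = len(labels)
    levels = sorted(set(labels))
    ss_total = sum(dm[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    ss_within = 0.0
    for g in levels:
        idx = [i for i, lab in enumerate(labels) if lab == g]
        pairs = sum(dm[i][j] ** 2 for p, i in enumerate(idx) for j in idx[p + 1:])
        ss_within += pairs / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (len(levels) - 1)) / (ss_within / (n - len(levels)))


def permanova(dm, labels, permutations=999, seed=0):
    """Permutation p-value for the pseudo-F statistic (labels are shuffled)."""
    rng = random.Random(seed)
    observed = pseudo_f(dm, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dm, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


samples = sorted(ko_table)
dm = distance_matrix(ko_table, samples)
labels = [groups[s] for s in samples]
f_stat, p_value = permanova(dm, labels)
print(f"pseudo-F = {f_stat:.2f}, p = {p_value:.3f}")
```

Nothing in the test cares whether the columns are taxa or KOs; it only sees the pairwise distances. With only three samples per group there are very few distinct label arrangements, so the p-value from this toy table cannot get very small no matter how well the groups separate.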