PCoA biplot with Jaccard distance matrix

Hello everyone,
I have a question about the correct use of the relative frequency table in a PCoA biplot based on a Jaccard distance matrix.
Usually, when I calculate a PCoA biplot, I proceed as follows: I compute the distance matrix from my feature table, run PCoA on it, convert the same feature table to a relative frequency table, and then produce the biplot.
In the last dataset I analyzed, I noticed that a feature present in every sample was among the most important ones, but since I'm using Jaccard I found this result strange.
I read in the post "Questions about PCoA biplots" that the projection of the features onto the PCoA space comes from a covariance matrix calculated between the PCoA matrix and the feature table, so the projection also depends on feature abundance, and not only on presence/absence, if I understood correctly.

So, to neutralize the problem, I used a new relative frequency table where every feature, if present, has the same frequency. In this way, the contribution of the omnipresent feature was very close to 0 (about 10^-18 on PCo1 and PCo2).
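The effect described above can be reproduced with a minimal NumPy sketch. It assumes, as the linked post suggests, that a feature's biplot coordinates are proportional to the covariance between that feature's column and each PCoA axis; it is not the exact procedure of any particular tool. The toy counts and the helper names (`jaccard_distance_matrix`, `pcoa_axes`, `feature_axis_covariance`) are hypothetical and exist only for illustration.

```python
import numpy as np

def jaccard_distance_matrix(presence):
    """Pairwise Jaccard distances between rows of a boolean sample-by-feature matrix.
    Assumes every sample has at least one feature present."""
    p = presence.astype(float)
    shared = p @ p.T                      # features present in both samples
    richness = p.sum(axis=1)              # features present in each sample
    union = richness[:, None] + richness[None, :] - shared
    return 1.0 - shared / union

def pcoa_axes(dist, n_axes=2):
    """Classical PCoA: eigendecomposition of the double-centered squared distances."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gower = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gower)
    order = np.argsort(eigvals)[::-1][:n_axes]
    return eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0.0, None))

def feature_axis_covariance(features, axes):
    """Covariance of each feature column with each ordination axis (assumed biplot basis)."""
    fc = features - features.mean(axis=0)
    ac = axes - axes.mean(axis=0)
    return fc.T @ ac / (features.shape[0] - 1)

# Hypothetical counts: 5 samples x 4 features; feature 0 is present in every sample.
counts = np.array([
    [50, 3, 0, 0],
    [ 5, 0, 7, 0],
    [20, 1, 0, 4],
    [ 1, 0, 2, 9],
    [80, 6, 0, 0],
])
presence = counts > 0
relative = counts / counts.sum(axis=1, keepdims=True)   # usual relative frequencies
constant = presence.astype(float)                       # same value wherever present

axes = pcoa_axes(jaccard_distance_matrix(presence))
cov_relative = feature_axis_covariance(relative, axes)
cov_constant = feature_axis_covariance(constant, axes)

print("omnipresent feature, relative frequencies:", cov_relative[0])
print("omnipresent feature, constant frequencies:", cov_constant[0])
```

With the constant table, the omnipresent feature's column has zero variance, so its covariance with every axis is zero up to floating-point error. Note that this only holds if the value is truly constant across samples: if the presence/absence table were renormalized so that each sample sums to 1, the omnipresent feature would get a value of 1/richness, which varies between samples, and its covariance would not have to vanish.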

My question is: is it correct to use a relative frequency table where every present feature has the same weight when calculating a PCoA biplot for a qualitative metric like Jaccard?

Thank you for your attention, I hope I've been clear with my explanation!


Sure, I would be OK with this method if I found it in a paper, as long as you explained how you rescaled your feature table to make all frequencies either a constant or zero. That way both your distance calculations and your biplot vectors are based on presence/absence only, which is fine.

I guess the other option would be to use weighted distances like Bray-Curtis or weighted Jaccard (Ružička / Ruzicka index), then use relative abundance values for the biplot. But that's not what you want.

I'm not sure how reviewer 3 would feel :upside_down_face:
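For reference, the two weighted alternatives mentioned above can be sketched from their standard definitions. This is a plain NumPy illustration, not the implementation used by any specific package, and the function names and example vectors are hypothetical.

```python
import numpy as np

def ruzicka_distance(x, y):
    """Weighted Jaccard (Ruzicka) distance: 1 - sum(min(x, y)) / sum(max(x, y))."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return 1.0 - np.minimum(x, y).sum() / np.maximum(x, y).sum()

def bray_curtis_distance(x, y):
    """Bray-Curtis dissimilarity: sum(|x - y|) / sum(x + y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.abs(x - y).sum() / (x + y).sum()

# Hypothetical abundance vectors for two samples over the same four features.
a = np.array([50, 3, 0, 0])
b = np.array([5, 0, 7, 0])

weighted = ruzicka_distance(a, b)          # uses abundances
binary = ruzicka_distance(a > 0, b > 0)    # presence/absence only
bray = bray_curtis_distance(a, b)

print(weighted, binary, bray)
```

On presence/absence input the Ruzicka distance reduces to the ordinary Jaccard distance, which is why it is often described as the weighted counterpart of Jaccard.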


I would lean toward this option suggested by @colinbrislawn for biplots. I did a quick search for "qualitative biplot" and I'm not turning up anything very informative. Based on how the loadings are calculated, though (essentially a correlation between abundances and PCoA axis values), I'm not sure exactly what the loadings would be telling you in this case.
