What is the implementation of Jaccard in QIIME2, and is it consistent across all functions that use Jaccard, such as diversity, core metrics, and diversity lib?
More broadly, do all the distance metrics in the diversity plugin (as well as the UniFrac methods) represent dissimilarity (where a higher value indicates less similarity between samples)?
Hi @kam,
I'm not certain, so hopefully an admin can confirm, but I believe non-phylogenetic beta diversity distances are currently calculated with the sklearn.metrics.pairwise_distances function from the scikit-learn library. For example, the q2-diversity function that creates a distance matrix using the Jaccard method has an argument for pairwise func = sklearn.metrics.pairwise:
If I'm reading the sklearn function correctly, I believe your interpretation is right: higher values indicate greater distances, and therefore greater dissimilarity, between samples.
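To make the direction of the values concrete, here is a minimal sketch of the textbook Jaccard distance (1 minus shared features over total features) in plain Python. This is not the q2-diversity or scikit-learn code; the function name and the sample and feature IDs are made up for illustration. My understanding is that the "jaccard" metric applied to presence/absence data follows the same definition, but please treat that as an assumption until someone confirms it.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two samples, each given as a set of observed feature IDs."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumed convention for this sketch: two empty samples are treated as identical.
        return 0.0
    # 1 - (shared features / all features seen in either sample)
    return 1.0 - len(a & b) / len(union)


# Hypothetical presence/absence data: sample ID -> features observed in that sample.
samples = {
    "s1": {"f1", "f2", "f3"},
    "s2": {"f1", "f2", "f3"},  # same features as s1
    "s3": {"f3", "f4"},        # shares one feature with s1
    "s4": {"f5"},              # shares nothing with s1
}

for other in ("s2", "s3", "s4"):
    print("s1 vs", other, jaccard_distance(samples["s1"], samples[other]))
```

Identical samples give 0, partially overlapping samples give something in between (0.75 for s1 vs s3), and samples with no shared features give 1, so a higher value means less similar.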
I believe the same Jaccard call is used across the QIIME2 platform, but I don't know whether all non-phylogenetic distance metrics necessarily come from scikit-learn (and I'm not sure that would even be a realistic expectation, should additional distance measures be created outside of scikit-learn). What I can tell you is that I'd start by looking for any beta diversity calculation in the beta.py script within the q2-diversity-lib QIIME2 repository.