I've noticed several forum posts discussing this issue, and it still seems somewhat confusing. The interpretation of Jaccard distance in R's vegan package also seems problematic, but for a different reason (see the GitHub issue "how many people have computed jaccard distances incorrectly using vegdist?", Issue #153 in vegandevs/vegan).
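As I understand the linked vegan issue, the problem is that `vegdist(method = "jaccard")` defaults to `binary = FALSE`. In that case it returns a *quantitative* Jaccard, which the vegan documentation gives as 2B/(1 + B) with B the Bray-Curtis dissimilarity, rather than the presence/absence Jaccard many users expect. The minimal Python sketch below compares the two on two made-up abundance vectors. The function names are hypothetical and this is not vegan's code:

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den


def quantitative_jaccard(x, y):
    """Quantitative Jaccard from Bray-Curtis as 2B / (1 + B), the form
    vegan documents for vegdist(method = "jaccard", binary = FALSE)."""
    b = bray_curtis(x, y)
    return 2 * b / (1 + b)


def binary_jaccard(x, y):
    """Presence/absence Jaccard distance: 1 - |shared| / |union|."""
    a = {i for i, v in enumerate(x) if v > 0}
    c = {i for i, v in enumerate(y) if v > 0}
    union = a | c
    if not union:
        return 0.0  # assumption: two all-zero samples are treated as identical
    return 1 - len(a & c) / len(union)


# Two hypothetical samples with the same taxa but different abundances
s1 = [10, 5, 0, 1]
s2 = [1, 5, 0, 10]

print(binary_jaccard(s1, s2))        # 0.0: identical presence/absence
print(quantitative_jaccard(s1, s2))  # 0.72: abundances differ
```

Both versions are dissimilarities (0 means identical), but they can disagree sharply on the same pair of samples, which is why it matters which one a tool computes.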
For instance, the reply in this earlier post (jaceard vs bray_curtis vs unifrac - #3 by devonorourke) suggests that Jaccard distance in QIIME2 is calculated as a dissimilarity metric (a higher value indicates less similarity between samples), while this post (beta diversity explanation (jaccard_distance)) suggests the opposite.
So, I have two questions regarding this matter:
- How is Jaccard implemented in QIIME2, and is the implementation consistent across all functions that use it, such as diversity, core-metrics, and diversity-lib?
- More broadly, do all the distance metrics in the diversity plugin (as well as the UniFrac methods) represent dissimilarity (where a higher value indicates less similarity between samples)?
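For reference, the standard definitions are shown below, stated for presence/absence data, with $S_1$ and $S_2$ the sets of features observed in two samples. $J$ is a similarity (1 for identical samples), while $d_J$ is a dissimilarity (0 for identical samples, 1 for samples that share no features):

$$
J(S_1, S_2) = \frac{|S_1 \cap S_2|}{|S_1 \cup S_2|}, \qquad
d_J(S_1, S_2) = 1 - J(S_1, S_2) = \frac{|S_1 \cup S_2| - |S_1 \cap S_2|}{|S_1 \cup S_2|}
$$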
Hi @kam,
I'm not certain, so hopefully an admin can confirm, but I believe non-phylogenetic beta diversity distances are currently calculated with the scikit-learn function `sklearn.metrics.pairwise_distances`. For example, the q2-diversity function that creates a distance matrix using the Jaccard method sets the argument `pairwise_func=sklearn.metrics.pairwise_distances`. The beginning of that function looks like this:
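```python
import biom
import skbio
def jaccard(table: biom.Table, n_jobs: int = 1) -> skbio.DistanceMatrix:
    counts = table.matrix_data.toarray().T  # dense array: samples as rows, features as columns
    sample_ids = table.ids(axis='sample')   # row labels for the resulting distance matrix
    # ... (rest of the function, which computes and returns the matrix, not shown in this excerpt)
```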
If I'm interpreting the sklearn function correctly, I believe your interpretation is correct: higher values indicate greater distances, and therefore greater dissimilarity, between samples.
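To sanity-check that reading, here is a minimal pure-Python sketch of a presence/absence Jaccard distance matrix, using the same samples-as-rows orientation as `table.matrix_data.toarray().T` above. It assumes the counts are reduced to presence/absence before the distance is computed, which is my understanding of how scikit-learn treats `metric='jaccard'`; it is an illustration, not the QIIME2 or scikit-learn code, and `jaccard_distance_matrix` is a hypothetical name:

```python
def jaccard_distance_matrix(counts):
    """Pairwise presence/absence Jaccard distances between rows of `counts`.

    `counts` is a list of samples (rows), each a list of feature counts.
    """
    # Reduce each sample to the set of features it contains (count > 0)
    present = [{j for j, c in enumerate(row) if c > 0} for row in counts]
    n = len(present)
    dm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1, n):
            union = present[i] | present[k]
            shared = present[i] & present[k]
            # Distance = 1 - similarity; two empty samples get 0.0 (an assumption)
            d = 1 - len(shared) / len(union) if union else 0.0
            dm[i][k] = dm[k][i] = d
    return dm


# Hypothetical toy table: 3 samples x 4 features
counts = [
    [5, 3, 0, 0],  # sample A
    [9, 1, 0, 0],  # sample B: same features as A, different counts
    [0, 0, 2, 7],  # sample C: no features shared with A or B
]
for row in jaccard_distance_matrix(counts):
    print(row)
# [0.0, 0.0, 1.0]
# [0.0, 0.0, 1.0]
# [1.0, 1.0, 0.0]
```

Identical presence/absence profiles give 0.0 and completely disjoint samples give 1.0, i.e. a dissimilarity. Note that because abundances are discarded, samples A and B come out identical even though their counts differ.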
I believe the same Jaccard call is used across the QIIME2 platform, but I don't know whether all non-phylogenetic distance metrics come from scikit-learn (and that may not even be a realistic expectation if additional distance measures are implemented outside of scikit-learn). As a starting point, I'd look for beta diversity calculations in the beta.py script of the q2-diversity-lib QIIME2 repository.