While investigating a bug in phyloseq’s UniFrac calculation (referenced here), I compared QIIME’s UniFrac output with that of several R packages.

Short version: across the 4-5 tools I tried, I got nearly as many different sets of UniFrac values. Most are strongly correlated with each other; the exception is phyloseq’s weighted UniFrac, probably because of the bug:

```
# Correlation matrix for weighted UniFrac
             qiime  phyloseq     rbiom  gunifrac
qiime            1 0.3009552 1.0000000 0.9757110
phyloseq        NA 1.0000000 0.3009552 0.4489365
rbiom           NA        NA 1.0000000 0.9757110
gunifrac        NA        NA        NA 1.0000000

# Correlation matrix for unweighted UniFrac
             qiime  phyloseq     rbiom  gunifrac   picante
qiime            1 0.9791214 0.9998755 0.9998755 1.0000000
phyloseq        NA 1.0000000 0.9799686 0.9799686 0.9791214
rbiom           NA        NA 1.0000000 1.0000000 0.9998755
gunifrac        NA        NA        NA 1.0000000 0.9998755
picante         NA        NA        NA        NA 1.0000000
```

This concerns me because, as I understand it, UniFrac is completely deterministic, so every implementation should return the same values for the same input. The original issue says that QIIME benchmarks its implementation via scikit-bio’s unit tests, so for now I’m treating QIIME as the reference. Is there an independent dataset (ideally something as complex as real data) that implementations can be benchmarked against to establish which one is actually correct?
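To make concrete what I mean by "deterministic", here is a minimal sketch of unweighted UniFrac on a toy tree small enough to check by hand. The tree, the tip names, and the helpers `branches_above` and `unweighted_unifrac` are all made up for illustration; this is not how QIIME, scikit-bio, or any of the R packages implement it. It also assumes that every branch from a tip up to the root is counted and that the tree is not pruned to the observed tips, which may not match what each package does.

```python
# Hypothetical toy tree: each branch is keyed by its child node and stores
# (parent, branch_length). The root has no entry because it has no branch.
BRANCHES = {
    "n1": ("root", 1.0),
    "n2": ("root", 2.0),
    "t1": ("n1", 0.5),
    "t2": ("n1", 0.5),
    "t3": ("n2", 1.0),
    "t4": ("n2", 1.5),
}


def branches_above(tips):
    """Return the set of branches on the path from any given tip to the root."""
    seen = set()
    for tip in tips:
        node = tip
        while node in BRANCHES:  # stop once we reach the root
            seen.add(node)
            node = BRANCHES[node][0]
    return seen


def unweighted_unifrac(sample_a, sample_b):
    """Unique branch length divided by total observed branch length.

    Assumption: presence/absence only, no pruning, all branches up to the
    root are included.
    """
    a = branches_above(sample_a)
    b = branches_above(sample_b)
    unique = sum(BRANCHES[x][1] for x in a ^ b)  # branches seen in exactly one sample
    total = sum(BRANCHES[x][1] for x in a | b)   # branches seen in either sample
    return unique / total


# Sample A contains t1 and t2; sample B contains t2 and t3.
# Unique branches: t1 (0.5), t3 (1.0), n2 (2.0) -> 3.5
# All observed branches: the above plus t2 (0.5) and n1 (1.0) -> 5.0
d = unweighted_unifrac({"t1", "t2"}, {"t2", "t3"})
print(d)
```

With these made-up branch lengths the hand calculation is 3.5 / 5.0 = 0.7, so any implementation using the same definition should return that value for the same tree and samples. A real benchmark would of course need something far larger than this.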