As part of investigating a bug in phyloseq’s UniFrac calculation (referenced here), I compared QIIME’s UniFrac calculations with those of several R packages.
Short version: from 4-5 different pieces of software, I got nearly as many different sets of UniFrac values (though most were strongly correlated; phyloseq’s weighted UniFrac is the exception, probably because of a bug):

    # Correlation matrix for Weighted Unifrac
             qiime  phyloseq     rbiom  gunifrac
    qiime        1 0.3009552 1.0000000 0.9757110
    phyloseq    NA 1.0000000 0.3009552 0.4489365
    rbiom       NA        NA 1.0000000 0.9757110
    gunifrac    NA        NA        NA 1.0000000

    # Correlation matrix for Unweighted Unifrac
             qiime  phyloseq     rbiom  gunifrac   picante
    qiime        1 0.9791214 0.9998755 0.9998755 1.0000000
    phyloseq    NA 1.0000000 0.9799686 0.9799686 0.9791214
    rbiom       NA        NA 1.0000000 1.0000000 0.9998755
    gunifrac    NA        NA        NA 1.0000000 0.9998755
    picante     NA        NA        NA        NA 1.0000000

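For anyone wanting to repeat this kind of comparison, here is a rough Python sketch. It assumes the correlations are plain Pearson correlations over the pairwise sample distances (the post does not say which method was used), and the sample IDs and distances are made-up toy values. It also shows why a correlation of 1 is not the same as agreement: the second "tool" returns exactly twice the distances of the first, yet still correlates perfectly.

```python
from itertools import combinations
from math import sqrt

def upper_triangle(dist, ids):
    """Flatten a distance lookup into one value per unordered sample pair."""
    return [dist[a][b] for a, b in combinations(ids, 2)]

def pearson(x, y):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical toy distances, NOT output from any real package.
ids = ["S1", "S2", "S3"]
tool_a = {"S1": {"S2": 0.2, "S3": 0.6}, "S2": {"S3": 0.5}}
# A second "tool" that reports every distance on a doubled scale.
tool_b = {a: {b: 2 * v for b, v in row.items()} for a, row in tool_a.items()}

vec_a = upper_triangle(tool_a, ids)
vec_b = upper_triangle(tool_b, ids)

r = pearson(vec_a, vec_b)
max_abs_diff = max(abs(p - q) for p, q in zip(vec_a, vec_b))

print(f"correlation = {r:.6f}, largest absolute difference = {max_abs_diff:.3f}")
```

Reporting the largest absolute difference next to each correlation would make it easier to tell a pure scaling difference from a genuinely different calculation.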
This worries me because, as I understand it, UniFrac is supposed to be completely deterministic. The original issue says that QIIME benchmarks its implementation via scikit-bio’s unit tests, so for now I’m trusting QIIME, but the disagreement between tools is unsettling. Is there an independent dataset (ideally one as complex as real data) that the implementations can be benchmarked against to determine which one is actually correct?
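Short of a real benchmark dataset, one low-tech check is a tree small enough to verify by hand. The sketch below is not taken from any of the packages above. It uses a made-up three-tip tree and made-up counts, and it implements UniFrac as I understand the published definitions: unweighted is the unique branch length over the observed branch length, and weighted is the branch-length-weighted difference in relative abundance, optionally divided by a normalizing sum. Those formulas are assumptions on my part, so treat the numbers as a sanity check rather than ground truth.

```python
# Toy rooted tree, written out by hand: ((A:1,B:1):1,C:2);
# Each branch is (length, set of tips that descend from it).
BRANCHES = [
    (1.0, {"A"}),
    (1.0, {"B"}),
    (1.0, {"A", "B"}),
    (2.0, {"C"}),
]

def proportion_below(sample, tips):
    """Fraction of a sample's total counts that fall under a branch."""
    total = sum(sample.values())
    return sum(sample.get(tip, 0) for tip in tips) / total

def unweighted_unifrac(s1, s2):
    """Branch length seen in exactly one sample / branch length seen in either."""
    unique = observed = 0.0
    for length, tips in BRANCHES:
        in1 = proportion_below(s1, tips) > 0
        in2 = proportion_below(s2, tips) > 0
        if in1 or in2:
            observed += length
        if in1 != in2:
            unique += length
    return unique / observed

def weighted_unifrac(s1, s2, normalized=False):
    """Sum of length * |p1 - p2|; optionally divided by sum of length * (p1 + p2)."""
    raw = scale = 0.0
    for length, tips in BRANCHES:
        p1 = proportion_below(s1, tips)
        p2 = proportion_below(s2, tips)
        raw += length * abs(p1 - p2)
        scale += length * (p1 + p2)
    return raw / scale if normalized else raw

# Hypothetical counts per tip for two samples.
sample_1 = {"A": 2, "B": 2}
sample_2 = {"B": 1, "C": 3}

print(unweighted_unifrac(sample_1, sample_2))                 # 3.0 / 5.0
print(weighted_unifrac(sample_1, sample_2))                   # 3.0
print(weighted_unifrac(sample_1, sample_2, normalized=True))  # 3.0 / 4.0
```

Worked by hand: branches A and C are each seen in only one sample (1 + 2 = 3) out of a total observed length of 5, so unweighted is 0.6. The weighted sum is 0.5 + 0.25 + 0.75 + 1.5 = 3.0, and the normalizing sum is 4.0. Feeding the same tree and counts to each package would show which ones reproduce these values, and whether a given package reports the raw or the normalized weighted form is worth checking in its documentation.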