I was just running beta diversity on some moderately large OTU tables (~5000 samples), and saw quite a lot of variability in apparent memory requirements, at least as judged by whether my jobs were killed or not!
beta / jaccard: ran fine with 64 GB / 16 cores
beta / bray-curtis: died on 240 GB / 16 cores
unifrac: Striped UniFrac is godlike and used maybe 4 GB with 32 cores.
Does this fit your expectations? Would there be any way to rewrite the non-phylogenetic metric calculations to use the same memory-efficient approach as Daniel's Striped UniFrac implementation?
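For concreteness, here is a rough sketch of the kind of thing I'm imagining (a hypothetical helper, not the actual QIIME 2 / scikit-bio code): compute the distance matrix one block of rows at a time with SciPy's `cdist`, so the only large allocation is the output matrix itself (for 5000 samples, a float64 5000 x 5000 matrix is only ~200 MB) rather than whatever bigger intermediates a one-shot implementation builds.

```python
import numpy as np
from scipy.spatial.distance import cdist

def braycurtis_chunked(table, chunk=256):
    """Pairwise Bray-Curtis on a samples-by-features matrix,
    computed one block of rows at a time.

    Hypothetical sketch only. Peak working memory is roughly the
    n x n output matrix plus one (chunk x n) block, with the chunk
    size bounding the extra working set.
    """
    n = table.shape[0]
    out = np.zeros((n, n))
    for start in range(0, n, chunk):
        stop = min(start + chunk, n)
        # cdist computes this block of rows against all samples
        # without materializing a full 3-D broadcasted intermediate.
        out[start:stop] = cdist(table[start:stop], table,
                                metric="braycurtis")
    return out
```

On a small random table this agrees with `pdist`/`squareform`, so it is just a matter of where the memory goes, not the math.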