Beta diversity memory usage

tanaes · August 4, 2018, 3:32am

Was just running beta diversity on some moderately large OTU tables (~5000 samples), and experienced quite a lot of variability in apparent memory requirements, at least as determined by whether my jobs were killed or not!

beta / jaccard: ran fine with 64 GB / 16 cores
beta / bray-curtis: died on 240 GB / 16 cores
unifrac: Striped UniFrac is godlike and used maybe 4 GB with 32 cores.

Does this fit with your expectations? Would there be any way to rewrite the algorithm for the non-phylogenetic metric calculations to use the same memory-efficient approach as Daniel’s unifrac implementation?
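
For illustration only, here is a minimal sketch of what a lower-memory, non-phylogenetic calculation could look like: Bray-Curtis computed one block of samples at a time, so the largest temporary array is block_size x n_features rather than anything that scales with n_samples squared. This is a generic block-wise approach, not the Striped UniFrac algorithm and not how any existing plugin is implemented. The function name `bray_curtis_blockwise` and the `block_size` parameter are hypothetical, counts are assumed to be non-negative, and pairs of all-zero samples are assigned a distance of 0 as a simplifying assumption.

```python
import numpy as np


def bray_curtis_blockwise(counts, block_size=256):
    """Bray-Curtis distances, computed one block of rows at a time.

    counts: (n_samples, n_features) array of non-negative abundances.
    Hypothetical helper for illustration; not part of any library.
    """
    counts = np.asarray(counts, dtype=float)
    n = counts.shape[0]
    totals = counts.sum(axis=1)  # per-sample total abundance
    dm = np.zeros((n, n))        # the output itself is still n x n

    for start in range(0, n, block_size):
        stop = min(start + block_size, n)
        block = counts[start:stop]
        for j in range(n):
            # Temporary here is only (block rows) x n_features.
            num = np.abs(block - counts[j]).sum(axis=1)
            den = totals[start:stop] + totals[j]
            # Assumption: two empty samples get distance 0 instead of NaN.
            dm[start:stop, j] = np.divide(
                num, den, out=np.zeros_like(num), where=den > 0
            )
    return dm


# Tiny usage example with made-up counts (3 samples, 3 features).
example = np.array([[1, 2, 3], [3, 2, 1], [0, 0, 6]])
example_dm = bray_curtis_blockwise(example, block_size=2)
```

The full output matrix still has to fit in memory, but for ~5000 samples that is small; the point of the sketch is only that the intermediate arrays do not need to grow with the number of sample pairs.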

wasade · August 6, 2018, 5:15pm

Hey @tanaes, thanks. Do you observe high memory usage on the non-phylogenetic metrics when using a single core? It’s possible the parallel framework is replicating the distance matrix and other temporary objects behind the scenes.
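
As a rough back-of-the-envelope check on the replication idea, the sketch below estimates the size of a dense distance matrix for ~5000 samples and what it would cost if every worker held its own copy. It assumes 8-byte floating-point values and one full copy per core, neither of which is confirmed in this thread; `distance_matrix_bytes` is a hypothetical helper.

```python
def distance_matrix_bytes(n_samples, bytes_per_value=8):
    """Size of a dense n x n distance matrix (assumes float64 by default)."""
    return n_samples * n_samples * bytes_per_value


n_samples = 5000   # approximate sample count from the original post
n_workers = 16     # cores used for the non-phylogenetic runs

one_copy = distance_matrix_bytes(n_samples)
all_copies = one_copy * n_workers  # assumption: one full copy per worker

print(f"one copy:  {one_copy / 1e9:.2f} GB")
print(f"{n_workers} copies: {all_copies / 1e9:.2f} GB")
```

Under these assumptions the distance matrix is small even when replicated, so if replication is the cause, the other temporary objects are the more likely place to look. Their size depends on the number of OTUs, which is not stated here.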

Note that you’ll get better performance on Striped UniFrac with 8 or 16 cores.

Best,
Daniel