Partial UniFrac calculations for large sample sets

Hi team!

I have a large project with 1000s of samples (I'm very excited :tada:). Samples are being added intermittently in relatively large batches (n=250-1000).

Is there a UniFrac implementation in QIIME 2 that lets me calculate a partial distance matrix, where I only (re)calculate the distances between samples I'm missing? Or is it better to let my HPC admin know that I've got a long-running job?
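
To make "only calculate the distances I'm missing" concrete, here is a minimal sketch of the caching pattern in plain Python. It is not a QIIME 2 or UniFrac API: the `jaccard` function is a stand-in metric, and `extend_distances`, the sample IDs, and the feature names are all hypothetical. The sketch assumes that a pair's distance depends only on those two samples (and, for a phylogenetic metric, on a tree that does not change between batches).

```python
from itertools import combinations


def jaccard(a, b):
    """Stand-in metric (NOT UniFrac): Jaccard distance between two feature sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def extend_distances(distances, samples, metric=jaccard):
    """Compute only the pairs missing from `distances`; return how many were added.

    distances: dict mapping frozenset({id_a, id_b}) -> distance, updated in place
    samples:   dict mapping sample id -> set of observed features
    """
    computed = 0
    for a, b in combinations(sorted(samples), 2):
        key = frozenset((a, b))
        if key not in distances:
            distances[key] = metric(samples[a], samples[b])
            computed += 1
    return computed


# First batch: every pair has to be computed.
batch_1 = {"s1": {"f1", "f2"}, "s2": {"f2", "f3"}, "s3": {"f1", "f2", "f3"}}
cache = {}
n_first = extend_distances(cache, batch_1)

# Second batch: only pairs involving the new sample "s4" are computed.
batch_2 = dict(batch_1, s4={"f3", "f4"})
n_second = extend_distances(cache, batch_2)

print(n_first, n_second, len(cache))
```

With k new samples added to n existing ones, this computes n*k + k*(k-1)/2 distances instead of all (n+k)*(n+k-1)/2, although the loop still visits every pair to check the cache.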

Thanks!
Justine

Hey @jwdebelius,

It takes 8 seconds on a laptop to run 1000 random samples from the EMP on a CPU. Is partial really needed?

Best,
Daniel

Thanks @wasade, that's good information! I've had long compute issues in the past with big samples. Is this specifically with the tip-trimmed UniFrac, or the full implementation?

Best,
Justine

Full implementation. You can run a lot of samples on a laptop now; more details can be found here.

Best,
Daniel

This is fantastic! Thank you so much @wasade!
