Recommended Specifications to run QIIME2

MartinEarle · May 16, 2019, 6:34pm

Hi everyone,

I have read that RAM is the most important specification for processing data in QIIME, but I was wondering if anyone could provide a recommendation for both RAM and CPU as a baseline.

Thanks,
Martin

TKOneal · May 20, 2019, 7:35pm

Hi MartinEarle,
I'm a researcher using QIIME 2 for various projects. So far I haven't been able to find anything with a direct suggestion for a baseline of RAM or CPU, but it does seem to depend on which plugins you choose to use and the size of your samples. I was able to run a small set in two days on my Linux laptop with 16 GB of RAM. Hope this helps.

MartinEarle · May 21, 2019, 1:40pm

This is helpful, thank you! I understand that it is a pretty complicated question to answer, hence why there isn't one already out there. Do you mind sharing your processor specifications?

TKOneal · May 24, 2019, 7:06pm

My Linux computer is an old HP Inspiron 2015 model. I replaced the RAM with two 8 GB cards. It has an Intel i7 processor, but I don't have it with me to give you any further details at the moment. Sorry for the delayed reply!

colinbrislawn · May 24, 2019, 9:49pm

Hello Martin,

Welcome to QIIME 2!

This is a great question. I'm pretty surprised there are no guides that recommend system specs. Hey @ebolyen, any advice?

RAM vs CPU is an interesting question: if you have too little RAM, you totally can't use a large database or process a large data set. But... if you have more RAM than you need, there's no benefit. Having less CPU makes things slower, but everything still works. Having a faster CPU or more cores on a server makes everything faster all the time.

I guess I would prefer having 32 cores and 32 GB of RAM rather than 8 cores and 256 GB of RAM.

Colin

ebolyen · May 24, 2019, 10:15pm

I'd say the reason there aren't any recommendations is that there aren't any clear rules.

That said, we do see RAM as the most consistent limiting factor, but the requirement is usually really modest: something like 12 GB (i.e. 16 GB in practice) is sufficient for anything DADA2 is up to (this has likely improved over time as well, so the number may be smaller). For classifier training, that goes up quite a bit, but 64 GB is enough for virtually anything we have seen (at which point you are on an HPC node and resources stop being a very interesting question).
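
As a rough illustration of those figures, here is a minimal Python sketch that compares a machine's RAM against them. The thresholds are only the rules of thumb quoted above, not official QIIME 2 requirements. The function names and the 10% slack (added because the operating system usually reports a little less than the nominal RAM size) are assumptions made for this example.

```python
import os

# Rough thresholds from the figures quoted in this thread (rules of thumb,
# not official requirements).
DADA2_GB = 12
CLASSIFIER_TRAINING_GB = 64

# Assumption: allow 10% slack, because the OS typically reports slightly
# less than the nominal installed RAM.
SLACK = 0.9


def total_ram_gb():
    """Return physical RAM in GB, or None if it cannot be detected."""
    try:
        pages = os.sysconf("SC_PHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        # os.sysconf or these keys are not available on every platform.
        return None
    return pages * page_size / 1024**3


def ram_verdict(ram_gb):
    """Classify a RAM size against the rough thresholds above."""
    if ram_gb >= CLASSIFIER_TRAINING_GB * SLACK:
        return "likely enough for classifier training"
    if ram_gb >= DADA2_GB * SLACK:
        return "likely enough for DADA2; classifier training may not fit"
    return "may be tight even for DADA2"


if __name__ == "__main__":
    ram = total_ram_gb()
    if ram is None:
        print("Could not detect RAM on this platform")
    else:
        print(f"{ram:.1f} GB: {ram_verdict(ram)}")
```

Actual memory use depends heavily on the plugin, the reference database, and the data set, so treat the verdict as a starting point only.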

For CPU I'd say the answer gets harder, as from a price-per-power standpoint, raw throughput and parallelism are usually inversely related (obviously you can pay a LOT of money for both if you wanted). Seeing as there are very real physical limits to how fast a single CPU core can actually go, you are generally better off with more cores which are slower individually, since most methods which are compute heavy do have specialized code-paths to take advantage of multiple cores. Consumer-facing hex-cores are becoming a thing, so getting 12 threads is becoming pretty realistic, which is exciting.
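
To make the trade-off concrete, here is a small sketch using an Amdahl's-law style model. This is an illustration, not a benchmark: the parallel fractions, core counts, and relative core speeds are all hypothetical numbers chosen for the example.

```python
def relative_runtime(parallel_fraction, cores, core_speed):
    """Runtime relative to a single core of speed 1.0.

    The serial part of the job runs on one core; the parallel part is
    split evenly across all cores. Everything scales with core speed.
    """
    serial_fraction = 1.0 - parallel_fraction
    return (serial_fraction + parallel_fraction / cores) / core_speed


# Hypothetical machines: 4 cores that are 30% faster vs. 12 baseline cores.
for parallel_fraction in (0.95, 0.50):
    few_fast = relative_runtime(parallel_fraction, cores=4, core_speed=1.3)
    many_slow = relative_runtime(parallel_fraction, cores=12, core_speed=1.0)
    print(f"parallel={parallel_fraction:.2f} "
          f"4 fast cores={few_fast:.3f} 12 slower cores={many_slow:.3f}")
```

Under this model the 12 slower cores finish sooner when the job is 95% parallel, while the 4 faster cores finish sooner when only half of the job can be parallelized. That is why more cores pay off mainly for the compute-heavy methods that actually use them.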

When it comes down to your typical exploratory analysis, however, working with a feature table and various feature-data artifacts, you'll find virtually anything made in the last decade is perfectly capable. So then it becomes kind of hard to communicate that, since the answer to "do I need a fancy computer" is basically "not even remotely", except for some actions. Given that, the question becomes "do I need a fancy computer for those fancy actions", and the reality is you need a server blade for those fancy actions (which you could approximate easily enough if you build your own machine, but you aren't going to be able to buy something like that off the shelf, except for a literal server blade).

I guess as a very rough rule: a laptop from this decade + ~3-5 days of compute on an HPC node is sufficient to do virtually anything you can think of with QIIME 2 (assuming you don't ever need to re-run anything).

#Pedestal/aside: If researcher time saved is important (which it should be!), then I would argue a better investment is learning tools to automate tasks/analysis. Who cares if the computer spends 5 hours doing a thing instead of 30 minutes if you spend hours managing everything by hand anyway?
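
One minimal way to sketch that kind of automation in Python is a runner that executes analysis steps in order and skips any step whose output file already exists, so a re-run only repeats unfinished work. The `run_pipeline` helper and the file names below are hypothetical, and the demo commands are stand-ins that just create empty files. In real use each command would be one analysis step.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_pipeline(steps):
    """Run (command, output_path) steps in order, skipping finished ones.

    Returns the list of output paths that were built during this call.
    """
    built = []
    for command, output in steps:
        output = Path(output)
        if output.exists():
            continue  # already produced by an earlier run
        subprocess.run(command, check=True)  # raises if the step fails
        built.append(output)
    return built


def touch_command(path):
    """Stand-in for a real analysis command: creates an empty file."""
    code = "import sys; open(sys.argv[1], 'w').close()"
    return [sys.executable, "-c", code, str(path)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical output names for two consecutive steps.
        table = Path(tmp) / "table.qza"
        taxonomy = Path(tmp) / "taxonomy.qza"
        steps = [
            (touch_command(table), table),
            (touch_command(taxonomy), taxonomy),
        ]
        print(len(run_pipeline(steps)), "steps run")  # first pass builds both
        print(len(run_pipeline(steps)), "steps run")  # second pass skips both
```

A dedicated workflow tool does the same job with more features. The point is only that a few lines of scripting replace the by-hand bookkeeping.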

MartinEarle · May 27, 2019, 4:56pm

Thank you for the replies! I figured that I would not be able to get a super specific answer due to the nature of the software, but these recommendations help a lot.