CPU vs. GPU workunits?

Richard

Joined: 14 Mar 06
Posts: 2
Credit: 1,411,408
RAC: 0
United Kingdom
Message 1452518 - Posted: 11 Dec 2013, 0:22:20 UTC
Last modified: 11 Dec 2013, 0:32:16 UTC

So, I've been running S@H on my desktop and making use of the GPU, after a long time running it on a laptop which is CPU-only.

I've noticed that the CUDA workunits finish in a matter of minutes, compared to CPU workunits which take about 3 hours (4 running simultaneously, 1 per core, on a 3.3 GHz i5-2500K), which demonstrates why people are packing out their supercomputers with GPUs.
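
To put rough numbers on that (the GPU figure is a guess on my part, since "a matter of minutes" could mean anything from 5 to 30):

```python
# Back-of-the-envelope workunit throughput.
# CPU numbers are what I see on the i5-2500K: 4 tasks in parallel, ~3 hours each.
# The GPU task time is an assumption, not a measurement.

cpu_tasks_in_parallel = 4
cpu_hours_per_task = 3.0
gpu_minutes_per_task = 10.0          # assumed, "order of minutes"

cpu_tasks_per_hour = cpu_tasks_in_parallel / cpu_hours_per_task   # ~1.33
gpu_tasks_per_hour = 60.0 / gpu_minutes_per_task                  # 6.0

print(f"CPU (4 cores): {cpu_tasks_per_hour:.2f} workunits/hour")
print(f"GPU (1 card):  {gpu_tasks_per_hour:.2f} workunits/hour")
print(f"One GPU ~= {gpu_tasks_per_hour / cpu_tasks_per_hour:.1f}x the whole CPU")
```

With those guessed times, one GPU does roughly four to five times the work of all four CPU cores put together, assuming the tasks are actually comparable, which is the question.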

I was wondering: are the workunits essentially the same job, packaged for the x86-64, CUDA and OpenCL architectures, or do the GPU workunits perform tasks well suited to a chip with hundreds of teeny CUDA cores, with CPU tasks being optimised for a few big chunky cores (and some virtual ones if you have HT)?

That leads to the question of whether the project needs access to a healthy balance of CPUs and GPUs to get different kinds of task done, or whether GPU = better for S@H across the board, because they're all the same job, just wrapped for different architectures (and the GPUs are apparently a lot faster).

Which would "win"*?
- Two computers each with a GPU
- One computer with 3 or 4 GPUs SLI'd together?

*Where the "winner" is the rig of greater value to S@H.



Assuming the answer is "they're all the same; we have CPU jobs because we'll take what we can get, but we wish everyone had chunky discrete GPUs", I started imagining extreme rigs.

It just occurred to me that much of the power cost, space cost, and potentially financial cost of a computer is the motherboard, hard drive, CPU and case - and you have to power all of that even though what you're really after is the GPU. If GPUs are the most valuable computational component, then someone building a dedicated S@H rig should presumably be looking to maximise the GPU-per-chassis ratio - and how ludicrously far could you take that (technically, not cost-effectively)?

For instance, Magma have a 1-to-16 PCIe expansion chassis. With a decent gaming board that has two x16 PCIe slots, you could theoretically hook up 32 graphics cards (I'm conveniently ignoring the power and cooling issues).
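
The slot maths is trivial; the catch (assuming each expander sits behind a PCIe switch, which is my assumption about how these chassis work) is that all 16 cards on an expander share the bandwidth of the single x16 host slot they hang off:

```python
# GPU count vs. shared bandwidth for the "gaming board + expanders" idea.
# Assumption: each 1-to-16 expander is a PCIe switch, so its 16 cards share
# whatever bandwidth the single upstream x16 host slot provides.

host_x16_slots = 2
cards_per_expander = 16
lanes_per_host_slot = 16

total_cards = host_x16_slots * cards_per_expander                     # 32
effective_lanes_per_card = lanes_per_host_slot / cards_per_expander   # 1.0

print(f"Total GPUs: {total_cards}")
print(f"Upstream bandwidth per card: ~x{effective_lanes_per_card:.0f} "
      f"equivalent, if every card is transferring at once")
```

That's only a problem if the cards need to talk to the host a lot, which loops back to the question of what the workunits actually do.
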

Is that a good thing - using the computer as a server for a farm of GPUs - or, if everyone did that, would the project starve for lack of CPU time for CPU-specific tasks (if such tasks exist, as per my first question up top)?
Obviously you still need cases and PSUs, so you're not spending all your money on GPUs, but proportionally you're spending much less on motherboards, RAM and CPUs.


I'm sure none of this is revolutionary thinking, but I couldn't see an answer as to whether the jobs are inherently "different" under the hood, and I was wondering, from those with more hardware experience than me (I built this desktop, but that's about it), how far you could feasibly take it. Obviously the CPU would eventually become a bottleneck for downloading and shunting workunits around, but how far do people think you could push it? I just have this ridiculous image in my head of a little mATX board with 32 GPUs hanging off it, quietly shunting work back and forth without doing any computation itself.

Dreaming, or doable if you had the money for 32 GPUs?
Richard

Joined: 14 Mar 06
Posts: 2
Credit: 1,411,408
RAC: 0
United Kingdom
Message 1452529 - Posted: 11 Dec 2013, 0:40:26 UTC
Last modified: 11 Dec 2013, 0:41:03 UTC

Ah, cool, thanks - I hadn't spotted that one. So it seems the answer is yes, you could (OS permitting), except that commodity hardware is so cheap it's more cost-effective to dumpster-dive old PCs and mount 1-4 GPUs in each than to blow a couple of grand on a 16-slot expander chassis.

Do I take it the answer to my original question is no - the workunits are all basically the same, and since distributed computing projects are inherently parallel, GPU >>> CPU?
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1452590 - Posted: 11 Dec 2013, 4:52:33 UTC - in response to Message 1452518.  
Last modified: 11 Dec 2013, 4:53:15 UTC

Current server and client limits, IIRC from the last time I looked at the code involved, cap things at 8 GPUs in total, and won't believe you if you report more than that.

The second limit would be in the applications themselves. In CUDA Multibeam I lifted that to 16, just as a mild longevity move. Realistically, though, with current desktop driver and application technology, the limits are a function of system driver latency (including growing video, PCIe and so on) and physical motherboard resources. Many people have run into diminishing returns with more than a few GPUs on board, usually topping out in practice before 8, depending on the system and applications. Those are communication problems that stem from cramming many tasks through small, high-latency links like PCIe.
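
As a toy illustration of that topping-out effect (the numbers below are invented purely for the example, not taken from any real app or measurement): treat each task as some amount of parallel GPU compute plus a slice of host/driver/PCIe servicing that is effectively serialised across all the cards.

```python
# Toy model of multi-GPU diminishing returns (illustrative numbers only).
# Each task needs t_gpu seconds on a GPU plus t_host seconds of serialised
# host-side servicing (driver calls, PCIe transfers). The GPUs run in
# parallel, but the host servicing behaves like a single shared queue.

t_gpu = 600.0    # assumed GPU compute time per task, seconds
t_host = 90.0    # assumed serialised host/driver/PCIe time per task, seconds

for n_gpus in (1, 2, 4, 6, 8, 12, 16):
    gpu_limit = n_gpus / t_gpu       # tasks/sec if only GPU time mattered
    host_limit = 1.0 / t_host        # tasks/sec the serialised host can feed
    throughput = min(gpu_limit, host_limit)
    print(f"{n_gpus:2d} GPUs: {throughput * 3600:5.1f} tasks/hour"
          f"{'  <- host-bound' if gpu_limit > host_limit else ''}")
```

With those invented numbers the rig stops scaling somewhere between 6 and 8 GPUs; the real knee depends entirely on the application, driver and board, but the shape of the curve is the same.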

Future applications will probably reduce system pressure where possible, but the software side tends to barely keep up with the pace of hardware evolution, treading water. Along those lines, during 2014 some of the first heterogeneous clusters will probably be experimented with, adding efficiency and fault-tolerance functionality by mashing a farm of assorted hardware and apps into one system. Reliability/fault tolerance there comes from being able to adapt to problems or changes before the BOINC client ever knows about them, while reducing communication costs. Transparent 'local' peering and various forms of load balancing will likely be involved to get that right; to some extent, existing research implementations have fault tolerance and comms/load balancing rather tacked on as an afterthought.
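
For anyone wondering what local load balancing with fault tolerance looks like in practice, here is a minimal sketch of the shape of it (illustrative only, nothing to do with the actual cluster code): a shared local queue, mixed-speed workers pulling at their own pace, and failed tasks quietly going back on the queue so nothing upstream ever notices.

```python
# Minimal sketch of local load balancing + fault tolerance over a mixed
# farm of workers (illustrative only; worker names and rates are made up).
import queue
import random
import threading
import time

tasks = queue.Queue()
for i in range(20):
    tasks.put(f"wu-{i:03d}")

results = []
results_lock = threading.Lock()

def worker(name, speed, fail_rate):
    """Pull tasks at this worker's own pace; put failures back on the queue."""
    while True:
        try:
            wu = tasks.get(timeout=0.5)
        except queue.Empty:
            return                      # queue drained, worker retires
        if random.random() < fail_rate:
            tasks.put(wu)               # fault tolerance: someone retries it later
            continue
        time.sleep(0.05 / speed)        # "crunch" the workunit
        with results_lock:
            results.append((wu, name))

threads = [
    threading.Thread(target=worker, args=("fast-gpu", 10.0, 0.05)),
    threading.Thread(target=worker, args=("slow-gpu", 3.0, 0.05)),
    threading.Thread(target=worker, args=("cpu-core", 1.0, 0.0)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"{len(results)} of 20 workunits completed")
for name in ("fast-gpu", "slow-gpu", "cpu-core"):
    print(f"  {name}: {sum(1 for _, w in results if w == name)}")
```

The faster workers naturally end up with more of the work, and a flaky one just slows things down a little instead of losing tasks - which is the behaviour you want to have sorted out before the BOINC client ever gets involved.
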
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
