CPU vs GPU
Author | Message |
---|---|
ahj Send message Joined: 24 Sep 02 Posts: 11 Credit: 110,418 RAC: 0 |
Hello, I've currently got a GTX470 running SETI with seemingly good performance. However, I'd like to know how much more productive it is versus a Core i5 2500, for example. 2x faster? 5x, 10x? Has running SETI on CPUs become redundant, or can they still provide useful work? |
Link Send message Joined: 18 Sep 03 Posts: 834 Credit: 1,807,369 RAC: 0 |
Has running seti on CPUs become redundant, or can they still provide useful work? Besides the fact that any returned (valid) result is useful for the project, CPUs are the only ones that get VLAR tasks assigned, so no, I don't think that using them for SETI is redundant. |
tullio Send message Joined: 9 Apr 04 Posts: 8797 Credit: 2,930,782 RAC: 1 |
I have only a CPU, with two OSes: one real (Linux) and one virtual (Solaris). I have 4 WUs in a pending state. In two of them the wingman uses cuda_fermi. Tullio |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13727 Credit: 208,696,464 RAC: 304 |
I've currently got a GTX470 running seti with seemingly good performance. However, I'd like to know how much more productive it is versus a core i5 2500, for example. 2x faster? x5, x10? My GTX560Ti does 1 shorty every 1.5 minutes. My i7 2600 does 1 shorty every 4.75 minutes. Has running seti on CPUs become redundant, or can they still provide useful work? They still provide useful work. Grant Darwin NT |
Josef W. Segur Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
Hello From the project statistics at BoincStats, there are 238518 active hosts with a total RAC of 107435347, so half the work is being done by hosts with RAC < 450.43. Also, there are 52082 hosts with a RAC which rounds to 451 or higher, so about 186436 active hosts are below the mean RAC. I think it's fair to say most of those are not doing GPU crunching, though exceptions for only part-time SETI crunching could be found. One GTX470 will outproduce all cores of an i5 2500, but the factor depends on what applications you run and various other system specifics. x5 may be about right for the stock applications delivered by the project. Joe |
Wedge009 Send message Joined: 3 Apr 99 Posts: 451 Credit: 431,396,357 RAC: 553 |
...CPUs are the only ones that get VLAR tasks assigned, so no, I don't think that using them for SETI is redundant. Maybe it's because of the anonymous platform set-up, but I often get VLARs assigned to my Radeon GPU. While it doesn't suffer as badly as CUDA does in terms of run times, it does cause severe unresponsiveness in the GUI, so I often have to manually push the VLAR tasks back to the CPU. It's a bit frustrating. But I concur with the point that CPUs are not redundant for S@h processing. Soli Deo Gloria |
Mike Send message Joined: 17 Feb 01 Posts: 34255 Credit: 79,922,639 RAC: 80 |
Each work unit processed is science. So nothing is redundant in my opinion. ATI/AMD GPUs can handle VLARs very well. With each crime and every kindness we birth our future. |
AllenIN Send message Joined: 5 Dec 00 Posts: 292 Credit: 58,297,005 RAC: 311 |
I have just built a system around the AMD A8-3870 APU. I was wondering if anyone knows if the internal GPU (Radeon 6550D) is available for SETI. So far I haven't been able to get any WUs for it. Thanks, Allen |
arkayn Send message Joined: 14 May 99 Posts: 4438 Credit: 55,006,323 RAC: 0 |
I have just built a system around the AMD A8-3870 APU. There is no stock app for AMD GPUs; you would need to install the optimized apps to take advantage of the AMD GPU. http://lunatics.kwsn.net/index.php?module=Downloads;catd=9 |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
While it doesn't suffer as badly as CUDA does in terms of run times, it does cause severe unresponsiveness in the GUI, so I often have to manually push the VLAR tasks back to the CPU. It's a bit frustrating. ATI GPUs can handle VLARs much better than NVIDIA ones (comparing on the same code base, so it looks like this is down to the hardware rather than the software). To reduce lag on VLARs, increase the -period_iterations_num value. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
I have just built a system around the AMD A8-3870 APU. It would be interesting to see the CLinfo output for this GPU to tell whether it is capable or not. Probably yes, it's capable of running OpenCL apps too. And surely hybrid AP. |
AllenIN Send message Joined: 5 Dec 00 Posts: 292 Credit: 58,297,005 RAC: 311 |
Thanks to both of you. I will bone up on the info and see what I get. Stay tuned! Allen |
hbomber Send message Joined: 2 May 01 Posts: 437 Credit: 50,852,854 RAC: 0 |
Hello A 470 would be about 1.6x faster compared to all 4 cores. With my setup, a GTX 460 at 810 MHz core is just about as productive as a 2500K at 4.65 GHz. So bearing in mind that your 2500 is at 3.3 GHz, and a 470 is maybe a bit faster than my overclocked GPUs, the above number seems just about right to me. But the CPU is far more efficient. I think it uses around 50-60 watts, while the 470 will use something like 150-160. My comparison is done using optimized applications. With stock apps the GPU will perform even better relative to the CPU. |
AllenIN Send message Joined: 5 Dec 00 Posts: 292 Credit: 58,297,005 RAC: 311 |
I have just built a system around the AMD A8-3870 APU. Wow, I think that made a difference. I just completed my first wu with the gpu and it was a 4hr 26 min unit that completed in 16 minutes or so. Seems that the four cpu units are completing much faster now too. It appears that they will complete 4hr plus units in about 1 hour. I don't get it, but I like it. Thanks again!! |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13727 Credit: 208,696,464 RAC: 304 |
Seems that the four cpu units are completing much faster now too. It appears that they will complete 4hr plus units in about 1 hour. I don't get it, but I like it. First, the estimated completion times for GPU work tend to be way out. Way, way, way out. This is a result of a bug fix, that has yet to be fixed. But most importantly- the optimised applications give much improved crunching times. For my i7 GTX460- shorty WUs take about 40 min for CPU (8 at a time), just over 3 min for GPU (2 at a time). VLARs on the CPU take 2-2.5 hours. The longest running WUs on the GPU are about 22min. Grant Darwin NT |
AllenIN Send message Joined: 5 Dec 00 Posts: 292 Credit: 58,297,005 RAC: 311 |
Thanks for the input Grant. Those dedicated graphics cards are definitely a boost!!!!!!!!! However, for the cost of this APU, I believe it's a real value. Things are still ramping up, so I'm not yet sure just how things will shape up down the road. You're definitely correct about the optimized apps though, they make a great deal of difference in compute time. Allen |
skildude Send message Joined: 4 Oct 00 Posts: 9541 Credit: 50,759,529 RAC: 60 |
Raistmer, I just bought a laptop that has an A4-3300 APU. GPU calls the gpu portion of the chip a SUMO series HD 6480. It seems to work just fine with the HD5 version of the Lunatics app. In a rich man's house there is no place to spit but his face. Diogenes Of Sinope |
doug Send message Joined: 10 Jul 09 Posts: 202 Credit: 10,828,067 RAC: 0 |
Have an i7 4 core that says it's a Q840 running at 1.87 GHz on my laptop. I have an ATI FirePro M7820 card in it, which I believe is something in the Radeon 5800 series. I can't tell anymore with all these numbers and names. I'm running the Lunatics apps. VLARs are about the same between CPU and GPU with GPU being maybe slightly faster. Shorties are around 6-10 min wall time GPU vs. 30-50 CPU. AP wu are maybe 2-3 hours GPU vs 12 CPU. So I run 3 cores at 100%, 1 at 90% to feed the GPU. I have 4 CPU wus running at a time and 2 GPU. I reschedule all AP wu as GPU wu. I get about an even mix of VLAR wu for CPU and GPU delivered and since they run about the same I don't mess with it. I seem to get about 7500 RAC out of it, but that is quite variable for a lot of reasons. All in all I consider that quite good for a laptop. The ATI GPU has had no heat problems whatsoever, it's running at about 59 C as we speak. I have burned my leg from the i7 heatsink though when I was wearing shorts in the Summer. Just my input. |
AllenIN Send message Joined: 5 Dec 00 Posts: 292 Credit: 58,297,005 RAC: 311 |
I have just built a system around the AMD A8-3870 APU. Here is the info from clinfo. Hope it's as interesting as you thought it would be. Maybe you can see something that would be helpful to me in implementing it fully.
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 1.1 AMD-APP (831.4)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices cl_khr_d3d10_sharing
Platform Name: AMD Accelerated Parallel Processing
Number of devices: 2
Device Type: CL_DEVICE_TYPE_GPU
Device ID: 4098
Max compute units: 5
Max work items dimensions: 3
Max work items[0]: 256
Max work items[1]: 256
Max work items[2]: 256
Max work group size: 256
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 4
Preferred vector width double: 0
Max clock frequency: 600Mhz
Address bits: 32
Max memory allocation: 199753728
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 8192
Max image 2D height: 8192
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 1024
Alignment (bits) of base address: 2048
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
  Denorms: No
  Quiet NaNs: Yes
  Round to nearest even: Yes
  Round to zero: Yes
  Round to +ve and infinity: Yes
  IEEE754-2008 fused multiply-add: Yes
Cache type: None
Cache line size: 0
Cache size: 0
Global memory size: 536870912
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 32768
Error correction support: 0
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
  Execute OpenCL kernels: Yes
  Execute native function: No
Queue properties:
  Out-of-Order: No
  Profiling : Yes
Platform ID: 6B06C4F4
Name: BeaverCreek
Vendor: Advanced Micro Devices, Inc.
Driver version: CAL 1.4.1646 (VM)
Profile: FULL_PROFILE
Version: OpenCL 1.1 AMD-APP (831.4)
Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt cl_khr_d3d10_sharing
Device Type: CL_DEVICE_TYPE_CPU
Device ID: 4098
Max compute units: 4
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 1024
Max work group size: 1024
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 4
Preferred vector width double: 0
Max clock frequency: 3000Mhz
Address bits: 32
Max memory allocation: 1073741824
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 8192
Max image 2D height: 8192
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 4096
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
  Denorms: Yes
  Quiet NaNs: Yes
  Round to nearest even: Yes
  Round to zero: Yes
  Round to +ve and infinity: Yes
  IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 64
Cache size: 65536
Global memory size: 2147483648
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Global
Local memory size: 32768
Error correction support: 0
Profiling timer resolution: 341
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
  Execute OpenCL kernels: Yes
  Execute native function: Yes
Queue properties:
  Out-of-Order: No
  Profiling : Yes
Platform ID: 6B06C4F4
Name: AMD A8-3870 APU with Radeon(tm) HD Graphics
Vendor: AuthenticAMD
Driver version: 2.0
Profile: FULL_PROFILE
Version: OpenCL 1.1 AMD-APP (831.4)
Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt cl_khr_d3d10_sharing |
AllenIN Send message Joined: 5 Dec 00 Posts: 292 Credit: 58,297,005 RAC: 311 |
Have an i7 4 core that says it's a Q840 running at 1.87 GHz on my laptop. I have an ATI FirePro M7820 card in it, which I believe is something in the Radeon 5800 series. I can't tell anymore with all these numbers and names. I'm running the Lunatics apps. VLARs are about the same between CPU and GPU with GPU being maybe slightly faster. Shorties are around 6-10 min wall time GPU vs. 30-50 CPU. AP wu are maybe 2-3 hours GPU vs 12 CPU. So I run 3 cores at 100%, 1 at 90% to feed the GPU. I have 4 CPU wus running at a time and 2 GPU. I reschedule all AP wu as GPU wu. I get about an even mix of VLAR wu for CPU and GPU delivered and since they run about the same I don't mess with it. I seem to get about 7500 RAC out of it, but that is quite variable for a lot of reasons. All in all I consider that quite good for a laptop. The ATI GPU has had no heat problems whatsoever, it's running at about 59 C as we speak. I have burned my leg from the i7 heatsink though when I was wearing shorts in the Summer. No doubt, dedicated graphics are better for churning out WUs, but for the money, I like this APU pretty much. Add the price of your cpu and graphics card and I'm sure they cost more than the $140 I gave for this APU. The whole APU is running at about 60c with stock heatsink and no OCing. I don't know if I can run more than one WU at a time with the GPU. That's what Boinc set me up with and I haven't checked to see if I can run more than that. I'm using .05 of my cpu to feed the GPU. Thanks for the info! So far I haven't been running long enough to tell what my RAC is going to be. Allen |
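For anyone who wants to check the host statistics in Josef W. Segur's post, the arithmetic can be sketched in a few lines of Python. The three input figures are the ones he quoted from BoincStats; the variable names are only illustrative, and the percentage is derived here rather than stated in the thread.

```python
# Figures quoted from BoincStats in Josef W. Segur's post.
ACTIVE_HOSTS = 238_518
TOTAL_RAC = 107_435_347
HOSTS_AT_OR_ABOVE_MEAN = 52_082  # hosts whose RAC rounds to 451 or higher

# Mean RAC per active host (the 450.43 figure in the post).
mean_rac = TOTAL_RAC / ACTIVE_HOSTS

# Hosts below the mean, and their share of all active hosts.
hosts_below_mean = ACTIVE_HOSTS - HOSTS_AT_OR_ABOVE_MEAN
share_below_mean = hosts_below_mean / ACTIVE_HOSTS

print(f"mean RAC per active host: {mean_rac:.2f}")   # 450.43
print(f"hosts below the mean: {hosts_below_mean} ({share_below_mean:.1%})")  # 186436 (78.2%)
```

So roughly four out of five active hosts sit below the mean, which fits the point that most hosts are probably CPU-only crunchers.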
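hbomber's point about efficiency can be put into rough numbers. The sketch below takes his estimates (GPU about 1.6x the output of all four CPU cores, 50-60 W for the CPU, 150-160 W for the 470) and compares output per watt. Using the midpoints of his power ranges is an assumption made here, and `work_per_watt` is just an illustrative helper, so treat the result as a ballpark only.

```python
def work_per_watt(relative_speed: float, watts: float) -> float:
    """Relative output divided by power draw (arbitrary units per watt)."""
    return relative_speed / watts


# hbomber's estimates; the midpoints of his power ranges are an assumption.
CPU_SPEED, CPU_WATTS = 1.0, 55.0    # all 4 cores of the i5 2500, 50-60 W
GPU_SPEED, GPU_WATTS = 1.6, 155.0   # GTX 470, about 1.6x faster, 150-160 W

cpu_efficiency = work_per_watt(CPU_SPEED, CPU_WATTS)
gpu_efficiency = work_per_watt(GPU_SPEED, GPU_WATTS)
cpu_advantage = cpu_efficiency / gpu_efficiency

print(f"CPU output per watt vs GPU: {cpu_advantage:.2f}x")  # 1.76x
```

With those inputs the CPU comes out at roughly 1.8 times the output per watt, even though the GPU finishes more work in absolute terms.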
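Grant's per-task times are easier to compare once the number of tasks running at the same time is taken into account. A minimal sketch, using his figures for shorties with the optimised apps (about 40 min on the CPU with 8 running at once, just over 3 min on the GPU with 2 at once); rounding "just over 3 min" down to 3.0 is an assumption that slightly favours the GPU, and `tasks_per_hour` is a hypothetical helper name.

```python
def tasks_per_hour(minutes_per_task: float, concurrent_tasks: int) -> float:
    """Throughput when `concurrent_tasks` run side by side, each taking `minutes_per_task`."""
    return concurrent_tasks * 60.0 / minutes_per_task


# Grant's shorty timings (optimised apps); 3.0 min stands in for "just over 3 min".
cpu_rate = tasks_per_hour(40.0, 8)   # i7, 8 tasks at a time
gpu_rate = tasks_per_hour(3.0, 2)    # GTX460, 2 tasks at a time
ratio = gpu_rate / cpu_rate

print(f"CPU: {cpu_rate:.1f} shorties/hour")   # 12.0
print(f"GPU: {gpu_rate:.1f} shorties/hour")   # 40.0
print(f"GPU/CPU throughput: {ratio:.1f}x")    # 3.3x
```

On these numbers the GPU delivers a bit over three times the shorty throughput of the whole CPU, in the same general range as the other estimates in the thread.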