CPU vs GPU

Message boards : Number crunching : CPU vs GPU

ahj

Send message
Joined: 24 Sep 02
Posts: 11
Credit: 110,418
RAC: 0
Australia
Message 1184730 - Posted: 12 Jan 2012, 10:40:54 UTC

Hello

I've currently got a GTX470 running seti with seemingly good performance. However, I'd like to know how much more productive it is versus a core i5 2500, for example. 2x faster? x5, x10?

Has running seti on CPUs become redundant, or can they still provide useful work?

ID: 1184730 · Report as offensive
Profile Link
Avatar

Send message
Joined: 18 Sep 03
Posts: 834
Credit: 1,807,369
RAC: 0
Germany
Message 1184748 - Posted: 12 Jan 2012, 13:49:54 UTC - in response to Message 1184730.  

Has running seti on CPUs become redundant, or can they still provide useful work?

Besides the fact that any returned (valid) result is useful for the project, CPUs are the only ones that get VLAR tasks assigned, so no, I don't think that using them for SETI is redundant.
ID: 1184748 · Report as offensive
Profile tullio
Volunteer tester

Send message
Joined: 9 Apr 04
Posts: 8797
Credit: 2,930,782
RAC: 1
Italy
Message 1184752 - Posted: 12 Jan 2012, 14:06:07 UTC

I have only a CPU, with two OSes: one real (Linux) and one virtual (Solaris). I have 4 WUs in a pending state; in two of them the wingman uses cuda_fermi.
Tullio
ID: 1184752 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 1184789 - Posted: 12 Jan 2012, 18:42:45 UTC - in response to Message 1184730.  

I've currently got a GTX470 running seti with seemingly good performance. However, I'd like to know how much more productive it is versus a core i5 2500, for example. 2x faster? x5, x10?

My GTX560Ti does 1 shorty every 1.5 minutes. My i7 2600 does 1 shorty every 4.75 minutes.


Has running seti on CPUs become redundant, or can they still provide useful work?

They still provide useful work.

Grant
Darwin NT
ID: 1184789 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Send message
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1184797 - Posted: 12 Jan 2012, 19:12:15 UTC - in response to Message 1184730.  

Hello

I've currently got a GTX470 running seti with seemingly good performance. However, I'd like to know how much more productive it is versus a core i5 2500, for example. 2x faster? x5, x10?

Has running seti on CPUs become redundant, or can they still provide useful work?

From the project statistics at BoincStats, there are 238518 active hosts with a total RAC of 107435347, so the mean RAC per host is about 450.43. There are also 52082 hosts with a RAC that rounds to 451 or higher, which leaves about 186436 active hosts below the mean RAC. I think it's fair to say most of those are not doing GPU crunching, though exceptions, such as hosts that crunch SETI only part time, could be found.
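
The arithmetic behind those figures can be checked with a short script. This is only a sketch using the three numbers quoted in the post; the variable names are made up for illustration. Note that 450.43 is simply the total RAC divided by the number of active hosts, i.e. the mean per host.

```python
# Figures quoted in the post (BoincStats snapshot, January 2012).
active_hosts = 238_518
total_rac = 107_435_347
hosts_at_or_above_451 = 52_082

# Mean RAC per active host.
mean_rac = total_rac / active_hosts

# Hosts whose RAC falls below the (rounded) mean.
hosts_below_mean = active_hosts - hosts_at_or_above_451
share_below_mean = hosts_below_mean / active_hosts

print(f"mean RAC per host: {mean_rac:.2f}")
print(f"hosts below the mean: {hosts_below_mean} ({share_below_mean:.1%})")
```

With these inputs, roughly 78% of active hosts sit below the mean, which is consistent with most hosts being CPU-only crunchers.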

One GTX470 will outproduce all cores of an i5 2500, but the factor depends on what applications you run and various other system specifics. x5 may be about right for the stock applications delivered by the project.
                                                                  Joe
ID: 1184797 · Report as offensive
Wedge009
Volunteer tester
Avatar

Send message
Joined: 3 Apr 99
Posts: 451
Credit: 431,396,357
RAC: 553
Australia
Message 1184815 - Posted: 12 Jan 2012, 20:49:59 UTC - in response to Message 1184748.  

...CPUs are the only ones that get VLAR tasks assigned, so no, I don't think that using them for SETI is redundant.

Maybe it's because of the anonymous platform set-up, but I often get VLARs assigned to my Radeon GPU. While it doesn't suffer as badly as CUDA does in terms of run times, it does cause severe unresponsiveness in the GUI, so I often have to manually push the VLAR tasks back to the CPU. It's a bit frustrating.

But I concur with the point that CPUs are not redundant for S@h processing.
Soli Deo Gloria
ID: 1184815 · Report as offensive
Profile Mike Special Project $75 donor
Volunteer tester
Avatar

Send message
Joined: 17 Feb 01
Posts: 34255
Credit: 79,922,639
RAC: 80
Germany
Message 1184825 - Posted: 12 Jan 2012, 21:24:55 UTC
Last modified: 12 Jan 2012, 21:25:24 UTC

Each work unit processed is science.
So nothing is redundant in my opinion.

ATI/AMD GPUs can handle VLARs very well.


With each crime and every kindness we birth our future.
ID: 1184825 · Report as offensive
Profile AllenIN
Volunteer tester
Avatar

Send message
Joined: 5 Dec 00
Posts: 292
Credit: 58,297,005
RAC: 311
United States
Message 1184884 - Posted: 13 Jan 2012, 2:59:27 UTC

I have just built a system around the AMD A8-3870 APU.

I was wondering if anyone knows if the internal GPU (Radeon 6550D) is available for seti. So far I haven't been able to get any WU's for it.

Thanks, Allen
ID: 1184884 · Report as offensive
Profile arkayn
Volunteer tester
Avatar

Send message
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 1184894 - Posted: 13 Jan 2012, 4:42:48 UTC - in response to Message 1184884.  

I have just built a system around the AMD A8-3870 APU.

I was wondering if anyone knows if the internal GPU (Radeon 6550D) is available for seti. So far I haven't been able to get any WU's for it.

Thanks, Allen


There is no stock app for AMD GPUs; you would need to install the optimized apps to take advantage of the AMD GPU.
http://lunatics.kwsn.net/index.php?module=Downloads;catd=9

ID: 1184894 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1184915 - Posted: 13 Jan 2012, 8:59:23 UTC - in response to Message 1184815.  

While it doesn't suffer as badly as CUDA does in terms of run times, it does cause severe unresponsiveness in the GUI, so I often have to manually push the VLAR tasks back to the CPU. It's a bit frustrating.

ATI GPUs can handle VLARs much better than NVIDIA ones (comparing on the same code base, so it looks like this is more a function of the hardware than of the software).
To reduce lag on VLARs, increase the -period_iterations_num value.

ID: 1184915 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1184917 - Posted: 13 Jan 2012, 9:00:45 UTC - in response to Message 1184894.  

I have just built a system around the AMD A8-3870 APU.

I was wondering if anyone knows if the internal GPU (Radeon 6550D) is available for seti. So far I haven't been able to get any WU's for it.

Thanks, Allen


There is no stock app for AMD GPU's, you would need to install the optimized apps to take advantage of the AMD GPU.
http://lunatics.kwsn.net/index.php?module=Downloads;catd=9

It would be interesting to see the clinfo output for this GPU to tell whether it is capable or not. Probably yes, it's capable of running the OpenCL apps too, and surely the hybrid AP.
ID: 1184917 · Report as offensive
Profile AllenIN
Volunteer tester
Avatar

Send message
Joined: 5 Dec 00
Posts: 292
Credit: 58,297,005
RAC: 311
United States
Message 1184930 - Posted: 13 Jan 2012, 11:09:52 UTC

Thanks to both of you. I will bone up on the info and see what I get.
Stay tuned!

Allen
ID: 1184930 · Report as offensive
hbomber
Volunteer tester

Send message
Joined: 2 May 01
Posts: 437
Credit: 50,852,854
RAC: 0
Bulgaria
Message 1184940 - Posted: 13 Jan 2012, 12:11:17 UTC - in response to Message 1184730.  
Last modified: 13 Jan 2012, 12:13:25 UTC

Hello

I've currently got a GTX470 running seti with seemingly good performance. However, I'd like to know how much more productive it is versus a core i5 2500, for example. 2x faster? x5, x10?

Has running seti on CPUs become redundant, or can they still provide useful work?



A 470 would be about 1.6x faster than all 4 cores combined. In my setup, a GTX 460 at an 810 MHz core clock is just about as productive as a 2500K at 4.65 GHz. Bearing in mind that your 2500 runs at 3.3 GHz and that a 470 is probably a bit faster than my overclocked GPUs, that number seems about right to me. But the CPU is far more efficient: I think it uses around 50-60 watts, while the 470 will use something like 150-160.
My comparison was done using optimized applications. With stock apps the GPU will do even better relative to the CPU.
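
The efficiency point can be made concrete with a rough work-per-watt calculation. This is a sketch only: it takes the 1.6x speed factor and the wattage ranges from the post at face value, and using the midpoint of each range is an assumption made here for illustration.

```python
# Rough figures from the post; midpoints of the quoted ranges are an assumption.
cpu_throughput = 1.0          # all 4 cores of the i5 2500, taken as the baseline
gpu_throughput = 1.6          # GTX 470 relative to that baseline
cpu_watts = (50 + 60) / 2     # midpoint of "50-60 watts"
gpu_watts = (150 + 160) / 2   # midpoint of "150-160"

# Relative work done per watt of power drawn.
cpu_per_watt = cpu_throughput / cpu_watts
gpu_per_watt = gpu_throughput / gpu_watts
efficiency_ratio = cpu_per_watt / gpu_per_watt

print(f"CPU work per watt is about {efficiency_ratio:.2f}x that of the GPU")
```

Under these assumptions the GPU finishes more work per hour, but the CPU gets somewhat under twice as much work out of each watt.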
ID: 1184940 · Report as offensive
Profile AllenIN
Volunteer tester
Avatar

Send message
Joined: 5 Dec 00
Posts: 292
Credit: 58,297,005
RAC: 311
United States
Message 1185125 - Posted: 14 Jan 2012, 2:27:11 UTC - in response to Message 1184894.  

I have just built a system around the AMD A8-3870 APU.

I was wondering if anyone knows if the internal GPU (Radeon 6550D) is available for seti. So far I haven't been able to get any WU's for it.

Thanks, Allen


There is no stock app for AMD GPU's, you would need to install the optimized apps to take advantage of the AMD GPU.
http://lunatics.kwsn.net/index.php?module=Downloads;catd=9


Wow, I think that made a difference. I just completed my first wu with the gpu and it was a 4hr 26 min unit that completed in 16 minutes or so.

Seems that the four cpu units are completing much faster now too. It appears that they will complete 4hr plus units in about 1 hour. I don't get it, but I like it.

Thanks again!!

ID: 1185125 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 1185147 - Posted: 14 Jan 2012, 5:06:10 UTC - in response to Message 1185125.  
Last modified: 14 Jan 2012, 5:06:44 UTC

Seems that the four cpu units are completing much faster now too. It appears that they will complete 4hr plus units in about 1 hour. I don't get it, but I like it.

First, the estimated completion times for GPU work tend to be way out. Way, way, way out. This is the result of a bug fix that itself has yet to be fixed.

But most importantly, the optimised applications give much improved crunching times.
On my i7 with a GTX460, shorty WUs take about 40 min on the CPU (8 at a time) and just over 3 min on the GPU (2 at a time). VLARs on the CPU take 2-2.5 hours. The longest running WUs on the GPU are about 22 min.
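
Because the CPU and GPU run different numbers of tasks at once, per-task times are easier to compare once converted to throughput. The sketch below does that for the shorty figures above; `tasks_per_hour` is a hypothetical helper, and 3.0 minutes stands in for "just over 3 min", so the GPU figure is slightly optimistic.

```python
def tasks_per_hour(minutes_per_task: float, concurrent_tasks: int) -> float:
    """Sustained throughput when `concurrent_tasks` tasks run side by side."""
    return concurrent_tasks * 60.0 / minutes_per_task


# Shorty figures from the post: CPU runs 8 at a time, GPU runs 2 at a time.
cpu_rate = tasks_per_hour(40.0, 8)
gpu_rate = tasks_per_hour(3.0, 2)  # "just over 3 min" approximated as 3.0

print(f"CPU: {cpu_rate:.0f} shorties/hour")
print(f"GPU: {gpu_rate:.0f} shorties/hour")
print(f"GPU/CPU throughput ratio: {gpu_rate / cpu_rate:.1f}x")
```

On these numbers the GPU delivers a little over three times the shorty throughput of the whole CPU.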
Grant
Darwin NT
ID: 1185147 · Report as offensive
Profile AllenIN
Volunteer tester
Avatar

Send message
Joined: 5 Dec 00
Posts: 292
Credit: 58,297,005
RAC: 311
United States
Message 1185301 - Posted: 14 Jan 2012, 20:34:11 UTC - in response to Message 1185147.  

Thanks for the input Grant.

Those dedicated graphics cards are definitely a boost!!!!!!!!!

However, for the cost of this APU, I believe it's a real value.
Things are still ramping up, so I'm not yet sure just how things will shape up down the road.

You're definitely correct about the optimized apps though, they make a great deal of difference in compute time.

Allen
ID: 1185301 · Report as offensive
Profile skildude
Avatar

Send message
Joined: 4 Oct 00
Posts: 9541
Credit: 50,759,529
RAC: 60
Yemen
Message 1185386 - Posted: 15 Jan 2012, 2:19:24 UTC - in response to Message 1185301.  

Raistmer, I just bought a laptop that has an A4-3300 APU. GPU calls the GPU portion of the chip a SUMO series HD 6480. It seems to work just fine with the HD5 version of the Lunatics app.


In a rich man's house there is no place to spit but his face.
Diogenes Of Sinope
ID: 1185386 · Report as offensive
doug
Volunteer tester

Send message
Joined: 10 Jul 09
Posts: 202
Credit: 10,828,067
RAC: 0
United States
Message 1185400 - Posted: 15 Jan 2012, 5:30:55 UTC - in response to Message 1185386.  

Have an i7 4 core that says it's a Q840 running at 1.87 GHz on my laptop. I have an ATI FirePro M7820 card in it, which I believe is something in the Radeon 5800 series. I can't tell anymore with all these numbers and names. I'm running the Lunatics apps. VLARs are about the same between CPU and GPU with GPU being maybe slightly faster. Shorties are around 6-10 min wall time GPU vs. 30-50 CPU. AP wu are maybe 2-3 hours GPU vs 12 CPU. So I run 3 cores at 100%, 1 at 90% to feed the GPU. I have 4 CPU wus running at a time and 2 GPU. I reschedule all AP wu as GPU wu. I get about an even mix of VLAR wu for CPU and GPU delivered and since they run about the same I don't mess with it. I seem to get about 7500 RAC out of it, but that is quite variable for a lot of reasons. All in all I consider that quite good for a laptop. The ATI GPU has had no heat problems whatsoever, it's running at about 59 C as we speak. I have burned my leg from the i7 heatsink though when I was wearing shorts in the Summer.

Just my input.
ID: 1185400 · Report as offensive
Profile AllenIN
Volunteer tester
Avatar

Send message
Joined: 5 Dec 00
Posts: 292
Credit: 58,297,005
RAC: 311
United States
Message 1185402 - Posted: 15 Jan 2012, 5:39:49 UTC - in response to Message 1184917.  

I have just built a system around the AMD A8-3870 APU.

I was wondering if anyone knows if the internal GPU (Radeon 6550D) is available for seti. So far I haven't been able to get any WU's for it.

Thanks, Allen


There is no stock app for AMD GPU's, you would need to install the optimized apps to take advantage of the AMD GPU.
http://lunatics.kwsn.net/index.php?module=Downloads;catd=9

It would be interesting to see CLinfo output for this GPU to say if it capable or not. Perhaps - yes, it's capable for OpenCL apps too. And surely for hybrid AP.


Here is the info from clinfo. Hope it's as interesting as you thought it would be. Maybe you can see something that would be helpful to me in implementing it fully.


Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 1.1 AMD-APP (831.4)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices cl_khr_d3d10_sharing


Platform Name: AMD Accelerated Parallel Processing
Number of devices: 2
Device Type: CL_DEVICE_TYPE_GPU
Device ID: 4098
Max compute units: 5
Max work items dimensions: 3
Max work items[0]: 256
Max work items[1]: 256
Max work items[2]: 256
Max work group size: 256
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 4
Preferred vector width double: 0
Max clock frequency: 600Mhz
Address bits: 32
Max memory allocation: 199753728
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 8192
Max image 2D height: 8192
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 1024
Alignment (bits) of base address: 2048
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: No
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: None
Cache line size: 0
Cache size: 0
Global memory size: 536870912
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 32768
Error correction support: 0
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue properties:
Out-of-Order: No
Profiling : Yes
Platform ID: 6B06C4F4
Name: BeaverCreek
Vendor: Advanced Micro Devices, Inc.
Driver version: CAL 1.4.1646 (VM)
Profile: FULL_PROFILE
Version: OpenCL 1.1 AMD-APP (831.4)
Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt cl_khr_d3d10_sharing


Device Type: CL_DEVICE_TYPE_CPU
Device ID: 4098
Max compute units: 4
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 1024
Max work group size: 1024
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 4
Preferred vector width double: 0
Max clock frequency: 3000Mhz
Address bits: 32
Max memory allocation: 1073741824
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 8192
Max image 2D height: 8192
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 4096
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: Yes
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 64
Cache size: 65536
Global memory size: 2147483648
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Global
Local memory size: 32768
Error correction support: 0
Profiling timer resolution: 341
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: Yes
Queue properties:
Out-of-Order: No
Profiling : Yes
Platform ID: 6B06C4F4
Name: AMD A8-3870 APU with Radeon(tm) HD Graphics
Vendor: AuthenticAMD
Driver version: 2.0
Profile: FULL_PROFILE
Version: OpenCL 1.1 AMD-APP (831.4)
Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt cl_khr_d3d10_sharing
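
A dump this long is easier to read once the per-device fields are pulled out. The following is a minimal sketch, assuming the flat "key: value" layout shown in this listing; `parse_clinfo_devices` is a hypothetical helper written for this example, and other clinfo versions may format their output differently. The sample text is a handful of lines copied from the listing.

```python
from typing import Dict, List


def parse_clinfo_devices(text: str) -> List[Dict[str, str]]:
    """Group "key: value" lines into one dict per "Device Type" entry."""
    devices: List[Dict[str, str]] = []
    current = None
    for raw_line in text.splitlines():
        key, sep, value = raw_line.partition(":")
        if not sep:
            continue  # not a "key: value" line
        key, value = key.strip(), value.strip()
        if key == "Device Type":
            # A new device section starts here.
            current = {}
            devices.append(current)
        if current is not None:
            current[key] = value
    return devices


# A few lines taken from the listing (assumed layout, not a full dump).
SAMPLE = """\
Platform Name: AMD Accelerated Parallel Processing
Number of devices: 2
Device Type: CL_DEVICE_TYPE_GPU
Max compute units: 5
Max clock frequency: 600Mhz
Global memory size: 536870912
Name: BeaverCreek
Device Type: CL_DEVICE_TYPE_CPU
Max compute units: 4
Max clock frequency: 3000Mhz
Global memory size: 2147483648
Name: AMD A8-3870 APU with Radeon(tm) HD Graphics
"""

devices = parse_clinfo_devices(SAMPLE)
for dev in devices:
    mem_mib = int(dev["Global memory size"]) // (1024 * 1024)
    print(dev["Device Type"], "|", dev["Name"], "|",
          dev["Max compute units"], "compute units |", mem_mib, "MiB")
```

Read this way, the listing shows the integrated GPU (BeaverCreek) exposed as an OpenCL 1.1 GPU device with 5 compute units at 600 MHz and 512 MiB of global memory, alongside the 4-core CPU device.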


ID: 1185402 · Report as offensive
Profile AllenIN
Volunteer tester
Avatar

Send message
Joined: 5 Dec 00
Posts: 292
Credit: 58,297,005
RAC: 311
United States
Message 1185408 - Posted: 15 Jan 2012, 5:55:16 UTC - in response to Message 1185400.  

Have an i7 4 core that says it's a Q840 running at 1.87 GHz on my laptop. I have an ATI FirePro M7820 card in it, which I believe is something in the Radeon 5800 series. I can't tell anymore with all these numbers and names. I'm running the Lunatics apps. VLARs are about the same between CPU and GPU with GPU being maybe slightly faster. Shorties are around 6-10 min wall time GPU vs. 30-50 CPU. AP wu are maybe 2-3 hours GPU vs 12 CPU. So I run 3 cores at 100%, 1 at 90% to feed the GPU. I have 4 CPU wus running at a time and 2 GPU. I reschedule all AP wu as GPU wu. I get about an even mix of VLAR wu for CPU and GPU delivered and since they run about the same I don't mess with it. I seem to get about 7500 RAC out of it, but that is quite variable for a lot of reasons. All in all I consider that quite good for a laptop. The ATI GPU has had no heat problems whatsoever, it's running at about 59 C as we speak. I have burned my leg from the i7 heatsink though when I was wearing shorts in the Summer.

Just my input.


No doubt dedicated graphics cards are better for churning out WUs, but for the money, I like this APU quite a lot.

Add up the price of your CPU and graphics card and I'm sure they cost more than the $140 I paid for this APU. The whole APU is running at about 60 C with the stock heatsink and no overclocking. I don't know if I can run more than one WU at a time on the GPU; one at a time is what BOINC set me up with, and I haven't checked whether I can run more. I'm using 0.05 of my CPU to feed the GPU.
Thanks for the info! So far I haven't been running long enough to tell what my RAC is going to be.

Allen
ID: 1185408 · Report as offensive
 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.