CPU vs GPU
Message boards : Number crunching : CPU vs GPU
| Author | Message |
|---|---|
Hello | |
| ID: 1184730 · | |
Yes, if you have a multi-core CPU. | |
| ID: 1184743 · | |
"Has running SETI on CPUs become redundant, or can they still provide useful work?" Besides the fact that any returned (valid) result is useful for the project, CPUs are the only ones that get VLAR tasks assigned, so no, I don't think that using them for SETI is redundant. | |
| ID: 1184748 · | |
I have only a CPU, running two OSes: one native (Linux) and one virtual (Solaris). I have 4 WUs in a pending state; in two of them the wingman uses cuda_fermi. | |
| ID: 1184752 · | |
"I've currently got a GTX470 running SETI with seemingly good performance. However, I'd like to know how much more productive it is than a Core i5 2500, for example. 2x faster? 5x, 10x?" My GTX560Ti does 1 shorty every 1.5 minutes. My i7 2600 does 1 shorty every 4.75 minutes. "Has running SETI on CPUs become redundant, or can they still provide useful work?" They still provide useful work. Grant Darwin NT. | |
| ID: 1184789 · | |
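The two timings in the reply above can be turned into an hourly throughput ratio. This is a minimal sketch, assuming that "1 shorty every N minutes" is each device's aggregate rate (the post does not say how many tasks run at once on either device).

```python
# Timings quoted in the post: one shorty completed every N minutes.
# Assumption: these are whole-device (aggregate) rates.
MINUTES_PER_SHORTY = {
    "GTX560Ti": 1.5,
    "i7 2600": 4.75,
}


def shorties_per_hour(minutes_per_task: float) -> float:
    """Convert 'one task every N minutes' into tasks per hour."""
    return 60.0 / minutes_per_task


rates = {dev: shorties_per_hour(m) for dev, m in MINUTES_PER_SHORTY.items()}
speedup = rates["GTX560Ti"] / rates["i7 2600"]

for dev, rate in rates.items():
    print(f"{dev}: {rate:.1f} shorties/hour")
print(f"GPU/CPU throughput ratio: {speedup:.2f}x")
```

On these figures the GPU comes out at a little over 3x the CPU, well short of the 5x or 10x floated in the question, though the ratio will shift with the applications used.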
From the project statistics at BoincStats, there are 238518 active hosts with a total RAC of 107435347, so the mean RAC per active host is about 450.43. Also, there are 52082 hosts with a RAC that rounds to 451 or higher, so about 186436 active hosts are below the mean RAC. I think it's fair to say most of those are not doing GPU crunching, though exceptions could be found among hosts that crunch SETI only part time. One GTX470 will outproduce all cores of an i5 2500, but the factor depends on which applications you run and various other system specifics. About 5x may be right for the stock applications delivered by the project. Joe | |
| ID: 1184797 · | |
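The host statistics in Joe's reply reduce to a little arithmetic. This sketch uses only the numbers he quotes from BoincStats; the count of hosts below the mean is approximate because his cut-off is "RAC that rounds to 451 or higher".

```python
# Figures quoted from the BoincStats project statistics in the post.
ACTIVE_HOSTS = 238_518
TOTAL_RAC = 107_435_347
HOSTS_AT_OR_ABOVE_451 = 52_082  # hosts whose RAC rounds to 451 or higher

# Mean recent average credit per active host.
mean_rac = TOTAL_RAC / ACTIVE_HOSTS

# Everyone not in the >= 451 group sits (approximately) below the mean.
hosts_below_mean = ACTIVE_HOSTS - HOSTS_AT_OR_ABOVE_451
share_below_mean = hosts_below_mean / ACTIVE_HOSTS

print(f"mean RAC per active host: {mean_rac:.2f}")
print(f"hosts below the mean: {hosts_below_mean} ({share_below_mean:.0%})")
```

Roughly 78% of active hosts fall below the mean, which is the skew behind the point that most hosts are probably not GPU crunchers.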
"...CPUs are the only ones that get VLAR tasks assigned, so no, I don't think that using them for SETI is redundant." Maybe it's because of the anonymous platform set-up, but I often get VLARs assigned to my Radeon GPU. While it doesn't suffer as badly as CUDA does in terms of run times, it does cause severe unresponsiveness in the GUI, so I often have to manually push the VLAR tasks back to the CPU. It's a bit frustrating. But I agree that CPUs are not redundant for S@h processing. Soli Deo Gloria | |
| ID: 1184815 · | |
Each work unit processed is science. | |
| ID: 1184825 · | |
I have just built a system around the AMD A8-3870 APU. | |
| ID: 1184884 · | |
"I have just built a system around the AMD A8-3870 APU." There is no stock app for AMD GPUs; you would need to install the optimized apps to take advantage of the AMD GPU. http://lunatics.kwsn.net/index.php?module=Downloads;catd=9 | |
| ID: 1184894 · | |
"While it doesn't suffer as badly as CUDA does in terms of run times, it does cause severe unresponsiveness in the GUI, so I often have to manually push the VLAR tasks back to the CPU. It's a bit frustrating." ATI GPUs handle VLARs much better than NVIDIA GPUs (comparing the same code base, so it looks more like a hardware matter than a software one). To reduce GUI lag on VLARs, increase the -period_iterations_num value. | |
| ID: 1184915 · | |
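As a rough illustration of why raising -period_iterations_num can reduce GUI lag, here is a toy model, not the Lunatics code. The option name comes from the post; the mechanism modelled (one long kernel call split into N equal shorter calls, with the display able to refresh only between calls) is an assumption, and the kernel duration and per-launch overhead are made-up numbers.

```python
# Toy model only. Assumptions (not taken from the app itself):
#  - a long VLAR kernel call can be split into n equal launches
#  - the desktop can only redraw between launches
#  - every extra launch costs a small fixed overhead
KERNEL_MS = 500.0          # hypothetical length of one long kernel call
LAUNCH_OVERHEAD_MS = 0.05  # hypothetical cost of each kernel launch


def longest_stall_ms(kernel_ms: float, n: int) -> float:
    """Longest uninterrupted GPU busy period when the work is split n ways."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return kernel_ms / n


def total_time_ms(kernel_ms: float, n: int, launch_overhead_ms: float) -> float:
    """Total GPU time: the same work plus one launch overhead per call."""
    return kernel_ms + n * launch_overhead_ms


for n in (1, 2, 4, 8, 16):
    print(f"n={n:2d}: longest stall {longest_stall_ms(KERNEL_MS, n):6.1f} ms, "
          f"total {total_time_ms(KERNEL_MS, n, LAUNCH_OVERHEAD_MS):6.2f} ms")
```

Under these assumptions the trade-off is a much shorter worst-case freeze in exchange for a slightly longer total run time.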
"I have just built a system around the AMD A8-3870 APU." It would be interesting to see the CLinfo output for this GPU to tell whether it is capable or not. Probably it is, yes: capable of running the OpenCL apps too, and surely the hybrid AP app. | |
| ID: 1184917 · | |
Thanks to both of you. I will bone up on the info and see what I get. | |
| ID: 1184930 · | |
A GTX 470 would be about 1.6x faster than all 4 cores combined. With my setup, a GTX 460 at 810 MHz core clock is just about as productive as a 2500K at 4.65 GHz. So, bearing in mind that your 2500 runs at 3.3 GHz and a 470 is maybe a bit faster than my GPUs at their clocks, the figure above seems just about right to me. But the CPU is far more power-efficient: I think it uses around 50-60 watts, while a 470 will use something like 150-160. My comparison was done using optimized applications; with the stock apps, the GPU's lead over the CPU is even larger. | |
| ID: 1184940 · | |
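The "CPU is far more power-efficient" point can be checked with the reply's own numbers. This sketch assumes the midpoints of the quoted wattage ranges (which are the poster's estimates, not measurements) and treats the 1.6x figure as GPU throughput relative to all 4 CPU cores.

```python
# Figures from the post; midpoints of the quoted wattage ranges are an assumption.
GPU_RELATIVE_THROUGHPUT = 1.6  # GTX 470 vs all 4 cores of the i5 2500
CPU_WATTS = (50 + 60) / 2      # "around 50-60 watts"
GPU_WATTS = (150 + 160) / 2    # "something like 150-160"

# Work per watt, with the CPU's throughput normalised to 1.0.
cpu_work_per_watt = 1.0 / CPU_WATTS
gpu_work_per_watt = GPU_RELATIVE_THROUGHPUT / GPU_WATTS

advantage = cpu_work_per_watt / gpu_work_per_watt
print(f"CPU does {advantage:.2f}x as much work per watt as the GPU")
```

So on these estimates the GPU finishes more work per hour, while the CPU does roughly 1.75x as much work per watt.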
"I have just built a system around the AMD A8-3870 APU." Wow, I think that made a difference. I just completed my first WU on the GPU: a 4 hr 26 min unit that completed in 16 minutes or so. The four CPU units seem to be completing much faster now too; it appears they will complete 4 hr-plus units in about 1 hour. I don't get it, but I like it. Thanks again!! | |
| ID: 1185125 · | |
"Seems that the four CPU units are completing much faster now too. It appears that they will complete 4 hr-plus units in about 1 hour. I don't get it, but I like it." First, the estimated completion times for GPU work tend to be way out. Way, way, way out. This is the result of a bug fix that has yet to be fixed. But most importantly, the optimised applications give much improved crunching times. On my i7 with a GTX460, shorty WUs take about 40 min on the CPU (8 at a time) and just over 3 min on the GPU (2 at a time). VLARs on the CPU take 2-2.5 hours. The longest-running WUs on the GPU take about 22 min. Grant Darwin NT. | |
| ID: 1185147 · | |
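Because the CPU and GPU run different numbers of tasks at once, per-task times alone understate the comparison; throughput is concurrent tasks × 60 / minutes per task. This sketch uses Grant's figures and treats "just over 3 min" as 3 min, so the GPU number is a slight overestimate.

```python
def tasks_per_hour(minutes_per_task: float, concurrent_tasks: int) -> float:
    """Throughput when `concurrent_tasks` run side by side, each taking
    `minutes_per_task` of wall-clock time."""
    return concurrent_tasks * 60.0 / minutes_per_task


cpu = tasks_per_hour(40, 8)  # shorties on the i7: ~40 min each, 8 at a time
gpu = tasks_per_hour(3, 2)   # shorties on the GTX460: "just over 3 min", 2 at a time

print(f"CPU: {cpu:.0f} shorties/h, GPU: {gpu:.0f} shorties/h, ratio {gpu / cpu:.1f}x")
```

A 40-minute CPU task looks 13x slower than a 3-minute GPU task, but with 8 CPU tasks in flight the real throughput gap is closer to 3x.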
Thanks for the input, Grant. | |
| ID: 1185301 · | |
Raistmer, I just bought a laptop that has an A4-3300 APU. The GPU portion of the chip is identified as a SUMO series HD 6480. It seems to work just fine with the HD5 version of the Lunatics app. | |
| ID: 1185386 · | |
My laptop has a 4-core i7 that reports itself as a Q840 running at 1.87 GHz. It has an ATI FirePro M7820 card, which I believe is something in the Radeon 5800 series; I can't tell anymore with all these numbers and names. I'm running the Lunatics apps. VLARs take about the same time on the CPU and the GPU, with the GPU maybe slightly faster. Shorties take around 6-10 min wall time on the GPU vs. 30-50 min on the CPU. AP WUs take maybe 2-3 hours on the GPU vs. 12 hours on the CPU. So I run 3 cores at 100% and 1 at 90% to feed the GPU, with 4 CPU WUs and 2 GPU WUs running at a time. I reschedule all AP WUs as GPU WUs. I get about an even mix of VLAR WUs delivered for the CPU and the GPU, and since they run in about the same time I don't mess with them. I seem to get about 7500 RAC out of it, but that varies for a lot of reasons. All in all, I consider that quite good for a laptop. The ATI GPU has had no heat problems whatsoever; it's running at about 59 C as we speak. I did burn my leg on the i7 heatsink, though, when I was wearing shorts in the summer. | |
| ID: 1185400 · | |
"I have just built a system around the AMD A8-3870 APU." Here is the info from clinfo. I hope it's as interesting as you thought it would be. Maybe you can see something in it that would help me use it fully.
Number of platforms: 1
  Platform Profile: FULL_PROFILE
  Platform Version: OpenCL 1.1 AMD-APP (831.4)
  Platform Name: AMD Accelerated Parallel Processing
  Platform Vendor: Advanced Micro Devices, Inc.
  Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices cl_khr_d3d10_sharing
Platform Name: AMD Accelerated Parallel Processing
Number of devices: 2
  Device Type: CL_DEVICE_TYPE_GPU
  Device ID: 4098
  Max compute units: 5
  Max work items dimensions: 3
    Max work items[0]: 256
    Max work items[1]: 256
    Max work items[2]: 256
  Max work group size: 256
  Preferred vector width char: 16
  Preferred vector width short: 8
  Preferred vector width int: 4
  Preferred vector width long: 2
  Preferred vector width float: 4
  Preferred vector width double: 0
  Max clock frequency: 600Mhz
  Address bits: 32
  Max memory allocation: 199753728
  Image support: Yes
  Max number of images read arguments: 128
  Max number of images write arguments: 8
  Max image 2D width: 8192
  Max image 2D height: 8192
  Max image 3D width: 2048
  Max image 3D height: 2048
  Max image 3D depth: 2048
  Max samplers within kernel: 16
  Max size of kernel argument: 1024
  Alignment (bits) of base address: 2048
  Minimum alignment (bytes) for any datatype: 128
  Single precision floating point capability
    Denorms: No
    Quiet NaNs: Yes
    Round to nearest even: Yes
    Round to zero: Yes
    Round to +ve and infinity: Yes
    IEEE754-2008 fused multiply-add: Yes
  Cache type: None
  Cache line size: 0
  Cache size: 0
  Global memory size: 536870912
  Constant buffer size: 65536
  Max number of constant args: 8
  Local memory type: Scratchpad
  Local memory size: 32768
  Error correction support: 0
  Profiling timer resolution: 1
  Device endianess: Little
  Available: Yes
  Compiler available: Yes
  Execution capabilities:
    Execute OpenCL kernels: Yes
    Execute native function: No
  Queue properties:
    Out-of-Order: No
    Profiling : Yes
  Platform ID: 6B06C4F4
  Name: BeaverCreek
  Vendor: Advanced Micro Devices, Inc.
  Driver version: CAL 1.4.1646 (VM)
  Profile: FULL_PROFILE
  Version: OpenCL 1.1 AMD-APP (831.4)
  Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt cl_khr_d3d10_sharing
  Device Type: CL_DEVICE_TYPE_CPU
  Device ID: 4098
  Max compute units: 4
  Max work items dimensions: 3
    Max work items[0]: 1024
    Max work items[1]: 1024
    Max work items[2]: 1024
  Max work group size: 1024
  Preferred vector width char: 16
  Preferred vector width short: 8
  Preferred vector width int: 4
  Preferred vector width long: 2
  Preferred vector width float: 4
  Preferred vector width double: 0
  Max clock frequency: 3000Mhz
  Address bits: 32
  Max memory allocation: 1073741824
  Image support: Yes
  Max number of images read arguments: 128
  Max number of images write arguments: 8
  Max image 2D width: 8192
  Max image 2D height: 8192
  Max image 3D width: 2048
  Max image 3D height: 2048
  Max image 3D depth: 2048
  Max samplers within kernel: 16
  Max size of kernel argument: 4096
  Alignment (bits) of base address: 1024
  Minimum alignment (bytes) for any datatype: 128
  Single precision floating point capability
    Denorms: Yes
    Quiet NaNs: Yes
    Round to nearest even: Yes
    Round to zero: Yes
    Round to +ve and infinity: Yes
    IEEE754-2008 fused multiply-add: Yes
  Cache type: Read/Write
  Cache line size: 64
  Cache size: 65536
  Global memory size: 2147483648
  Constant buffer size: 65536
  Max number of constant args: 8
  Local memory type: Global
  Local memory size: 32768
  Error correction support: 0
  Profiling timer resolution: 341
  Device endianess: Little
  Available: Yes
  Compiler available: Yes
  Execution capabilities:
    Execute OpenCL kernels: Yes
    Execute native function: Yes
  Queue properties:
    Out-of-Order: No
    Profiling : Yes
  Platform ID: 6B06C4F4
  Name: AMD A8-3870 APU with Radeon(tm) HD Graphics
  Vendor: AuthenticAMD
  Driver version: 2.0
  Profile: FULL_PROFILE
  Version: OpenCL 1.1 AMD-APP (831.4)
  Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt cl_khr_d3d10_sharing | |
| ID: 1185402 · | |
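A clinfo dump this long is easier to compare device by device. This is a minimal sketch that parses clinfo-style "Key: Value" lines into one record per device, starting a new record at each "Device Type:" line; the embedded text is a shortened excerpt of the listing above (extension lists trimmed), and the summary fields chosen are illustrative, not anything the SETI apps check.

```python
import re

# Shortened excerpt of the clinfo output above (extension lists trimmed).
CLINFO_EXCERPT = """\
Device Type: CL_DEVICE_TYPE_GPU
Max compute units: 5
Max clock frequency: 600Mhz
Global memory size: 536870912
Name: BeaverCreek
Extensions: cl_khr_global_int32_base_atomics cl_khr_byte_addressable_store cl_amd_popcnt
Device Type: CL_DEVICE_TYPE_CPU
Max compute units: 4
Max clock frequency: 3000Mhz
Global memory size: 2147483648
Name: AMD A8-3870 APU with Radeon(tm) HD Graphics
Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_byte_addressable_store cl_amd_popcnt
"""

LINE_RE = re.compile(r"\s*([^:]+?)\s*:\s*(.*)$")


def parse_devices(text: str) -> list[dict[str, str]]:
    """Split 'Key: Value' lines into one dict per device.

    A new device starts at every 'Device Type:' line; lines before the
    first device (platform info) and lines without a colon are skipped.
    """
    devices: list[dict[str, str]] = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue
        key, value = match.group(1), match.group(2).strip()
        if key == "Device Type":
            devices.append({})
        if devices:
            devices[-1][key] = value
    return devices


def summarize(device: dict[str, str]) -> dict:
    """Pick out a few fields worth comparing between the CPU and GPU."""
    return {
        "type": device["Device Type"].removeprefix("CL_DEVICE_TYPE_"),
        "name": device.get("Name", "?"),
        "compute_units": int(device["Max compute units"]),
        "clock_mhz": int(re.sub(r"\D", "", device["Max clock frequency"])),
        "global_mem_mib": int(device["Global memory size"]) // 2**20,
        "fp64": "cl_khr_fp64" in device.get("Extensions", "").split(),
    }


for dev in parse_devices(CLINFO_EXCERPT):
    print(summarize(dev))
```

Run on this excerpt, it shows the GPU device (BeaverCreek) with 5 compute units at 600 MHz and 512 MiB of global memory and no cl_khr_fp64, while the CPU device lists cl_khr_fp64. The same functions should work on the full listing pasted as plain text (requires Python 3.9+ for `removeprefix`).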
| Copyright © 2013 University of California |