How does OCing affect CUDA?
Author | Message |
---|---|
Voyager Joined: 2 Nov 99 Posts: 602 Credit: 3,264,813 RAC: 0 |
Does processor speed affect CUDA processing other than the time it takes to load the card? Will CUDA run about the same at 3 GHz vs 2 GHz? |
Dywanik Joined: 16 Mar 02 Posts: 29 Credit: 1,913,940 RAC: 0 |
That's a very good question. I don't have a CUDA card to test it; however, I believe the answer is: it depends. ;-) It depends on how you're overclocking your CPU. You usually do it by changing the FSB and the multiplier, e.g., you can get 2000 MHz either with FSB 200 MHz and multiplier 10 or with FSB 100 MHz and multiplier 20. And here is the point: when you raise the FSB you're overclocking your whole computer, so CUDA should be more efficient in the first case. BUT the difference won't be significant unless you're an o/c pr0 who benchmarks everything. ;-) "Failure is not an option." Gene Kranz, Apollo 13 Flight Director "Be the change you want to see in the World" Mahatma Gandhi My web-page: www.dywanik.eu |
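Dywanik's arithmetic can be checked with a minimal sketch. The helper name `cpu_clock_mhz` is made up for illustration; the two FSB/multiplier pairs are the ones from the post. It only shows that both settings give the same CPU clock while the bus clock differs; it says nothing about how much CUDA throughput actually changes.

```python
def cpu_clock_mhz(fsb_mhz: float, multiplier: float) -> float:
    """CPU core clock as FSB clock times multiplier (illustrative helper)."""
    return fsb_mhz * multiplier

# The two configurations mentioned in the post: (FSB in MHz, multiplier).
configs = {"FSB 200 x 10": (200, 10), "FSB 100 x 20": (100, 20)}

for name, (fsb, mult) in configs.items():
    # Same CPU clock in both cases, but the bus runs twice as fast in the first.
    print(f"{name}: CPU {cpu_clock_mhz(fsb, mult):.0f} MHz, bus {fsb} MHz")
```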
skildude Joined: 4 Oct 00 Posts: 9541 Credit: 50,759,529 RAC: 60 |
I'd think that the mobo would be the slow point for the GPU, but I doubt that the difference would be that significant. Most likely the CPU time would be about 1/3 slower, but we are talking less than 120 seconds overall. I doubt that the performance is going to take that big of a hit. In a rich man's house there is no place to spit but his face. Diogenes Of Sinope |
Sutaru Tsureku Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
When a CUDA task starts, there is a phase with 100 % CPU-core load. During this time an OCed CPU/RAM/mobo would help to reduce the processing time. [With stock CPU/mobo: ~ 12 sec. with Raistmer's V10 CUDA app, on my rig] I have now OCed my AMD Phenom II X4 940 BE from stock 3.0 GHz to 3.2 GHz [only by changing the multiplier], and the 100 % CPU-core load time per CUDA task is now shorter (by ~ 1 sec. or so). After this 100 % CPU-core load phase, an OCed CPU/RAM/mobo wouldn't help to reduce the GPU calculation time. [The CPU core supports the GPU with around 24 % load [with ups and downs] during the GPU calculation time, peaking at 48 % - on my rig] [guessing.. - so far I haven't had time to look at the whole calculation time of the CUDA tasks] OTOH, OCing the GPU as well would help to reduce the calculation time on the GPU.. :-) |
skildude Joined: 4 Oct 00 Posts: 9541 Credit: 50,759,529 RAC: 60 |
I'd think OCing the GPU would reduce processing time more. The problem is that the WUs heat the GPU greatly, so OCing might just throw some GPUs over the heating edge. In a rich man's house there is no place to spit but his face. Diogenes Of Sinope |
Sutaru Tsureku Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
skildude wrote: "I'd think OCing the GPU would reduce processing time more. The problem is that the WUs heat the GPU greatly, so OCing might just throw some GPUs over the heating edge." I read about.. ~ 100 MHz more [GPU core or shader, it wasn't specified] giving + ~ 1,000 RAC.. If the GPU fan is on AUTO.. and runs @ 100 % RPM all the time - I would look twice at my GPU OC.. ;-) |
slozomby Joined: 16 Nov 04 Posts: 20 Credit: 242,588 RAC: 0 |
Using my lowball card (8400 GS) as a test, since any change there should be apparent: changing the clock speed from the default 440 MHz to 500 MHz did not noticeably decrease average processing time. Moving the 8400 GS between machines, it always comes back to about 2.5 hours per WU. Shader count appears to be the single biggest factor in WU processing speed. |
Zydor Joined: 4 Oct 03 Posts: 172 Credit: 491,111 RAC: 0 |
I have a Phenom II 940 BE with 4 GB RAM and an EN9800GTX+ DK 512 MB. With this setup I have found a significant difference in WU processing when O/C'ing the GPU. Broadly speaking, the speed of processing appeared to increase in line with the increase in shader clock - the latter is by far the dominant factor in the speed increase. Currently it runs at: Engine: 776 MHz, Shader: 1925 MHz, Memory (stock speed): 1100x2 MHz. The default shader clock for the card is 1800 MHz, so it's set to a 7% speed increase overall, and that was reflected in the improvement over the old crunch time for a WU on the GPU. The GPU runs at 85% fan speed and settles at a constant 63 degrees on full load. It's in a cramped mid-tower case with a four-disk RAID in there, so it will run slightly hotter than in a full tower; it would probably run at around 60/61 degrees in a full tower. The fan is a stock fan. CUDA WU time is around 16.5 mins on average with Raistmer's V11 Opti App, which in real terms is around eight extra WUs a day due to the O/C if run 24/7. When O/C'ing a GPU it's important to increase both the engine and shader clocks by the same proportion. The speed of the card's memory, as such, is not relevant; keep memory at stock speed. Most GPUs will go to about 8-9% O/C before heat issues cause compute errors; I've seen much bigger increases than that on a friend's machine with a water-cooled card. A 7-8% increase on air cooling seems to be safe for most cards. When O/C'ing, increase the engine and shader in line with each other by the same proportion, step by step (say 1% at a time), until you get the first compute error, then back off 2% and you will be dead safe. The only remaining question will be card temperature, as that will vary hugely depending on room ambient temperature, case used, air flow around cable runs, etc. Anyone wishing to pursue this further, nip over to GPUGRID; there is a long thread there dealing with settings and speeds of different NVIDIA card types - it's a good starter for ten when first looking at the topic. Regards Zy |
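Zydor's step-by-step procedure (raise engine and shader together by about 1% at a time, stop at the first compute error, back off 2%) can be sketched roughly as below. Everything named here is hypothetical: `runs_clean` stands in for "crunch some WUs at this setting and see whether any compute error shows up", the 15% cap is an arbitrary safety assumption, and the simulated card that fails at +9% is invented. Only the 1%/2% step sizes and the 1800 MHz default shader clock come from the post.

```python
from typing import Callable

def find_safe_overclock(runs_clean: Callable[[float], bool],
                        step_pct: float = 1.0,
                        backoff_pct: float = 2.0,
                        max_pct: float = 15.0) -> float:
    """Raise the overclock step by step; back off after the first failure."""
    pct = 0.0
    while pct + step_pct <= max_pct:
        pct += step_pct
        if not runs_clean(pct):
            # First compute error at this setting: back off, never below stock.
            return max(0.0, pct - backoff_pct)
    return pct  # No failure seen up to the cap (the cap is an assumption).

def scaled_clock(stock_mhz: float, pct: float) -> int:
    """Apply the same percentage to any stock clock (engine or shader)."""
    return round(stock_mhz * (1 + pct / 100))

# Simulated card (assumption): the first compute error appears at +9 %.
safe = find_safe_overclock(lambda pct: pct < 9)
print(f"keep +{safe:.0f} %, shader {scaled_clock(1800, safe)} MHz")
```

With the simulated failure at +9%, the search settles on +7%, which puts an 1800 MHz shader within 1 MHz of the 1925 MHz setting quoted in the post. On real hardware each `runs_clean` check would need to cover enough WUs to be meaningful.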
elgar Joined: 21 May 99 Posts: 69 Credit: 2,687,478 RAC: 0 |
Overclocking my 260 reduced CUDA time to ~10 minutes from ~13 min (600 MHz to 715 MHz). Overclocking the CPU by 500 MHz didn't have any effect on computation time. Now if I could just get some CUDA tasks on 6.6.20... |
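As a quick sanity check on elgar's figures, the sketch below compares the clock increase with the change in task time. The helper `percent_gain` is made up for illustration, and since both times in the post are approximate ("~"), the resulting ratios are rough as well.

```python
def percent_gain(before: float, after: float) -> float:
    """Relative increase from 'before' to 'after', in percent."""
    return (after - before) / before * 100

clock_gain = percent_gain(600, 715)   # GPU clock, MHz (from the post)
speedup = 13 / 10                     # old time / new time, minutes (approximate)
time_saved = (13 - 10) / 13 * 100     # reduction in time per task, percent

print(f"clock +{clock_gain:.1f} %, speedup x{speedup:.2f}, time -{time_saved:.0f} %")
```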
popandbob Joined: 19 Mar 05 Posts: 551 Credit: 4,673,015 RAC: 0 |
elgar wrote: "Overclocking my 260 reduced CUDA time to ~10 minutes from ~13 min (600 MHz to 715 MHz). Overclocking the CPU by 500 MHz didn't have any effect on computation time." If you were to up the shaders and memory a bit you would see a lower time yet :) I was seeing times of 10 min with 700/1550/1050. I couldn't push the core any higher, but upping the shaders and memory added an extra boost. Do you Good Search for Seti@Home? http://www.goodsearch.com/?charityid=888957 Or Good Shop? http://www.goodshop.com/?charityid=888957 |
Voyager Joined: 2 Nov 99 Posts: 602 Credit: 3,264,813 RAC: 0 |
I tried OCing the GPU. Upped it by 16%, times went from ~18 min. to ~16 min. Cool! |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Voyager wrote: "I tried OCing the GPU. Upped it by 16%, times went from ~18 min. to ~16 min. Cool!" Yes, GPU processing times can be improved by OCing the GPU. But when you do such OCing, please look at the results your GPU returns, that is, check whether there are any "inconclusives" among the pending results. The reason this should be checked (and checked over a few days at fixed GPU frequencies): when one OCs a CPU, there is a pretty narrow frequency interval in which the system still works but gives invalid results from time to time. With a GPU the situation is completely different. One can get a system freeze by OCing the memory or engine frequency too much, but the system will continue to work with a shader frequency that is OCed too much. This frequency can be raised to a value at which almost all SETI tasks fail with various CUDA errors while the system as a whole looks just normal: no screen distortion, no OS hangs. Such an over-OCed GPU is counterproductive for SETI. And there is a pretty wide frequency range in which computation goes on without reported errors but produces invalid results. So one will receive no "computation error" state in the results (and the corresponding filter on the web site will not show errors), but many GPU results will still be invalid. One can check for this state by using the "pending" filter on the website and looking for "Completed, validation inconclusive" results in the list. If there are some, it's worth checking which result passes validation once the third result arrives (that is, whether it is your host that produces invalid results or your wingman's host). I encountered this situation while OCing my own 9600GSO. Too high a shader frequency can ruin SETI results without damaging the stability of the system as a whole. |
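The check Raistmer describes can be tallied with a small sketch like the one below. It assumes the status strings have been copied by hand from the "pending" list into a Python list; it does not talk to the web site, and the sample data is invented. Only the status text "Completed, validation inconclusive" comes from the post, and no particular threshold is implied: the idea is to compare the share before and after changing the clocks, over a few days at fixed frequencies.

```python
from collections import Counter

INCONCLUSIVE = "Completed, validation inconclusive"

def inconclusive_share(statuses: list[str]) -> float:
    """Fraction of listed results whose status is 'validation inconclusive'."""
    if not statuses:
        return 0.0
    return Counter(statuses)[INCONCLUSIVE] / len(statuses)

# Hypothetical sample: statuses typed in by hand from the pending list.
sample = [
    "Completed, waiting for validation",
    "Completed, validation inconclusive",
    "Completed, waiting for validation",
    "Completed, validation inconclusive",
]
print(f"{inconclusive_share(sample):.0%} of pending results are inconclusive")
```

An inconclusive result alone does not say which host was wrong; as the post notes, that only becomes clear once the third result has been validated.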
Sutaru Tsureku Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
And how is the OCed CPU involved? After OCing the CPU from 3 to 3.2 GHz [only changing the multi from 15 to 16], I now have the feeling that the CPU start phase of a CUDA task runs ~ 1 sec. faster.. and that the complete CPU support time is ~ 5 sec./WU shorter. [0.44x AR WUs] Is this only my feeling, or would OCing the CPU also help to reduce the 'pure' GPU crunching time? |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
The faster the CPU is, the shorter the initial load phase can be. But I don't think CPU speed will affect GPU performance much if the CPU is fast enough. Moreover, from time to time the CUDA app uses a busy loop for thread sync; in these moments, the faster the CPU is, the more power and cycles are lost in vain, especially with slow GPUs. That is, ultimate host performance tuning depends on some diametrically opposed factors; IMHO no "fits all" answer can be given. In general, GPU speed should be balanced with CPU speed. If you have many hosts with many different CPUs and GPUs, it is best to rearrange them to pair fast CPU + fast GPU and slow CPU + slow GPU (slow and fast have no absolute meaning; they are relative among the CPUs and GPUs available). BTW, if one wants to consider such tiny effects as the influence of CPU OCing on GPU speed, one should take into account the non-monotonic curve of CPU performance versus CPU frequency. A CPU can't wait 1.3 ticks for memory. It can wait 1 tick or 2 ticks (for example). That is, different ratios of CPU frequency to memory frequency can give a non-monotonic performance curve. EDIT: BTW, bus OCing could affect GPU performance because of host memory <-> GPU memory transfers. That is, not CPU OCing but memory OCing could be optimal. |
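The "can't wait 1.3 ticks" point can be illustrated with a toy model. It rests on stated assumptions rather than measurements: memory answers after a fixed 10 ns, the CPU can only resume on a whole tick, and each memory access is preceded by 4 ticks of computation. All of these numbers and the function name are invented for illustration; real CPUs and memory controllers are far more complicated.

```python
def access_time_ns(cpu_mhz: int, mem_latency_ps: int = 10_000,
                   compute_ticks: int = 4) -> float:
    """Time for one 'compute, then wait for memory' step in a toy model."""
    # Whole CPU ticks needed to cover the memory latency (integer ceiling).
    wait_ticks = -(-mem_latency_ps * cpu_mhz // 1_000_000)
    tick_ns = 1000 / cpu_mhz  # duration of one CPU tick in nanoseconds
    return (compute_ticks + wait_ticks) * tick_ns

for mhz in (3000, 3050, 3100, 3200):
    print(f"{mhz} MHz: {access_time_ns(mhz):.3f} ns per step")
```

In this model the step at 3050 MHz takes longer than at 3000 MHz, because the wait rounds up from 30.5 to 31 ticks, while 3100 MHz is faster again. That is the kind of non-monotonic curve the post refers to.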
Sutaru Tsureku Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
After changing the multi, HyperPI 1M runs ~ 1.2 sec. faster [~ 5 %].. :-D AFAIK.. HyperPI is a good indicator for OCing - change something [FSB/RAM or others], run the 1M test and see if it's faster or slower.. What would be a good stress test prog for CPU/RAM/mobo OCing? I don't have time [RAC ! ;-D] to run a 24-hour test loop. Is there a prog somewhere with which I could test in ~ 10 min. whether the new OC is stable? :-D If not in ~ 10 min. ;-) ..which prog would be a good OC test prog for a pure GPU crunching rig? :-) ..because now I'm back to GPU power only.. ;-) If I finally find a good heatsink for my CPU that fits the small space on my MSI K9A2 Platinum.. Heatsink for MSI K9A2 Platinum ? I will take time to mod my case.. an 8th opening for the 4th GPU.. and some fan openings.. for good airflow around the GPUs.. Then I will have a pure 4 GPU crunching rig.. :-) |
mimo Joined: 7 Feb 03 Posts: 92 Credit: 14,957,404 RAC: 0 |
Only PCI-e overclocking on the non-GPU side will affect GPU performance, and only on devices with compute capability 1.0 - they don't have async memcopy. 1.1 and up have async memcopy, so the PCIe latencies can be hidden. And if you have drivers with device overlap (simultaneous memcpy/kernel execution), that is best. |
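Why overlap hides transfer latency can be sketched with a simple timing model. This is plain Python rather than CUDA code, and it assumes an idealised two-stage pipeline: the work is split into equal chunks, and the copy for the next chunk can run while the kernel for the current chunk executes. The function names and the millisecond figures are made up for illustration.

```python
def serial_time(chunks: int, copy_ms: float, kernel_ms: float) -> float:
    """No overlap: every chunk is copied, then processed, one after another."""
    return chunks * (copy_ms + kernel_ms)

def overlapped_time(chunks: int, copy_ms: float, kernel_ms: float) -> float:
    """Copy of chunk i+1 overlaps with the kernel of chunk i."""
    if chunks == 0:
        return 0.0
    # First copy and last kernel cannot overlap with anything; in between,
    # the slower of the two stages sets the pace.
    return copy_ms + (chunks - 1) * max(copy_ms, kernel_ms) + kernel_ms

# Made-up example: 4 chunks, 2 ms per copy, 5 ms per kernel.
print(serial_time(4, 2, 5), overlapped_time(4, 2, 5))
```

In this model, as long as the kernel takes longer than the copy, almost all of the transfer time disappears behind computation, which would be consistent with bus speed mattering less on devices that support overlap.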
Sutaru Tsureku Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
The currently crunched AR 0.41x WUs [with OCed CPU] now have the same crunching time [with support from the CPU] as the AR 0.44x WUs crunched in the past without OC. AFAIK: lower AR = longer crunching time. This would mean my OCed CPU also helps to reduce the pure GPU crunching time.. Of course, this isn't really a test.. ;-D But maybe a reason to guess that it could be so? :-) |
Lint trap Joined: 30 May 03 Posts: 871 Credit: 28,092,319 RAC: 0 |
For a quick, maximum-stress stress test, I use IntelBurnTest from http://www.ultimate-filez.com. This test generates more heat than Prime95. A LOT more on my rig! I recommend running the Custom test first - 64-128 MB and 8-10 runs. This avoids over-stressing an unknown system that, judging by the test results, obviously has a problem. Don't skip perusing the readme file before starting the exe. Martin |
-ShEm- Joined: 25 Feb 00 Posts: 139 Credit: 4,129,448 RAC: 0 |
OCCT is also good ;) |