Any hazard in period_iterations_num too small?
Author | Message |
---|---|
Gene Joined: 26 Apr 99 Posts: 150 Credit: 48,393,279 RAC: 118 |
It is only a modest 750Ti GPU but I like to play with the command line parameters to get the best I can from it. Running the nvidia_opencl_SoG in a "benchmark" isolated configuration and trying various parameters on the same work units. With other parameters (of the oclfft_tune_* set) copied from other threads I am exploring the effects of the -period_iterations_num value. Lowering that parameter to 40, from the default 50, helps about 16%. Below 40 there's not much improvement (less than 10%) although I've only tried it down to 10. The question: for just benchmark testing, is there any (software) hazard in trying period_iterations_num down to 9...8...7... ? Is there a driver limit beyond which the GPU might freeze up? I am guessing (hoping?) that the driver, and the application code, will override a value that is beyond the hardware capability and just do the best they can, with a possible loss of performance as a by-product. That would show up in the benchmark times and tell me clearly that I've gone too far. (Slow screen refresh, and other artifacts of graphics resource contention, are not an issue at this point.) -Gene- |
kittyman Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
[joking mode on] Yes, if you set it too small, you could develop a black hole in your RAM, which will swallow up your computer and make it invisible. [/joking mode off] "Freedom is just Chaos, with better lighting." Alan Dean Foster |
Shaggie76 Joined: 9 Oct 09 Posts: 282 Credit: 271,858,118 RAC: 196 |
With lower values I've found I had to disable TDR checking in Windows. Even with that turned off I've had system lockups when I go below 10 in multi-GPU setups. I cannot speak for Linux, although I'll be switching one of my crunchers soon, so I'll find out, I guess! |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13731 Credit: 208,696,464 RAC: 304 |
Quoting Shaggie76: "With lower values I've found I had to disable TDR checking in Windows. Even with that turned off I've had system lockups when I go below 10 in multi-gpu setups." I'm running my i7 with 1x GTX 750Ti and 1x GTX 1070 under Win10 with period_iterations_num set to 3. Other than some screen/keyboard lag (particularly on the monitor connected to the GTX 750Ti, which I can live with), I've had no issues. I did have it set to 1, but the lag was too much to tolerate; it wouldn't be an issue for a dedicated cruncher. Grant Darwin NT |
bluestar Joined: 5 Sep 12 Posts: 7015 Credit: 2,084,789 RAC: 3 |
Try that with a .vlar task. |
Rune Bjørge Joined: 5 Feb 00 Posts: 45 Credit: 30,508,204 RAC: 5 |
Running with a low number for period_iterations_num does not cause any damage to your hardware. The only thing you might run into is the Timeout Detection and Recovery (TDR) problem with its bluescreens. You could also experience display lag if the number is too low. I've pushed my GTX Titans to run with a number as low as 2 in a rig running 3 Titans. I had to do some tweaks to the TDR system to get it running stably, but now it is crunching like a dream. Lag is also within what I can live with. |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Quoting Shaggie76: "With lower values I've found I had to disable TDR checking in Windows. Even with that turned off I've had system lockups when I go below 10 in multi-gpu setups." . . Hi Grant, . . Did you try swapping the cards around and running the monitor off the 1080? I would think it would be less affected by lag at the lower values of period_iterations_num. Stephen . |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Quoting Gene: "It is only a modest 750Ti GPU but I like to play with the command line parameters to get the best I can from it. Running the nvidia_opencl_SoG in a "benchmark" isolated configuration and trying various parameters on the same work units. With other parameters (of the oclfft_tune_* set) copied from other threads I am exploring the effects of the -period_iterations_num value. Lowering that parameter to 40, from the default 50, helps about 16%. Below 40 there's not much improvement (less than 10%) although I've only tried it down to 10." . . Hi Gene, . . Just for a comparison: I am running two GTX 950s on an old Pentium D 930 rig. I am running with -period_iterations_num set to 5 with no problems. I did have it set to 3, but the lag became annoying. I don't even know what TDR is or how to set it, but that rig just keeps on chugging away for me. To be honest, I don't know that there was any significant improvement in runtimes when it was set to 3, so I am happy where it is. Stephen . |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13731 Credit: 208,696,464 RAC: 304 |
Quoting Stephen "Heretic": ". . Did you try swapping the cards around and running the monitor off the 1080?" One monitor on the GTX 1070, another on the GTX 750Ti. The effect on the GTX 1070 is minimal. Grant Darwin NT |
Gene Joined: 26 Apr 99 Posts: 150 Credit: 48,393,279 RAC: 118 |
Encouraged by message replies, I went ahead with benchmark runs on three different work units, one AR=11.9 and two AR=0.01, using the linux x86_64 opencl_nvidia_SoG application. Tested -period_iterations_num all the way down to "1" with no adverse consequences. Below 7, however, there was no improvement in run times. So, I've settled on 7 as the configuration to use.

It was observed that for the AR=11.9 work unit there was NO IMPROVEMENT for any value smaller than the default. The result file showed no "pulse" detections so I guess the "period iterations" part of the code was never executed, or was of trivial length. In contrast, for the longer running vlar work (27no15ac...vlar), which reported 7 pulse detections, here are a few benchmark numbers:

per_it_num | elapsed seconds | relative speed |
---|---|---|
50 (default) | 1996 | 100% |
40 | 1747 | 114% |
20 | 1672 | 119% |
10 | 1611 | 124% |
7 | 1569 | 127% |
3 | 1578 | 126% |
1 | 1568 | 127% |

Yeah, o.k., so 1 tick better at num=1 but it was 1 tick worse on a blc work unit, thus, to my mind, not worth the potential screen lag impact. And, for those who might be curious, the cpu is an AMD FX-4300 with its "handicapped" floating-point design; don't know if that is significant in this context. But isn't that the point of benchmarking? I.e. to see what's best for the actual hardware in use?

@kittyman [joke_mode] I feared the keyboard LEDs might revert to DEDs (Dark Emitting Diodes) and black out the entire room! [/joke_mode] |
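
For anyone wanting to repeat this kind of comparison, the relative-speed figures in Gene's post and his choice of 7 can be reproduced with a few lines of Python. This is only a sketch: the timings are copied from the post, but the function names and the 1% "close enough to the fastest run" tolerance are assumptions made for illustration, not anything the SETI@home application or Gene's benchmark setup provides.

```python
# Elapsed seconds per -period_iterations_num value, copied from the post above
# (vlar work unit; 50 is the default setting).
ELAPSED = {50: 1996, 40: 1747, 20: 1672, 10: 1611, 7: 1569, 3: 1578, 1: 1568}


def relative_speed(elapsed, baseline=50):
    """Return {setting: speed as a whole percent of the baseline run}."""
    base = elapsed[baseline]
    return {n: round(100 * base / t) for n, t in elapsed.items()}


def pick_setting(elapsed, tolerance=0.01):
    """Pick the largest setting whose time is within `tolerance` of the fastest.

    Assumption (hypothetical rule): a larger value is preferred when the
    times are practically equal, because several posters report more
    screen lag at smaller values. The 1% tolerance is arbitrary.
    """
    best = min(elapsed.values())
    return max(n for n, t in elapsed.items() if t <= best * (1 + tolerance))


if __name__ == "__main__":
    for n, pct in relative_speed(ELAPSED).items():
        print(f"{n:>3}  {ELAPSED[n]:>5} s  {pct}%")
    print("suggested setting:", pick_setting(ELAPSED))
```

With these numbers the rule lands on 7, the same value Gene settled on, since 3 and 1 are within a second or so of it. Timings from a different GPU or work unit type would need to be substituted into `ELAPSED`.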