GPU AP performance tuning
Author | Message |
---|---|
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
In this thread http://lunatics.kwsn.net/12-gpu-crunching/new-set-of-test-tasks-for-gpu-ap.msg49310.html#msg49310 new test tasks for AP are available, prepared specially for GPU AP performance tuning. Tuning results will also be posted there. If you are interested in a GPU AP performance boost, take a look. |
Kamu Joined: 19 Jan 02 Posts: 56 Credit: 11,009,499 RAC: 0 |
Hi. Lunatics says 'Sorry, registration is currently disabled.', so I'm writing here. Here is a 2600K@4300MHz, HT disabled, CPU idle, GTX 690: AppName: AP6_win_x86_SSE2_OpenCL_NV_r1363.exe AppArgs: -unroll 8 -ffa_block 4096 -ffa_block_fetch 4096 TaskName: Clean_20LC.wu Started at : 14:57:05.433 Ended at : 15:02:23.984 318.520 secs Elapsed 312.829 secs CPU time Now that I have found out how to set up the testing environment and have a couple of days free from work, I can run tests if there is something you want to know. -Kimmo- Computers: obelix |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Hi. Nothing special for now, just find the parameter "sweet spot" for your GPU. It would also be interesting to check the new r1363 parameter: will it save CPU time on your GPU without a big increase in elapsed time? How to use it is described here: http://lunatics.kwsn.net/12-gpu-crunching/ap6-r1363-for-gpu.msg49205.html#msg49205 EDIT: In general, one can mirror the tests I do for my own GPU and find the best parameters for a particular device. To get a table of results from the benchmark log file I use a Perl script (I will add it to the first post in the Lunatics tuning thread). |
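Editor's sketch: mirroring the tests for another device mostly means running the same task with a grid of parameter values. The snippet below only builds the command lines for such a sweep; it does not launch anything. The flag names and the app name are the ones quoted in this thread, while the function name `sweep_commands` and the choice of values are illustrative assumptions. How the benchmark harness itself is invoked is not described here, so that part is left out.

```python
from itertools import product

# App name as quoted in this thread; swap in the build you are testing.
APP = "AP6_win_x86_SSE2_OpenCL_NV_r1363.exe"


def sweep_commands(unrolls, ffa_blocks, ffa_block_fetches, app=APP):
    """Return one command-line string per parameter combination.

    Hypothetical helper: it only formats strings and never runs the app.
    """
    commands = []
    for unroll, block, fetch in product(unrolls, ffa_blocks, ffa_block_fetches):
        commands.append(
            f"{app} -unroll {unroll} -ffa_block {block} -ffa_block_fetch {fetch}"
        )
    return commands


if __name__ == "__main__":
    # Example grid (assumed values): vary unroll, keep the FFA settings fixed.
    for cmd in sweep_commands([2, 4, 6, 8, 10, 12], [4096], [4096]):
        print(cmd)
```

Varying one parameter at a time, as in the example grid, keeps the number of runs small; a full grid over all three parameters grows quickly.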
Kamu Joined: 19 Jan 02 Posts: 56 Credit: 11,009,499 RAC: 0 |
OK, here are quick results with different unroll params; 10 seems to be best with this card: ------------ Quick timetable WU : Clean_20LC.wu AP6_win_x86_SSE2_OpenCL_NV_r1363.exe -unroll 2 -ffa_block 4096 -ffa_block_fetch 4096 : Elapsed 335.480 secs CPU 332.017 secs AP6_win_x86_SSE2_OpenCL_NV_r1363.exe -unroll 4 -ffa_block 4096 -ffa_block_fetch 4096 : Elapsed 317.800 secs CPU 314.685 secs AP6_win_x86_SSE2_OpenCL_NV_r1363.exe -unroll 6 -ffa_block 4096 -ffa_block_fetch 4096 : Elapsed 314.140 secs CPU 310.270 secs AP6_win_x86_SSE2_OpenCL_NV_r1363.exe -unroll 8 -ffa_block 4096 -ffa_block_fetch 4096 : Elapsed 312.830 secs CPU 309.943 secs AP6_win_x86_SSE2_OpenCL_NV_r1363.exe -unroll 10 -ffa_block 4096 -ffa_block_fetch 4096 : Elapsed 312.100 secs CPU 306.105 secs AP6_win_x86_SSE2_OpenCL_NV_r1363.exe -unroll 12 -ffa_block 4096 -ffa_block_fetch 4096 : Elapsed 311.100 secs CPU 308.071 secs ------------ After that I ran with -v 2 -use_sleep, but the log file has only lines like: Awaited 1 ms for completion PC_inner_ffa result is: 0 Awaited 1 ms for completion PC_inner_ffa result is: 0 Full log file here. It was run against the ap_Zblank_2LC67_silent_ffa.wu file, but does that matter? -Kimmo- Computers: obelix |
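Editor's sketch: the timetable above can be turned into rows and ranked automatically. This is not the Perl script mentioned earlier in the thread, just a minimal Python stand-in that assumes each entry has the exact shape shown in the post (`-unroll N -ffa_block B -ffa_block_fetch F : Elapsed X secs CPU Y secs`). The names `ROW` and `parse_timetable` are made up for illustration.

```python
import re

# Timetable entries copied from the post above.
APP = "AP6_win_x86_SSE2_OpenCL_NV_r1363.exe"
LOG = f"""
{APP} -unroll 2 -ffa_block 4096 -ffa_block_fetch 4096 : Elapsed 335.480 secs CPU 332.017 secs
{APP} -unroll 4 -ffa_block 4096 -ffa_block_fetch 4096 : Elapsed 317.800 secs CPU 314.685 secs
{APP} -unroll 6 -ffa_block 4096 -ffa_block_fetch 4096 : Elapsed 314.140 secs CPU 310.270 secs
{APP} -unroll 8 -ffa_block 4096 -ffa_block_fetch 4096 : Elapsed 312.830 secs CPU 309.943 secs
{APP} -unroll 10 -ffa_block 4096 -ffa_block_fetch 4096 : Elapsed 312.100 secs CPU 306.105 secs
{APP} -unroll 12 -ffa_block 4096 -ffa_block_fetch 4096 : Elapsed 311.100 secs CPU 308.071 secs
"""

# Assumed entry format; whitespace-tolerant so it also works on flattened text.
ROW = re.compile(
    r"-unroll\s+(?P<unroll>\d+)\s+-ffa_block\s+(?P<block>\d+)\s+"
    r"-ffa_block_fetch\s+(?P<fetch>\d+)\s*:\s*"
    r"Elapsed\s+(?P<elapsed>[\d.]+)\s+secs\s+CPU\s+(?P<cpu>[\d.]+)\s+secs"
)


def parse_timetable(text):
    """Extract one dict per benchmark entry found in the text."""
    return [
        {
            "unroll": int(m["unroll"]),
            "ffa_block": int(m["block"]),
            "ffa_block_fetch": int(m["fetch"]),
            "elapsed": float(m["elapsed"]),
            "cpu": float(m["cpu"]),
        }
        for m in ROW.finditer(text)
    ]


rows = parse_timetable(LOG)
best_elapsed = min(rows, key=lambda r: r["elapsed"])
best_cpu = min(rows, key=lambda r: r["cpu"])

if __name__ == "__main__":
    print("lowest elapsed time: unroll", best_elapsed["unroll"])
    print("lowest CPU time:     unroll", best_cpu["unroll"])
```

On these figures the lowest elapsed time belongs to unroll 12 and the lowest CPU time to unroll 10. The gaps between unroll 8, 10 and 12 are around a second out of roughly 310, so they may well be within run-to-run variation; repeating each run would be needed to say more.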
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
1 ms is too short to be worth sleeping, IMHO. You need to increase ffa_block a lot before sleep becomes useful. In Windows the standard time slice is 20 ms, so yielding control for 1 ms only adds a big overhead... But you can try anyway: just find what the value is for the large FFA (look for considerably bigger sleep times in the log). |
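Editor's sketch: to look for "considerably bigger" sleep times, the `Awaited N ms for completion` lines in the verbose log can be tallied. The line format is taken from the log excerpt quoted earlier in the thread; the function name `summarize_waits` is hypothetical, and the default threshold of 20 ms simply reuses the time-slice figure given in the post above.

```python
import re
from collections import Counter

# Line format as it appears in the log excerpt quoted in this thread.
AWAIT = re.compile(r"Awaited\s+(\d+)\s+ms\s+for\s+completion")


def summarize_waits(log_text, slice_ms=20):
    """Tally the awaited times and count those at or above slice_ms.

    slice_ms defaults to the 20 ms time-slice figure quoted in the thread.
    """
    waits = [int(ms) for ms in AWAIT.findall(log_text)]
    return {
        "count": len(waits),
        "histogram": Counter(waits),
        "max_ms": max(waits, default=0),
        "at_or_above_slice": sum(1 for w in waits if w >= slice_ms),
    }


if __name__ == "__main__":
    # The two log lines quoted earlier in the thread.
    sample = (
        "Awaited 1 ms for completion PC_inner_ffa result is: 0 "
        "Awaited 1 ms for completion PC_inner_ffa result is: 0"
    )
    print(summarize_waits(sample))
```

If `at_or_above_slice` stays at zero after raising ffa_block, the waits are still shorter than the quoted time slice, and sleeping is unlikely to pay off by the reasoning in the post above.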
Fred J. Verster Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0 |
I put the rev. 1363 app in place (I had the 1316 app again), then waited for some AstroPulse work, which I got. Those tasks have yet to be computed. Parameters: unroll 15; ffa_block 10240; ffa_block_fetch 5120; 1 instance_per_device (2x HD5870, CrossFire disabled in software through CCC). Looking at results, I came across this one: AP wuid 1044634527. Did something go very wrong? |
skildude Joined: 4 Oct 00 Posts: 9541 Credit: 50,759,529 RAC: 60 |
"I did put the rev.1363 app. in place, had the 1316 app." Recheck your unroll: your WU data shows DATA_CHUNK_UNROLL at default:2. In a rich man's house there is no place to spit but his face. Diogenes Of Sinope |
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
"I did put the rev.1363 app. in place, had the 1316 app." Started and stopped too many times: ### Restart at 0.90 percent. Claggy |
Mark Lybeck Joined: 9 Aug 99 Posts: 245 Credit: 216,677,290 RAC: 173 |
Kamu, how did you manage to get 3 instances of the GTX 690 into the same machine (ID: 6720006)? I thought GTX 690 units were always added in multiples of 2. |
Kamu Joined: 19 Jan 02 Posts: 56 Credit: 11,009,499 RAC: 0 |
:) I too hope to have 3 of those. ;) That's the number of GPUs in the host, not graphics cards. At that particular time I had one 690 (2 GPUs) and one 680 (1 GPU) == [3] -Kimmo- Computers: obelix |
Fred J. Verster Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0 |
Kamu, Owner: Kamu; Created: 7 Jul 2012, 6:53:57 UTC; Total credit: 463,214; Average credit: 3,278.27; Cross-project credit: BOINCstats.com, Free-DC; CPU type: GenuineIntel Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz [Family 6 Model 42 Stepping 7]; Number of processors: 8; Coprocessors: [3] NVIDIA GeForce GTX 690 (134215679MB); Operating System: Linux 3.2.0-23-generic; BOINC version: 7.0.28; Memory: 16017.83 MB; Cache: 8192 KB; Measured floating point speed: 3644.21 million ops/sec; Measured integer speed: 13643.93 million ops/sec; Average upload rate: 26.18 KB/sec; Average download rate: 2152.04 KB/sec. Well, (134215679MB) of memory is also weird ;-). Maybe the Linux version (or BOINC?) doesn't 'see' more than 3?! Those are dual-GPU cards?! But we should stay on topic: GPU AP performance tuning! |
Mike Joined: 17 Feb 01 Posts: 34265 Credit: 79,922,639 RAC: 80 |
"I did put the rev.1363 app. in place, had the 1316 app." This host has hardware issues. Only errors. With each crime and every kindness we birth our future. |
skildude Joined: 4 Oct 00 Posts: 9541 Credit: 50,759,529 RAC: 60 |
I'm wondering why his unroll shows 2 when he said he'd changed it to 15. In a rich man's house there is no place to spit but his face. Diogenes Of Sinope |
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
"I'm wondering why his unroll shows 2 when he said he'd changed it to 15" Fred hasn't completed that WU yet; it's his wingman, a person called Coldice, who has errored out. Claggy |
Mike Joined: 17 Feb 01 Posts: 34265 Credit: 79,922,639 RAC: 80 |
"I'm wondering why his unroll shows 2 when he said he'd changed it to 15" Exactly. Fred's posts are always a little confusing, but I've got used to it. With each crime and every kindness we birth our future. |
Fred J. Verster Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0 |
"I'm wondering why his unroll shows 2 when he said he'd changed it to 15" Then I apologize for this and will try to be more specific. (Three GTX 690s were reported; one 690 and one 680 were used, see a few posts back.) That would be my ATI host, using unroll 15, ffa_block 10240, ffa_block_fetch 5120 and 1 instance_per_device. I wondered why the rev. 1363 Lunatics app (6.01) is shown as AstroPulse v6 v6.04 (opencl_ati_100). And what went wrong in this AP WU? And you beat me to it ;-) |
skildude Joined: 4 Oct 00 Posts: 9541 Credit: 50,759,529 RAC: 60 |
Because it's a 1316: "Windows x86 rev 1316, V6 match, by Raistmer with support of Lunatics.kwsn.net team. SSE2" In a rich man's house there is no place to spit but his face. Diogenes Of Sinope |
Mike Joined: 17 Feb 01 Posts: 34265 Credit: 79,922,639 RAC: 80 |
Stock 6.04 and r1316 are the same in principle, Fred. With each crime and every kindness we birth our future. |
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
"I wondered why the rev.1363 LUNATICs app.(6.01) is shown as AstroPulse v6 v6.04 (opencl_ati_100)? And what went wrong?" The Stderr.txt says r1316: AstroPulse v.6 Claggy |
Mike Joined: 17 Feb 01 Posts: 34265 Credit: 79,922,639 RAC: 80 |
"I'm wondering why his unroll shows 2 when he said he'd changed it to 15" This shows 1316 (which is equivalent to stock 6.04), not 1363. With each crime and every kindness we birth our future. |