| Author |
Message |
Raistmer Volunteer developer Volunteer tester
Joined: 16 Jun 01 Posts: 2541 Credit: 25,599,783 RAC: 40,316

|
|
In this thread http://lunatics.kwsn.net/12-gpu-crunching/new-set-of-test-tasks-for-gpu-ap.msg49310.html#msg49310 new test tasks for AP are available, prepared specially for GPU AP performance tuning. Tuning results will also be posted there. If you are interested in a GPU AP performance boost, take a look. |
|
|
|
|
|
Hi
'Sorry, registration is currently disabled.' at lunatics so I write here.
Here is 2600K@4300MHz, HT disabled, CPU idle, GTX 690:
AppName: AP6_win_x86_SSE2_OpenCL_NV_r1363.exe
AppArgs: -unroll 8 -ffa_block 4096 -ffa_block_fetch 4096
TaskName: Clean_20LC.wu
Started at : 14:57:05.433
Ended at : 15:02:23.984
318.520 secs Elapsed
312.829 secs CPU time
Now that I have found out how to set up a testing environment, and I have a couple of days free from work, I can run tests if there is something you want to know.
-Kimmo-
____________
Computers: obelix |
|
|
Raistmer Volunteer developer Volunteer tester
Joined: 16 Jun 01 Posts: 2541 Credit: 25,599,783 RAC: 40,316

|
Hi
'Sorry, registration is currently disabled.' at lunatics so I write here.
Here is 2600K@4300MHz, HT disabled, CPU idle, GTX 690:
AppName: AP6_win_x86_SSE2_OpenCL_NV_r1363.exe
AppArgs: -unroll 8 -ffa_block 4096 -ffa_block_fetch 4096
TaskName: Clean_20LC.wu
Started at : 14:57:05.433
Ended at : 15:02:23.984
318.520 secs Elapsed
312.829 secs CPU time
Now that I have found out how to set up a testing environment, and I have a couple of days free from work, I can run tests if there is something you want to know.
-Kimmo-
Nothing special for now, just find the parameter "sweet spot" for your GPU...
Also, it would be interesting to check the new r1363 param: will it save CPU time for your GPU without a big increase in elapsed time? How to use it is described here:
http://lunatics.kwsn.net/12-gpu-crunching/ap6-r1363-for-gpu.msg49205.html#msg49205
EDIT: in general, one can mirror the tests I do for my own GPU and find the best params for a particular device. To get a table of results from the benchmark log file I use a Perl script (I will add it to the first post in the Lunatics tuning thread). |
|
|
|
|
|
Ok,
here are quick results with different unroll params; 10 seems to be best with this card:
------------
Quick timetable
WU : Clean_20LC.wu
AP6_win_x86_SSE2_OpenCL_NV_r1363.exe -unroll 2 -ffa_block 4096 -ffa_block_fetch 4096 :
Elapsed 335.480 secs
CPU 332.017 secs
AP6_win_x86_SSE2_OpenCL_NV_r1363.exe -unroll 4 -ffa_block 4096 -ffa_block_fetch 4096 :
Elapsed 317.800 secs
CPU 314.685 secs
AP6_win_x86_SSE2_OpenCL_NV_r1363.exe -unroll 6 -ffa_block 4096 -ffa_block_fetch 4096 :
Elapsed 314.140 secs
CPU 310.270 secs
AP6_win_x86_SSE2_OpenCL_NV_r1363.exe -unroll 8 -ffa_block 4096 -ffa_block_fetch 4096 :
Elapsed 312.830 secs
CPU 309.943 secs
AP6_win_x86_SSE2_OpenCL_NV_r1363.exe -unroll 10 -ffa_block 4096 -ffa_block_fetch 4096 :
Elapsed 312.100 secs
CPU 306.105 secs
AP6_win_x86_SSE2_OpenCL_NV_r1363.exe -unroll 12 -ffa_block 4096 -ffa_block_fetch 4096 :
Elapsed 311.100 secs
CPU 308.071 secs
------------
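A table of the kind mentioned earlier in the thread can be pulled out of a timetable like this with a few lines of script. The following is only a minimal Python sketch, not the Perl script referred to above; it assumes the exact "Elapsed ... secs" / "CPU ... secs" layout shown in this post, and the names `TIMETABLE` and `parse_timetable` are made up for illustration. On these numbers, unroll 12 has the lowest elapsed time and unroll 10 the lowest CPU time, and the differences above unroll 6 are small.

```python
import re

# Timetable copied from the post above (app name and fixed params kept as-is).
TIMETABLE = """\
AP6_win_x86_SSE2_OpenCL_NV_r1363.exe -unroll 2 -ffa_block 4096 -ffa_block_fetch 4096 :
Elapsed 335.480 secs
CPU 332.017 secs
AP6_win_x86_SSE2_OpenCL_NV_r1363.exe -unroll 4 -ffa_block 4096 -ffa_block_fetch 4096 :
Elapsed 317.800 secs
CPU 314.685 secs
AP6_win_x86_SSE2_OpenCL_NV_r1363.exe -unroll 6 -ffa_block 4096 -ffa_block_fetch 4096 :
Elapsed 314.140 secs
CPU 310.270 secs
AP6_win_x86_SSE2_OpenCL_NV_r1363.exe -unroll 8 -ffa_block 4096 -ffa_block_fetch 4096 :
Elapsed 312.830 secs
CPU 309.943 secs
AP6_win_x86_SSE2_OpenCL_NV_r1363.exe -unroll 10 -ffa_block 4096 -ffa_block_fetch 4096 :
Elapsed 312.100 secs
CPU 306.105 secs
AP6_win_x86_SSE2_OpenCL_NV_r1363.exe -unroll 12 -ffa_block 4096 -ffa_block_fetch 4096 :
Elapsed 311.100 secs
CPU 308.071 secs
"""


def parse_timetable(text):
    """Return one dict per run: {'unroll': int, 'elapsed': float, 'cpu': float}."""
    rows = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        # A command line with -unroll starts a new run.
        m = re.search(r"-unroll\s+(\d+)", line)
        if m:
            current = {"unroll": int(m.group(1))}
            rows.append(current)
            continue
        # Timing lines belong to the most recent run.
        m = re.match(r"(Elapsed|CPU)\s+([\d.]+)\s+secs", line)
        if m and current is not None:
            current[m.group(1).lower()] = float(m.group(2))
    return rows


rows = parse_timetable(TIMETABLE)
best_elapsed = min(rows, key=lambda r: r["elapsed"])
best_cpu = min(rows, key=lambda r: r["cpu"])

print("unroll  elapsed      cpu  cpu/elapsed")
for r in rows:
    ratio = r["cpu"] / r["elapsed"]
    print(f"{r['unroll']:>6}  {r['elapsed']:7.3f}  {r['cpu']:7.3f}  {ratio:11.3f}")
print("lowest elapsed: unroll", best_elapsed["unroll"])
print("lowest CPU:     unroll", best_cpu["unroll"])
```

The cpu/elapsed column shows how much of the run keeps a CPU core busy, which is the quantity the -use_sleep option is meant to reduce.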
After that I ran with -v 2 -use_sleep, but the log file has only lines like
Awaited 1 ms for completion
PC_inner_ffa result is: 0
Awaited 1 ms for completion
PC_inner_ffa result is: 0
Full log file here. It was run against the ap_Zblank_2LC67_silent_ffa.wu file, but does that matter?
-Kimmo-
____________
Computers: obelix |
|
|
Raistmer Volunteer developer Volunteer tester
Joined: 16 Jun 01 Posts: 2541 Credit: 25,599,783 RAC: 40,316

|
|
1 ms is too short to try to sleep IMHO.
You need to increase ffa_block a lot before sleep is of any use.
In Windows the standard time slice is 20 ms. Yielding control for 1 ms will only add a big overhead... but you can try anyway; just find what the value for a large FFA would be (look for considerably bigger sleep times in the log). |
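The point about short sleeps can be checked on any host by timing how long a requested sleep really takes. This is a rough Python sketch under stated assumptions: it measures Python's own `time.sleep`, not the sleep call inside the AP application, and the 20 ms time slice is the figure quoted in the post, not something the script verifies. The helper name `measure_sleep` is hypothetical, and results depend on the OS and its timer settings.

```python
import time


def measure_sleep(requested_s, repeats=20):
    """Return the measured duration (seconds) of each time.sleep(requested_s) call."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        time.sleep(requested_s)
        durations.append(time.perf_counter() - start)
    return durations


# Compare a few requested sleep lengths against what the OS actually delivers.
for requested_ms in (1, 5, 20):
    actual = measure_sleep(requested_ms / 1000.0)
    avg_ms = 1000.0 * sum(actual) / len(actual)
    print(f"requested {requested_ms:>2} ms, average actual {avg_ms:.2f} ms")
```

If the average for a 1 ms request comes out well above 1 ms, a kernel that finishes in about 1 ms leaves little to gain from sleeping, which fits the advice to enlarge ffa_block first.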
|
|
|
|
|
I put the rev. 1363 app in place; I had the 1316 app before.
I was again waiting for some AstroPulse work, which I got.
The tasks have yet to be computed. Parameters: unroll 15; ffa_block 10240; ffa_block_fetch 5120; 1 instance_per_device (HD5870 {2x}, CrossFire disabled
{in software, through CCC}).
Looking at results I've come across this one.
AP wuid
1044634527, something went very wrong?
____________
Knight Who Says Ni N!, OUT numbered................. |
|
|
|
|
I put the rev. 1363 app in place; I had the 1316 app before.
I was again waiting for some AstroPulse work, which I got.
The tasks have yet to be computed. Parameters: unroll 15; ffa_block 10240; ffa_block_fetch 5120; 1 instance_per_device (HD5870 {2x}, CrossFire disabled
{in software, through CCC}).
Looking at results I've come across this one.
AP wuid
1044634527, something went very wrong?
Recheck your unroll.
From your WU data:
DATA_CHUNK_UNROLL at default:2
____________
Proud member of TSWB.
End terrorism by building a school
|
|
|
Claggy Volunteer tester
Joined: 5 Jul 99 Posts: 3363 Credit: 25,949,576 RAC: 1,251

|
I put the rev. 1363 app in place; I had the 1316 app before.
I was again waiting for some AstroPulse work, which I got.
The tasks have yet to be computed. Parameters: unroll 15; ffa_block 10240; ffa_block_fetch 5120; 1 instance_per_device (HD5870 {2x}, CrossFire disabled
{in software, through CCC}).
Looking at results I've come across this one.
AP wuid
1044634527, something went very wrong?
Started and Stopped too many times:
### Restart at 0.90 percent.
### Restart at 3.60 percent.
### Restart at 3.60 percent.
### Restart at 6.31 percent.
### Restart at 8.11 percent.
### Restart at 8.11 percent.
### Restart at 15.32 percent.
### Restart at 15.32 percent.
### Restart at 15.32 percent.
### Restart at 15.32 percent.
state.fold_buf_size_short=65536; state.fold_buf_size_long=262144
Info : Building Program (binary, clBuildProgram):main kernels: OK code 0
Termination request detected. GPU device synched, awaiting termination...
Running on device number: 0
DATA_CHUNK_UNROLL at default:2
DATA_CHUNK_UNROLL at default:2
Priority of worker thread raised successfully
Priority of process adjusted successfully, below normal priority class used
OpenCL platform detected: Advanced Micro Devices, Inc.
BOINC assigns device 0
Info: BOINC provided device ID used
Used GPU device parameters are:
Number of compute units: 10
Single buffer allocation size: 256MB
max WG size: 256
ERROR: both checkpoint files are damaged, aborting task
Claggy |
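A quick way to spot this pattern in a Stderr.txt is to count the restart lines and pull out the unroll setting and any ERROR lines. The sketch below is only an illustration in Python; it assumes the line formats seen in the excerpt above, and `STDERR_EXCERPT` and `summarize_stderr` are hypothetical names, not part of BOINC or the AP application.

```python
import re
from collections import Counter

# Relevant lines copied from the stderr excerpt above.
STDERR_EXCERPT = """\
### Restart at 0.90 percent.
### Restart at 3.60 percent.
### Restart at 3.60 percent.
### Restart at 6.31 percent.
### Restart at 8.11 percent.
### Restart at 8.11 percent.
### Restart at 15.32 percent.
### Restart at 15.32 percent.
### Restart at 15.32 percent.
### Restart at 15.32 percent.
DATA_CHUNK_UNROLL at default:2
ERROR: both checkpoint files are damaged, aborting task
"""


def summarize_stderr(text):
    """Collect restart counts, the default unroll value, and ERROR lines."""
    restarts = [float(m.group(1))
                for m in re.finditer(r"### Restart at ([\d.]+) percent", text)]
    # Percentages seen more than once: the task restarted without progressing.
    repeated = {pct: n for pct, n in Counter(restarts).items() if n > 1}
    unroll = re.search(r"DATA_CHUNK_UNROLL at default:(\d+)", text)
    errors = [line.strip() for line in text.splitlines()
              if line.startswith("ERROR:")]
    return {
        "restart_count": len(restarts),
        "repeated_restarts": repeated,
        "default_unroll": int(unroll.group(1)) if unroll else None,
        "errors": errors,
    }


summary = summarize_stderr(STDERR_EXCERPT)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Several restarts at the same percentage, as at 15.32 here, mean the task made no recorded progress between those restarts.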
|
|
|
|
|
Kamu,
How do you manage to get 3 instances of GTX 690 into the same machine (ID: 6720006)?
I thought GTX690 units were always added in multiples of 2.
____________
|
|
|
|
|
|
:)
I wish I had 3 of those too. ;)
That's the number of GPUs on the host, not graphics cards. At that particular time I had one 690 (2 GPUs) and one 680 (1 GPU) == [3]
-Kimmo-
____________
Computers: obelix |
|
|
|
|
Kamu,
How do you manage to get 3 instances of GTX 690 into the same machine (ID: 6720006)?
I thought GTX690 units were always added in multiples of 2.
Owner Kamu
Created 7 Jul 2012 | 6:53:57 UTC
Total credit 463,214
Average credit 3,278.27
Cross project credit BOINCstats.com Free-DC
CPU type GenuineIntel
Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz [Family 6 Model 42 Stepping 7]
Number of processors 8
Coprocessors [3] NVIDIA GeForce GTX 690 (134215679MB)
Operating System Linux
3.2.0-23-generic
BOINC version 7.0.28
Memory 16017.83 MB
Cache 8192 KB
Measured floating point speed 3644.21 million ops/sec
Measured integer speed 13643.93 million ops/sec
Average upload rate 26.18 KB/sec
Average download rate 2152.04 KB/sec
Well, (134215679MB) of memory is also weird ;-).
Maybe the Linux version (or BOINC?) doesn't 'see' more than 3?!
Those are double GPU cards?!
But we should stay on topic,
GPU AP performance tuning!
____________
Knight Who Says Ni N!, OUT numbered................. |
|
|
Mike Volunteer tester
Joined: 17 Feb 01 Posts: 19474 Credit: 21,102,205 RAC: 26,402

|
I put the rev. 1363 app in place; I had the 1316 app before.
I was again waiting for some AstroPulse work, which I got.
The tasks have yet to be computed. Parameters: unroll 15; ffa_block 10240; ffa_block_fetch 5120; 1 instance_per_device (HD5870 {2x}, CrossFire disabled
{in software, through CCC}).
Looking at results I've come across this one.
AP wuid
1044634527, something went very wrong?
Recheck your unroll.
From your WU data:
DATA_CHUNK_UNROLL at default:2
This host has hardware issues.
Only errors.
____________
|
|
|
|
|
|
I'm wondering why his unroll shows 2 when he said he'd changed it to 15
____________
Proud member of TSWB.
End terrorism by building a school
|
|
|
Claggy Volunteer tester
Joined: 5 Jul 99 Posts: 3363 Credit: 25,949,576 RAC: 1,251

|
I'm wondering why his unroll shows 2 when he said he'd changed it to 15
Fred hasn't completed that WU yet; it's his wingman, a person called Coldice, who has errored out.
Claggy |
|
|
Mike Volunteer tester
Joined: 17 Feb 01 Posts: 19474 Credit: 21,102,205 RAC: 26,402

|
I'm wondering why his unroll shows 2 when he said he'd changed it to 15
Fred hasn't completed that WU yet; it's his wingman, a person called Coldice, who has errored out.
Claggy
Exactly.
Fred's posts are always a little confusing.
But I'm getting used to it.
____________
|
|
|
|
|
I'm wondering why his unroll shows 2 when he said he'd changed it to 15
Fred's posts are always a little confusing.
But I'm getting used to it.
Then I apologize for this and will try to be more specific...
(Three GTX 690s were reported; one 690 and one 680 were used, see a few posts back.)
That would be my ATI host, using unroll 15,
ffa_block 10240, ffa_block_fetch 5120 and 1 instance_per_device.
I wondered why the rev. 1363 Lunatics app (6.01) is shown as AstroPulse v6 v6.04 (opencl_ati_100), and what went wrong
in this AP WU.
And you beat me to it ;-)
____________
Knight Who Says Ni N!, OUT numbered................. |
|
|
|
|
|
Because it's a 1316:
Windows x86 rev 1316, V6 match, by Raistmer with support of Lunatics.kwsn.net team. SSE2
____________
Proud member of TSWB.
End terrorism by building a school
|
|
|
Mike Volunteer tester
Joined: 17 Feb 01 Posts: 19474 Credit: 21,102,205 RAC: 26,402

|
|
Stock 6.04 and r1316 are the same in principle, Fred.
____________
|
|
|
Claggy Volunteer tester
Joined: 5 Jul 99 Posts: 3363 Credit: 25,949,576 RAC: 1,251

|
I wondered why the rev. 1363 Lunatics app (6.01) is shown as AstroPulse v6 v6.04 (opencl_ati_100), and what went wrong
in this AP WU.
And you beat me to it ;-)
The Stderr.txt says r1316:
AstroPulse v.6
Non-graphics FFTW USE_CONVERSION_OPT
Windows x86 rev 1316, V6 match, by Raistmer with support of Lunatics.kwsn.net team. SSE2
Claggy |
|
|
Mike Volunteer tester
Joined: 17 Feb 01 Posts: 19474 Credit: 21,102,205 RAC: 26,402

|
I'm wondering why his unroll shows 2 when he said he'd changed it to 15
Fred's posts are always a little confusing.
But I'm getting used to it.
Then I apologize for this and will try to be more specific...
(Three GTX 690s were reported; one 690 and one 680 were used, see a few posts back.)
That would be my ATI host, using unroll 15,
ffa_block 10240, ffa_block_fetch 5120 and 1 instance_per_device.
I wondered why the rev. 1363 Lunatics app (6.01) is shown as AstroPulse v6 v6.04 (opencl_ati_100), and what went wrong
in this AP WU.
And you beat me to it ;-)
This shows 1316, not 1363; r1316 is equivalent to stock 6.04.
____________
|
|
|