GPU AP performance tuning


Message boards : Number crunching : GPU AP performance tuning

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3395
Credit: 46,334,781
RAC: 10,174
Russia
Message 1268301 - Posted: 6 Aug 2012, 9:01:42 UTC
Last modified: 6 Aug 2012, 9:02:04 UTC

New test tasks for AP, prepared specifically for GPU AP performance tuning, are available in this thread: http://lunatics.kwsn.net/12-gpu-crunching/new-set-of-test-tasks-for-gpu-ap.msg49310.html#msg49310. Tuning results will also be posted there. If you are interested in boosting GPU AP performance, take a look.

Kamu
Joined: 19 Jan 02
Posts: 56
Credit: 9,810,425
RAC: 0
Finland
Message 1268342 - Posted: 6 Aug 2012, 12:08:36 UTC

Hi

Lunatics says 'Sorry, registration is currently disabled.', so I am writing here.

Here is 2600K@4300MHz, HT disabled, CPU idle, GTX 690:

AppName: AP6_win_x86_SSE2_OpenCL_NV_r1363.exe
AppArgs: -unroll 8 -ffa_block 4096 -ffa_block_fetch 4096
TaskName: Clean_20LC.wu
Started at : 14:57:05.433
Ended at : 15:02:23.984
318.520 secs Elapsed
312.829 secs CPU time
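
As a rough cross-check of the figures above, the wall-clock span between the two timestamps and the share of elapsed time spent on the CPU can be computed with a few lines of Python. This is only a sketch: the helper names `elapsed_seconds` and `cpu_fraction` are made up for illustration and are not part of any benchmark tool, and it assumes both timestamps fall on the same day.

```python
from datetime import datetime


def elapsed_seconds(start: str, end: str) -> float:
    """Seconds between two same-day HH:MM:SS.mmm wall-clock stamps."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


def cpu_fraction(cpu_secs: float, elapsed_secs: float) -> float:
    """Share of the elapsed time during which a CPU core was busy."""
    return cpu_secs / elapsed_secs


# Values copied from the benchmark output above.
wall = elapsed_seconds("14:57:05.433", "15:02:23.984")
share = cpu_fraction(312.829, 318.520)

print(f"wall clock from timestamps: {wall:.3f} s")
print(f"CPU time / elapsed time:    {share:.1%}")
```

The timestamps differ by 318.551 s, close to the reported 318.520 s elapsed, and the CPU is busy for about 98% of the run, which is the overhead the CPU-saving parameter discussed below is meant to reduce.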

Now that I have found out how to set up the testing environment, and I have a couple of days off work, I can run tests if there is something you want to know.

-Kimmo-


____________
Computers: obelix

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3395
Credit: 46,334,781
RAC: 10,174
Russia
Message 1268343 - Posted: 6 Aug 2012, 12:13:00 UTC - in response to Message 1268342.
Last modified: 6 Aug 2012, 12:17:40 UTC

Nothing special for now, just find the parameter "sweet spot" for your GPU.
It would also be interesting to check the new r1363 parameter: does it save CPU time on your GPU without a big increase in elapsed time? How to use it is described here:
http://lunatics.kwsn.net/12-gpu-crunching/ap6-r1363-for-gpu.msg49205.html#msg49205

EDIT: In general, anyone can mirror the tests I run on my own GPU to find the best parameters for a particular device. To build a results table from the benchmark log file I use a Perl script (I will add it to the first post in the Lunatics tuning thread).
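
For anyone who wants to mirror such a sweep, here is a minimal Python sketch that only generates the command lines to benchmark. The flag names (-unroll, -ffa_block, -ffa_block_fetch) and the binary name are taken from the posts in this thread; the value lists and the `sweep` helper are purely illustrative assumptions, not recommended settings, and this is not the Perl script mentioned above.

```python
from itertools import product

# Binary name as it appears in the benchmark output in this thread.
APP = "AP6_win_x86_SSE2_OpenCL_NV_r1363.exe"


def sweep(unrolls, ffa_blocks, ffa_fetches):
    """Yield one command line per parameter combination."""
    for unroll, block, fetch in product(unrolls, ffa_blocks, ffa_fetches):
        yield f"{APP} -unroll {unroll} -ffa_block {block} -ffa_block_fetch {fetch}"


# Illustrative values only; pick ranges that make sense for your own device.
commands = list(sweep(unrolls=[4, 8, 12], ffa_blocks=[4096, 8192], ffa_fetches=[4096]))

for line in commands:
    print(line)
```

Each printed line can then be fed to whatever benchmark harness you use; the sketch deliberately does not launch anything itself.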

Kamu
Joined: 19 Jan 02
Posts: 56
Credit: 9,810,425
RAC: 0
Finland
Message 1268363 - Posted: 6 Aug 2012, 13:26:43 UTC

Ok

Here are quick results with different unroll values; 10 seems to be best with this card:
------------
Quick timetable

WU : Clean_20LC.wu
AP6_win_x86_SSE2_OpenCL_NV_r1363.exe -unroll 2 -ffa_block 4096 -ffa_block_fetch 4096 :
Elapsed 335.480 secs
CPU 332.017 secs
AP6_win_x86_SSE2_OpenCL_NV_r1363.exe -unroll 4 -ffa_block 4096 -ffa_block_fetch 4096 :
Elapsed 317.800 secs
CPU 314.685 secs
AP6_win_x86_SSE2_OpenCL_NV_r1363.exe -unroll 6 -ffa_block 4096 -ffa_block_fetch 4096 :
Elapsed 314.140 secs
CPU 310.270 secs
AP6_win_x86_SSE2_OpenCL_NV_r1363.exe -unroll 8 -ffa_block 4096 -ffa_block_fetch 4096 :
Elapsed 312.830 secs
CPU 309.943 secs
AP6_win_x86_SSE2_OpenCL_NV_r1363.exe -unroll 10 -ffa_block 4096 -ffa_block_fetch 4096 :
Elapsed 312.100 secs
CPU 306.105 secs
AP6_win_x86_SSE2_OpenCL_NV_r1363.exe -unroll 12 -ffa_block 4096 -ffa_block_fetch 4096 :
Elapsed 311.100 secs
CPU 308.071 secs

------------
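
The timetable above can be turned into a small table and searched for the best run with a short script. The following Python sketch assumes the log layout is exactly as pasted (a command line containing -unroll N, followed by an "Elapsed" and a "CPU" line); `parse_timetable` is a hypothetical helper written for this post, not the Perl script mentioned earlier.

```python
import re

# Timetable copied from the post above.
TIMETABLE = """\
AP6_win_x86_SSE2_OpenCL_NV_r1363.exe -unroll 2 -ffa_block 4096 -ffa_block_fetch 4096 :
Elapsed 335.480 secs
CPU 332.017 secs
AP6_win_x86_SSE2_OpenCL_NV_r1363.exe -unroll 4 -ffa_block 4096 -ffa_block_fetch 4096 :
Elapsed 317.800 secs
CPU 314.685 secs
AP6_win_x86_SSE2_OpenCL_NV_r1363.exe -unroll 6 -ffa_block 4096 -ffa_block_fetch 4096 :
Elapsed 314.140 secs
CPU 310.270 secs
AP6_win_x86_SSE2_OpenCL_NV_r1363.exe -unroll 8 -ffa_block 4096 -ffa_block_fetch 4096 :
Elapsed 312.830 secs
CPU 309.943 secs
AP6_win_x86_SSE2_OpenCL_NV_r1363.exe -unroll 10 -ffa_block 4096 -ffa_block_fetch 4096 :
Elapsed 312.100 secs
CPU 306.105 secs
AP6_win_x86_SSE2_OpenCL_NV_r1363.exe -unroll 12 -ffa_block 4096 -ffa_block_fetch 4096 :
Elapsed 311.100 secs
CPU 308.071 secs
"""


def parse_timetable(text):
    """Return one dict per run: {'unroll': int, 'elapsed': float, 'cpu': float}."""
    rows, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        run = re.search(r"-unroll (\d+)", line)
        if run:
            # A command line starts a new run.
            current = {"unroll": int(run.group(1))}
            rows.append(current)
            continue
        timing = re.match(r"(Elapsed|CPU)\s+([\d.]+)\s+secs", line)
        if timing and current is not None:
            current[timing.group(1).lower()] = float(timing.group(2))
    return rows


rows = parse_timetable(TIMETABLE)
print(f"{'unroll':>6} {'elapsed':>9} {'cpu':>9}")
for r in rows:
    print(f"{r['unroll']:>6} {r['elapsed']:>9.3f} {r['cpu']:>9.3f}")

best_elapsed = min(rows, key=lambda r: r["elapsed"])
best_cpu = min(rows, key=lambda r: r["cpu"])
print("shortest elapsed time at unroll", best_elapsed["unroll"])
print("lowest CPU time at unroll", best_cpu["unroll"])
```

On these numbers the shortest elapsed time is at unroll 12 and the lowest CPU time at unroll 10. The spread between unroll 8, 10 and 12 is under two seconds of elapsed time, so repeated runs would probably be needed before calling a winner.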


After that I ran with -v 2 -use_sleep, but the log file contains only lines like:

Awaited 1 ms for completion
PC_inner_ffa result is: 0
Awaited 1 ms for completion
PC_inner_ffa result is: 0

Full log file here. That run was against the ap_Zblank_2LC67_silent_ffa.wu file; does that matter?

-Kimmo-

____________
Computers: obelix

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3395
Credit: 46,334,781
RAC: 10,174
Russia
Message 1268443 - Posted: 6 Aug 2012, 17:29:18 UTC - in response to Message 1268363.

1 ms is too short to be worth sleeping for, IMHO.
You need to increase ffa_block a lot before sleep becomes useful.
On Windows the standard time slice is 20 ms, so yielding control for 1 ms only adds big overhead. You can try anyway: just find out what the value would be for a large FFA (look for considerably larger sleep times in the log).
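
To see at a glance whether the waits in a -v 2 log are long enough to make sleeping worthwhile, the "Awaited N ms for completion" lines can be tallied. The Python sketch below assumes the line format shown in the log excerpt earlier in the thread and uses the 20 ms time slice figure from this post; the helper names and the 50% cutoff in `worth_sleeping` are arbitrary assumptions for illustration, not anything the app itself does.

```python
import re
from collections import Counter

TIME_SLICE_MS = 20  # Windows time slice figure quoted in the post above


def awaited_histogram(log_text):
    """Count how often each 'Awaited N ms for completion' value occurs."""
    waits = [int(ms) for ms in re.findall(r"Awaited (\d+) ms for completion", log_text)]
    return Counter(waits)


def worth_sleeping(hist, time_slice_ms=TIME_SLICE_MS, min_share=0.5):
    """Heuristic: True if at least min_share of waits last a full time slice.

    The 0.5 default is an arbitrary assumption, not a measured threshold.
    """
    total = sum(hist.values())
    if total == 0:
        return False
    long_waits = sum(n for ms, n in hist.items() if ms >= time_slice_ms)
    return long_waits / total >= min_share


# Log excerpt as posted earlier in this thread.
sample = """\
Awaited 1 ms for completion
PC_inner_ffa result is: 0
Awaited 1 ms for completion
PC_inner_ffa result is: 0
"""

hist = awaited_histogram(sample)
print(dict(hist))
print("sleep likely useful:", worth_sleeping(hist))
```

With only 1 ms waits in the log, the heuristic reports that sleeping is unlikely to help, which matches the advice to raise ffa_block first and then look for larger numbers.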

Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3236
Credit: 31,678,403
RAC: 6,595
Netherlands
Message 1271164 - Posted: 13 Aug 2012, 16:58:28 UTC - in response to Message 1268443.
Last modified: 13 Aug 2012, 17:05:01 UTC

I did put the rev. 1363 app in place (I had the 1316 app) and was again waiting for some AstroPulse work, which I got.
Those tasks have yet to be computed.
Parameters: unroll 15; ffa_block 10240; ffa_block_fetch 5120; 1 instance_per_device (2x HD5870, CrossFire disabled in software through CCC).
Looking at results, I came across this one: AP wuid 1044634527.
Did something go very wrong?
____________

ignorance is no excuse
Joined: 4 Oct 00
Posts: 9529
Credit: 44,433,274
RAC: 0
Korea, North
Message 1271176 - Posted: 13 Aug 2012, 17:15:33 UTC - in response to Message 1271164.

Recheck your unroll.

Your WU data shows:

DATA_CHUNK_UNROLL at default:2

____________
In a rich man's house there is no place to spit but his face.
Diogenes Of Sinope

End terrorism by building a school

Claggy
Project donor
Volunteer tester
Joined: 5 Jul 99
Posts: 4067
Credit: 32,872,187
RAC: 6,966
United Kingdom
Message 1271184 - Posted: 13 Aug 2012, 17:22:19 UTC - in response to Message 1271164.
Last modified: 13 Aug 2012, 17:26:50 UTC

Started and Stopped too many times:

### Restart at 0.90 percent.

### Restart at 3.60 percent.

### Restart at 3.60 percent.

### Restart at 6.31 percent.

### Restart at 8.11 percent.

### Restart at 8.11 percent.

### Restart at 15.32 percent.

### Restart at 15.32 percent.

### Restart at 15.32 percent.

### Restart at 15.32 percent.
state.fold_buf_size_short=65536; state.fold_buf_size_long=262144
Info : Building Program (binary, clBuildProgram):main kernels: OK code 0

Termination request detected. GPU device synched, awaiting termination...
Running on device number: 0
DATA_CHUNK_UNROLL at default:2
DATA_CHUNK_UNROLL at default:2
Priority of worker thread raised successfully
Priority of process adjusted successfully, below normal priority class used
OpenCL platform detected: Advanced Micro Devices, Inc.
BOINC assigns device 0
Info: BOINC provided device ID used
Used GPU device parameters are:
Number of compute units: 10
Single buffer allocation size: 256MB
max WG size: 256
ERROR: both checkpoint files are damaged, aborting task
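
A quick way to spot this pattern in a Stderr.txt is to count the restart markers and check for the checkpoint error. The Python sketch below works on the lines quoted above; `summarize_stderr` is a hypothetical helper, and it assumes the "### Restart at N percent." and "both checkpoint files are damaged" strings appear exactly as in this output.

```python
import re


def summarize_stderr(text):
    """Summarize restart markers and the checkpoint error in a stderr dump."""
    restarts = [float(p) for p in re.findall(r"### Restart at ([\d.]+) percent\.", text)]
    return {
        "restarts": len(restarts),
        # Progress values seen more than once: the task restarted without advancing.
        "stalled_at": [p for p in sorted(set(restarts)) if restarts.count(p) > 1],
        "checkpoint_error": "both checkpoint files are damaged" in text,
    }


# Restart lines and final error copied from the output above.
SAMPLE = """\
### Restart at 0.90 percent.
### Restart at 3.60 percent.
### Restart at 3.60 percent.
### Restart at 6.31 percent.
### Restart at 8.11 percent.
### Restart at 8.11 percent.
### Restart at 15.32 percent.
### Restart at 15.32 percent.
### Restart at 15.32 percent.
### Restart at 15.32 percent.
ERROR: both checkpoint files are damaged, aborting task
"""

summary = summarize_stderr(SAMPLE)
print(summary)
```

Here the summary shows ten restarts, repeated restarts at the same progress value, and the damaged-checkpoint error that finally aborted the task.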


Claggy

Mark Lybeck
Joined: 9 Aug 99
Posts: 209
Credit: 98,211,573
RAC: 87,945
Finland
Message 1271208 - Posted: 13 Aug 2012, 18:12:59 UTC - in response to Message 1268342.

Kamu,

How did you manage to get 3 instances of GTX 690 into the same machine (ID: 6720006)?

I thought GTX 690 GPUs always came in multiples of 2.
____________

Kamu
Joined: 19 Jan 02
Posts: 56
Credit: 9,810,425
RAC: 0
Finland
Message 1271220 - Posted: 13 Aug 2012, 18:36:39 UTC

:)

I too would like to have 3 of those. ;)

That's the number of GPUs in the host, not the number of graphics cards. At that particular time I had one 690 (2 GPUs) and one 680 (1 GPU) == [3]

-Kimmo-

____________
Computers: obelix

Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3236
Credit: 31,678,403
RAC: 6,595
Netherlands
Message 1271221 - Posted: 13 Aug 2012, 18:37:13 UTC - in response to Message 1271208.
Last modified: 13 Aug 2012, 18:42:13 UTC

Owner: Kamu
Created: 7 Jul 2012 | 6:53:57 UTC
Total credit: 463,214
Average credit: 3,278.27
Cross project credit: BOINCstats.com Free-DC
CPU type: GenuineIntel Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz [Family 6 Model 42 Stepping 7]
Number of processors: 8
Coprocessors: [3] NVIDIA GeForce GTX 690 (134215679MB)
Operating System: Linux 3.2.0-23-generic
BOINC version: 7.0.28
Memory: 16017.83 MB
Cache: 8192 KB
Measured floating point speed: 3644.21 million ops/sec
Measured integer speed: 13643.93 million ops/sec
Average upload rate: 26.18 KB/sec
Average download rate: 2152.04 KB/sec


Well, (134215679MB) of memory is also weird ;-).
Maybe the Linux version (or BOINC?) doesn't 'see' more than 3?!
Those are dual-GPU cards?!

But we should stay on topic:
GPU AP performance tuning!
____________

Mike
Project donor
Volunteer tester
Joined: 17 Feb 01
Posts: 23778
Credit: 32,557,752
RAC: 24,389
Germany
Message 1271254 - Posted: 13 Aug 2012, 20:32:42 UTC - in response to Message 1271176.

This host has hardware issues.
Only errors.

____________

ignorance is no excuse
Joined: 4 Oct 00
Posts: 9529
Credit: 44,433,274
RAC: 0
Korea, North
Message 1271258 - Posted: 13 Aug 2012, 20:42:10 UTC - in response to Message 1271254.

I'm wondering why his unroll shows 2 when he said he'd changed it to 15
____________
In a rich man's house there is no place to spit but his face.
Diogenes Of Sinope

End terrorism by building a school

Claggy
Project donor
Volunteer tester
Joined: 5 Jul 99
Posts: 4067
Credit: 32,872,187
RAC: 6,966
United Kingdom
Message 1271265 - Posted: 13 Aug 2012, 21:00:30 UTC - in response to Message 1271258.

Fred hasn't completed that WU yet; it's his wingman, a user called Coldice, whose task errored out.

Claggy

Mike
Project donor
Volunteer tester
Joined: 17 Feb 01
Posts: 23778
Credit: 32,557,752
RAC: 24,389
Germany
Message 1271269 - Posted: 13 Aug 2012, 21:13:00 UTC - in response to Message 1271265.

Exactly.

Fred's posts are always a little confusing,
but I'm getting used to it.

____________

Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3236
Credit: 31,678,403
RAC: 6,595
Netherlands
Message 1271270 - Posted: 13 Aug 2012, 21:13:20 UTC - in response to Message 1271258.
Last modified: 13 Aug 2012, 21:20:24 UTC

"Fred's posts are always a little confusing.
But I get used to it."


Then I apologize for this and will try to be more specific.

(Three GTX 690s were reported, but one 690 and one 680 were used; see a few posts back.)

That would be my ATI host, using unroll 15,
ffa_block 10240 and ffa_block_fetch 5120, 1 instance_per_device.


I wondered why the rev. 1363 Lunatics app (6.01) is shown as AstroPulse v6 v6.04 (opencl_ati_100), and what went wrong in this AP WU.

And you beat me to it ;-)
____________

ignorance is no excuse
Joined: 4 Oct 00
Posts: 9529
Credit: 44,433,274
RAC: 0
Korea, North
Message 1271275 - Posted: 13 Aug 2012, 21:36:17 UTC - in response to Message 1271270.

Because it's a 1316:

Windows x86 rev 1316, V6 match, by Raistmer with support of Lunatics.kwsn.net team. SSE2


____________
In a rich man's house there is no place to spit but his face.
Diogenes Of Sinope

End terrorism by building a school

Mike
Project donor
Volunteer tester
Joined: 17 Feb 01
Posts: 23778
Credit: 32,557,752
RAC: 24,389
Germany
Message 1271277 - Posted: 13 Aug 2012, 21:36:45 UTC

Stock 6.04 and r1316 are the same in principle, Fred.

____________

Claggy
Project donor
Volunteer tester
Joined: 5 Jul 99
Posts: 4067
Credit: 32,872,187
RAC: 6,966
United Kingdom
Message 1271279 - Posted: 13 Aug 2012, 21:39:46 UTC - in response to Message 1271270.

The Stderr.txt says r1316:

AstroPulse v.6
Non-graphics FFTW USE_CONVERSION_OPT
Windows x86 rev 1316, V6 match, by Raistmer with support of Lunatics.kwsn.net team. SSE2
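
To check which build actually produced a result, the revision can be pulled out of that banner line. A minimal Python sketch, assuming the banner always contains "rev NNNN" as in the line above; `app_revision` is a made-up helper name, not part of BOINC or the app.

```python
import re


def app_revision(stderr_text):
    """Return the revision number from a 'rev NNNN' banner, or None if absent."""
    match = re.search(r"\brev (\d+)\b", stderr_text)
    return int(match.group(1)) if match else None


# Banner line copied from the Stderr.txt excerpt above.
banner = "Windows x86 rev 1316, V6 match, by Raistmer with support of Lunatics.kwsn.net team. SSE2"

revision = app_revision(banner)
print("revision found:", revision)
if revision != 1363:
    print("this result was not produced by r1363")
```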


Claggy

Mike
Project donor
Volunteer tester
Joined: 17 Feb 01
Posts: 23778
Credit: 32,557,752
RAC: 24,389
Germany
Message 1271292 - Posted: 13 Aug 2012, 22:05:26 UTC - in response to Message 1271270.

This shows 1316, not 1363; r1316 is equivalent to stock 6.04.

____________


Copyright © 2014 University of California