GPU AP performance tuning

Message boards : Number crunching : GPU AP performance tuning

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6016
Credit: 85,736,704
RAC: 35,488
Russia
Message 1268301 - Posted: 6 Aug 2012, 9:01:42 UTC
Last modified: 6 Aug 2012, 9:02:04 UTC

New AP test tasks, prepared specially for GPU AP performance tuning, are available in this thread: http://lunatics.kwsn.net/12-gpu-crunching/new-set-of-test-tasks-for-gpu-ap.msg49310.html#msg49310 Tuning results will also be posted there. If you are interested in boosting GPU AP performance, take a look.
ID: 1268301
Kamu
Joined: 19 Jan 02
Posts: 56
Credit: 11,007,400
RAC: 0
Finland
Message 1268342 - Posted: 6 Aug 2012, 12:08:36 UTC

Hi

Lunatics says 'Sorry, registration is currently disabled.', so I'm writing here.

Here is a 2600K @ 4300 MHz, HT disabled, CPU idle, GTX 690:

AppName: AP6_win_x86_SSE2_OpenCL_NV_r1363.exe
AppArgs: -unroll 8 -ffa_block 4096 -ffa_block_fetch 4096
TaskName: Clean_20LC.wu
Started at : 14:57:05.433
Ended at : 15:02:23.984
318.520 secs Elapsed
312.829 secs CPU time

Now that I've found out how to set up the testing environment, and I have a couple of days off work, I can run tests if there is anything you want to know.

-Kimmo-


Computers: obelix
ID: 1268342
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6016
Credit: 85,736,704
RAC: 35,488
Russia
Message 1268343 - Posted: 6 Aug 2012, 12:13:00 UTC - in response to Message 1268342.  
Last modified: 6 Aug 2012, 12:17:40 UTC

Nothing special for now, just finding the parameter "sweet spot" for your GPU...
Also, it would be interesting to check the new r1363 parameter: does it save CPU time on your GPU without a big increase in elapsed time? How to use it is described here:
http://lunatics.kwsn.net/12-gpu-crunching/ap6-r1363-for-gpu.msg49205.html#msg49205

EDIT: In general, one can mirror the tests I run on my own GPU and find the best parameters for a particular device. To get a table of results from the benchmark log file I use a Perl script (I will add it to the first post of the Lunatics tuning thread).
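The Perl script itself is not included in this thread. As a rough stand-in (not Raistmer's script), here is a minimal Python sketch that assumes the benchmark log holds blocks shaped like Kamu's first result above (AppName / AppArgs / TaskName / `secs Elapsed` / `secs CPU time`) and rebuilds a "Quick timetable" in the same layout Kamu posted. The function names `parse_bench_log` and `quick_timetable` are hypothetical.

```python
import re
from dataclasses import dataclass

# Assumed log layout, copied from a posted result (not from any tool docs):
#   AppName: <exe>
#   AppArgs: <args>
#   TaskName: <wu>
#   318.520 secs Elapsed
#   312.829 secs CPU time
TIME_RE = re.compile(r"^([\d.]+) secs (Elapsed|CPU time)$")
FIELDS = (("AppName:", "app"), ("AppArgs:", "args"), ("TaskName:", "task"))


@dataclass
class Run:
    app: str
    args: str
    task: str
    elapsed: float
    cpu: float


def parse_bench_log(text: str) -> list:
    """Collect one Run per block that reports both Elapsed and CPU time."""
    runs, cur = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        for prefix, key in FIELDS:
            if line.startswith(prefix):
                cur[key] = line[len(prefix):].strip()
        m = TIME_RE.match(line)
        if m:
            cur["elapsed" if m.group(2) == "Elapsed" else "cpu"] = float(m.group(1))
        if "elapsed" in cur and "cpu" in cur:
            runs.append(Run(cur.get("app", ""), cur.get("args", ""),
                            cur.get("task", ""), cur["elapsed"], cur["cpu"]))
            cur = {}  # start collecting the next block
    return runs


def quick_timetable(runs) -> str:
    """Group runs by work unit, in first-seen order, in a 'Quick timetable' layout."""
    lines = ["Quick timetable", ""]
    for task in dict.fromkeys(r.task for r in runs):
        lines.append(f"WU : {task}")
        for run in runs:
            if run.task != task:
                continue
            lines.append(f"{run.app} {run.args} :")
            lines.append(f"Elapsed {run.elapsed:.3f} secs")
            lines.append(f"CPU {run.cpu:.3f} secs")
    return "\n".join(lines)
```

Lines that match none of the assumed prefixes (such as `Started at :`) are simply skipped, so the sketch tolerates extra fields in each block.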
ID: 1268343
Kamu
Joined: 19 Jan 02
Posts: 56
Credit: 11,007,400
RAC: 0
Finland
Message 1268363 - Posted: 6 Aug 2012, 13:26:43 UTC

Ok

Here are quick results with different unroll values; 10 seems to be best with this card:
------------
Quick timetable

WU : Clean_20LC.wu
AP6_win_x86_SSE2_OpenCL_NV_r1363.exe -unroll 2 -ffa_block 4096 -ffa_block_fetch 4096 :
Elapsed 335.480 secs
CPU 332.017 secs
AP6_win_x86_SSE2_OpenCL_NV_r1363.exe -unroll 4 -ffa_block 4096 -ffa_block_fetch 4096 :
Elapsed 317.800 secs
CPU 314.685 secs
AP6_win_x86_SSE2_OpenCL_NV_r1363.exe -unroll 6 -ffa_block 4096 -ffa_block_fetch 4096 :
Elapsed 314.140 secs
CPU 310.270 secs
AP6_win_x86_SSE2_OpenCL_NV_r1363.exe -unroll 8 -ffa_block 4096 -ffa_block_fetch 4096 :
Elapsed 312.830 secs
CPU 309.943 secs
AP6_win_x86_SSE2_OpenCL_NV_r1363.exe -unroll 10 -ffa_block 4096 -ffa_block_fetch 4096 :
Elapsed 312.100 secs
CPU 306.105 secs
AP6_win_x86_SSE2_OpenCL_NV_r1363.exe -unroll 12 -ffa_block 4096 -ffa_block_fetch 4096 :
Elapsed 311.100 secs
CPU 308.071 secs

------------
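To read the timetable above, a small Python sketch can rank the six runs (numbers copied from it). It reports the fastest elapsed time and the lowest CPU time separately, because in this data they point to different unroll values, and it prints the CPU/elapsed ratio for each run.

```python
# (unroll, elapsed_s, cpu_s) copied from Kamu's Clean_20LC.wu timetable;
# every run used r1363 with -ffa_block 4096 -ffa_block_fetch 4096.
RESULTS = [
    (2, 335.480, 332.017),
    (4, 317.800, 314.685),
    (6, 314.140, 310.270),
    (8, 312.830, 309.943),
    (10, 312.100, 306.105),
    (12, 311.100, 308.071),
]


def summarize(results):
    """Print per-run gains; return (unroll with best elapsed, unroll with best CPU)."""
    base_unroll, base_elapsed, _ = results[0]
    for unroll, elapsed, cpu in results:
        gain = 100.0 * (base_elapsed - elapsed) / base_elapsed
        busy = 100.0 * cpu / elapsed  # CPU time as a share of wall-clock time
        print(f"unroll {unroll:>2}: elapsed {elapsed:8.3f} s "
              f"({gain:5.2f}% faster than unroll {base_unroll}), "
              f"CPU/elapsed {busy:5.1f}%")
    best_elapsed = min(results, key=lambda r: r[1])[0]
    best_cpu = min(results, key=lambda r: r[2])[0]
    return best_elapsed, best_cpu


if __name__ == "__main__":
    print(summarize(RESULTS))  # last line printed: (12, 10)
```

Elapsed times from unroll 6 to 12 are within about 3 s of each other (roughly 1%), so a single pass per setting may not separate them reliably; repeating each run would help. The CPU/elapsed ratio stays near 100% for every setting, which is why the CPU-saving sleep option is worth testing separately.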


After that I ran with -v 2 -use_sleep, but the log file only has lines like

Awaited 1 ms for completion
PC_inner_ffa result is: 0
Awaited 1 ms for completion
PC_inner_ffa result is: 0

Full log file here. It's against the ap_Zblank_2LC67_silent_ffa.wu file, but does that matter?

-Kimmo-

Computers: obelix
ID: 1268363
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6016
Credit: 85,736,704
RAC: 35,488
Russia
Message 1268443 - Posted: 6 Aug 2012, 17:29:18 UTC - in response to Message 1268363.  

1 ms is too short to try to sleep, IMHO.
You need to increase ffa_block a lot before you can make any use of sleep.
In Windows the standard time slice is 20 ms, so yielding control for 1 ms will only add big overhead... But you can try anyway: just find what the value would be for a large FFA (look for considerably bigger sleep times in the log).
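One way to see the scheduler granularity Raistmer describes on a given machine is to time short sleeps. The Python sketch below is only a proxy: it uses `time.sleep`, not whatever mechanism the AP app uses for -use_sleep, and the overshoot it reports depends on the OS and its current timer settings. The helper name `measure_sleep` is hypothetical.

```python
import time


def measure_sleep(requested_s: float, trials: int = 20):
    """Return (mean, worst) wall-clock seconds actually spent in time.sleep(requested_s)."""
    samples = []
    for _ in range(trials):
        start = time.perf_counter()
        time.sleep(requested_s)
        samples.append(time.perf_counter() - start)
    return sum(samples) / len(samples), max(samples)


if __name__ == "__main__":
    for ms in (1, 5, 20):
        mean, worst = measure_sleep(ms / 1000.0)
        print(f"requested {ms:>2} ms -> mean {mean * 1000:6.2f} ms, "
              f"worst {worst * 1000:6.2f} ms, overshoot x{mean / (ms / 1000.0):.2f}")
```

If a requested 1 ms sleep routinely comes back after several milliseconds, sleeping only pays off when each GPU wait is much longer than that, which is the point of raising ffa_block first.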
ID: 1268443
Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1271164 - Posted: 13 Aug 2012, 16:58:28 UTC - in response to Message 1268443.  
Last modified: 13 Aug 2012, 17:05:01 UTC

I put the rev. 1363 app in place (I had the 1316 app before) and again waited for some AstroPulse work, which I got. Those tasks still have to be computed.
Parameters: unroll 15; ffa_block 10240; ffa_block_fetch 5120; 1 instance_per_device (2x HD5870, CrossFire disabled in software through CCC).
Looking at the results, I came across this one: AP WU ID 1044634527. Did something go very wrong?
ID: 1271164
skildude
Joined: 4 Oct 00
Posts: 9529
Credit: 44,436,947
RAC: 0
Burma
Message 1271176 - Posted: 13 Aug 2012, 17:15:33 UTC - in response to Message 1271164.  

Recheck your unroll.

From your WU data:

DATA_CHUNK_UNROLL at default:2
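When checking a result, the quickest test is to search the task's stderr for that line. A minimal Python sketch, assuming only the `DATA_CHUNK_UNROLL at default:N` wording seen in this thread (what the app prints when -unroll is given is not shown here, so the sketch does not try to match it). The helper names are hypothetical.

```python
import re

# Matches the wording quoted from stderr output in this thread.
DEFAULT_UNROLL_RE = re.compile(r"DATA_CHUNK_UNROLL at default:(\d+)")


def default_unroll_values(stderr_text: str) -> list:
    """Return every unroll value the app reports as a default (empty if none)."""
    return [int(v) for v in DEFAULT_UNROLL_RE.findall(stderr_text)]


def used_default_unroll(stderr_text: str) -> bool:
    """True if the stderr says unroll fell back to its default."""
    return bool(default_unroll_values(stderr_text))
```

A match suggests that the configured -unroll value never reached that task.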

In a rich man's house there is no place to spit but his face.
Diogenes Of Sinope
ID: 1271176
Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,290,221
RAC: 148
United Kingdom
Message 1271184 - Posted: 13 Aug 2012, 17:22:19 UTC - in response to Message 1271164.  
Last modified: 13 Aug 2012, 17:26:50 UTC

It was started and stopped too many times:

### Restart at 0.90 percent.

### Restart at 3.60 percent.

### Restart at 3.60 percent.

### Restart at 6.31 percent.

### Restart at 8.11 percent.

### Restart at 8.11 percent.

### Restart at 15.32 percent.

### Restart at 15.32 percent.

### Restart at 15.32 percent.

### Restart at 15.32 percent.
state.fold_buf_size_short=65536; state.fold_buf_size_long=262144
Info : Building Program (binary, clBuildProgram):main kernels: OK code 0

Termination request detected. GPU device synched, awaiting termination...
Running on device number: 0
DATA_CHUNK_UNROLL at default:2
DATA_CHUNK_UNROLL at default:2
Priority of worker thread raised successfully
Priority of process adjusted successfully, below normal priority class used
OpenCL platform detected: Advanced Micro Devices, Inc.
BOINC assigns device 0
Info: BOINC provided device ID used
Used GPU device parameters are:
Number of compute units: 10
Single buffer allocation size: 256MB
max WG size: 256
ERROR: both checkpoint files are damaged, aborting task
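The diagnosis above comes from counting the `### Restart at ... percent.` lines. A short Python sketch that does the same for any stderr, assuming exactly the wording shown above: it counts restarts, how many distinct progress points they occurred at, and whether the checkpoint-damage error appears. `restart_summary` is a hypothetical helper name.

```python
import re

RESTART_RE = re.compile(r"### Restart at ([\d.]+) percent\.")
CHECKPOINT_ERROR = "both checkpoint files are damaged"


def restart_summary(stderr_text: str) -> dict:
    """Summarize restarts and checkpoint failure reported in a task's stderr."""
    points = [float(p) for p in RESTART_RE.findall(stderr_text)]
    return {
        "restarts": len(points),
        # Repeated restarts at one percentage suggest no new checkpoint in between.
        "distinct_points": len(set(points)),
        "last_point": points[-1] if points else None,
        "checkpoint_damaged": CHECKPOINT_ERROR in stderr_text,
    }
```

For the task quoted above this gives 10 restarts at only 5 distinct points, the last at 15.32 percent, followed by the checkpoint error.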


Claggy
ID: 1271184
Mark Lybeck
Joined: 9 Aug 99
Posts: 244
Credit: 197,950,990
RAC: 33,031
Finland
Message 1271208 - Posted: 13 Aug 2012, 18:12:59 UTC - in response to Message 1268342.  

Kamu,

How do you manage to get 3 instances of GTX 690 into the same machine (ID: 6720006)?

I thought GTX 690 units were always added in multiples of 2.
ID: 1271208
Kamu
Joined: 19 Jan 02
Posts: 56
Credit: 11,007,400
RAC: 0
Finland
Message 1271220 - Posted: 13 Aug 2012, 18:36:39 UTC

:)

I'd like to have 3 of those too. ;)

That's the number of GPUs in the host, not graphics cards. At that time I had one 690 (2 GPUs) and one 680 (1 GPU), which makes [3].

-Kimmo-

Computers: obelix
ID: 1271220
Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1271221 - Posted: 13 Aug 2012, 18:37:13 UTC - in response to Message 1271208.  
Last modified: 13 Aug 2012, 18:42:13 UTC

Owner	Kamu
Created	7 Jul 2012 | 6:53:57 UTC
Total credit	463,214
Average credit	3,278.27
Cross project credit	BOINCstats.com Free-DC
CPU type	GenuineIntel
Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz [Family 6 Model 42 Stepping 7]
Number of processors	8
Coprocessors	[3] NVIDIA GeForce GTX 690 (134215679MB)
Operating System	Linux
3.2.0-23-generic
BOINC version	7.0.28
Memory	16017.83 MB
Cache	8192 KB
Measured floating point speed	3644.21 million ops/sec
Measured integer speed	13643.93 million ops/sec
Average upload rate	26.18 KB/sec
Average download rate	2152.04 KB/sec 


Well, the (134215679MB) of memory is also weird ;-).
Maybe the Linux version (or BOINC?) doesn't 'see' more than 3?!
Are those dual-GPU cards?!

But we should stay on topic:
GPU AP performance tuning!
ID: 1271221
Mike (Special Project $75 donor)
Volunteer tester
Joined: 17 Feb 01
Posts: 31379
Credit: 66,971,973
RAC: 25,162
Germany
Message 1271254 - Posted: 13 Aug 2012, 20:32:42 UTC - in response to Message 1271176.  

This host has hardware issues.
Only errors.

With each crime and every kindness we birth our future.
ID: 1271254
skildude
Joined: 4 Oct 00
Posts: 9529
Credit: 44,436,947
RAC: 0
Burma
Message 1271258 - Posted: 13 Aug 2012, 20:42:10 UTC - in response to Message 1271254.  

I'm wondering why his unroll shows 2 when he said he'd changed it to 15.
In a rich man's house there is no place to spit but his face.
Diogenes Of Sinope
ID: 1271258
Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,290,221
RAC: 148
United Kingdom
Message 1271265 - Posted: 13 Aug 2012, 21:00:30 UTC - in response to Message 1271258.  

Fred hasn't completed that WU yet; it's his wingman, a person called Coldice, whose task errored out.

Claggy
ID: 1271265
Mike (Special Project $75 donor)
Volunteer tester
Joined: 17 Feb 01
Posts: 31379
Credit: 66,971,973
RAC: 25,162
Germany
Message 1271269 - Posted: 13 Aug 2012, 21:13:00 UTC - in response to Message 1271265.  

Exactly.

Fred's posts are always a little confusing, but I'm getting used to it.

With each crime and every kindness we birth our future.
ID: 1271269
Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1271270 - Posted: 13 Aug 2012, 21:13:20 UTC - in response to Message 1271258.  
Last modified: 13 Aug 2012, 21:20:24 UTC

I'm wondering why his unroll shows 2 when he said he'd changed it to 15



Fred's posts are always a little confusing, but I'm getting used to it.


Then I apologize for this and will try to be more specific.

(Three GTX 690s were reported, but one 690 and one 680 were used; see a few posts back.)

That would be my ATI host, using unroll 15, ffa_block 10240, ffa_block_fetch 5120 and 1 instance_per_device.

I wondered why the rev. 1363 Lunatics app (6.01) is shown as AstroPulse v6 v6.04 (opencl_ati_100), and what went wrong in this AP WU.

And you beat me to it ;-)
ID: 1271270
skildude
Posts: 9529
Credit: 44,436,947
RAC: 0
Burma
Message 1271275 - Posted: 13 Aug 2012, 21:36:17 UTC - in response to Message 1271270.  

Because it's a 1316:

Windows x86 rev 1316, V6 match, by Raistmer with support of Lunatics.kwsn.net team. SSE2


In a rich man's house there is no place to spit but his face.
Diogenes Of Sinope
ID: 1271275
Mike (Special Project $75 donor)
Volunteer tester
Posts: 31379
Credit: 66,971,973
RAC: 25,162
Germany
Message 1271277 - Posted: 13 Aug 2012, 21:36:45 UTC

Stock 6.04 and r1316 are essentially the same, Fred.

With each crime and every kindness we birth our future.
ID: 1271277
Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,290,221
RAC: 148
United Kingdom
Message 1271279 - Posted: 13 Aug 2012, 21:39:46 UTC - in response to Message 1271270.  

The stderr.txt says r1316:

AstroPulse v.6
Non-graphics FFTW USE_CONVERSION_OPT
Windows x86 rev 1316, V6 match, by Raistmer with support of Lunatics.kwsn.net team. SSE2
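The website shows only the app version name (here AstroPulse v6 v6.04 (opencl_ati_100)), so the build revision is easier to confirm from the stderr banner. A minimal Python sketch, assuming the `rev NNNN` wording of the banner quoted above; the expected revision (1363) is just the one discussed in this thread, and the helper names are hypothetical.

```python
import re
from typing import Optional

REV_RE = re.compile(r"\brev (\d+)\b")


def app_revision(stderr_text: str) -> Optional[int]:
    """Return the build revision from the stderr banner, or None if absent."""
    m = REV_RE.search(stderr_text)
    return int(m.group(1)) if m else None


def check_revision(stderr_text: str, expected: int = 1363) -> str:
    """Compare the revision in stderr with the one you believe is installed."""
    rev = app_revision(stderr_text)
    if rev is None:
        return "no revision found in stderr"
    return "ok" if rev == expected else f"running r{rev}, expected r{expected}"
```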


Claggy
ID: 1271279
Mike (Special Project $75 donor)
Volunteer tester
Posts: 31379
Credit: 66,971,973
RAC: 25,162
Germany
Message 1271292 - Posted: 13 Aug 2012, 22:05:26 UTC - in response to Message 1271270.  

This shows 1316, not 1363, and 1316 is equivalent to stock 6.04.

With each crime and every kindness we birth our future.
ID: 1271292
 
©2018 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.