Best cmdline settings for OpenCL apps (how to find)?

Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1130384 - Posted: 21 Jul 2011, 13:56:42 UTC
Last modified: 21 Jul 2011, 14:12:48 UTC

Hello community!


Astropulse & S@h Enhanced (MultiBeam) OpenCL apps for ATI/AMD GPUs have been available for a long time (included in the Lunatics Installer, at least as of V0.38).

Recently Raistmer opened the beta test of his Astropulse OpenCL app for nVIDIA GPUs.

I guess I'm not the only one who has no knowledge of or experience with the cmdline settings for these OpenCL apps. ;-)


For example for the Astropulse OpenCL app (r521 beta) for nVIDIA GPUs the following cmdline settings are possible:

-unroll N
-ffa_block N
-ffa_block_fetch N
-instances_per_device N
-hp
-no_cpu_lock


And now? ;-)


I got the hint that there are Astropulse bench tools available on the Lunatics site.

Does anyone know where the related threads about these tools are?

Does anyone know how to use these tools?


Maybe someone with the knowledge could make a very easy, fully automatic test tool (for unexperienced users) that tests all possible settings/combis (maybe also 1 or 2+ WUs/GPU) and then tells us the best cmdline settings for max performance?

If this test lasted one or more days, that would be O.K. for me.. ;-)
Then we would be sure to have the best cmdline settings - and the project would get the max performance out of our machines.


Thanks!


- Best regards! - Sutaru Tsureku, team seti.international founder. - Optimize your PC for higher RAC. - SETI@home needs your help. -
ID: 1130384 · Report as offensive
Profile Miep
Volunteer moderator
Joined: 23 Jul 99
Posts: 2412
Credit: 351,996
RAC: 0
Message 1130403 - Posted: 21 Jul 2011, 15:13:49 UTC - in response to Message 1130384.  

Recently Raistmer opened the beta test of his Astropulse OpenCL app for nVIDIA GPUs.

I guess I'm not the only one who has no knowledge of or experience with the cmdline settings for these OpenCL apps. ;-)


Info is buried in various release notes, the readme to the installer and a few threads.

For example for the Astropulse OpenCL app (r521 beta) for nVIDIA GPUs the following cmdline settings are possible:

-unroll N
-ffa_block N
-ffa_block_fetch N
-instances_per_device N
-hp
-no_cpu_lock


And now? ;-)


Complain to Raistmer?
For the benefit of the community, quoting from Raistmer's release notes:

-ffa_block - defines how many different periods the GPU will process per single kernel call
-ffa_block_fetch - defines how many threads will be used in the FFA initial fetch kernel
Rules for using these values:
-ffa_block_fetch <number> can be used only if -ffa_block <number> is already listed in the command line.
Numbers should be even, preferably powers of 2; ffa_block should be divisible by ffa_block_fetch.
If you experience lags during application execution, try decreasing these values.

[If you are running smoothly, you can try upping these values]
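The rules quoted above can be checked mechanically before a work unit is started. The following is only a sketch: the function name is made up, and the "positive" requirement is an assumption that the release notes do not state.

```python
def check_ffa_settings(ffa_block=None, ffa_block_fetch=None):
    """Return a list of rule violations or hints; an empty list means all rules hold."""
    problems = []
    # -ffa_block_fetch is only valid when -ffa_block is also given.
    if ffa_block_fetch is not None and ffa_block is None:
        problems.append("-ffa_block_fetch requires -ffa_block on the command line")
    for name, value in (("ffa_block", ffa_block), ("ffa_block_fetch", ffa_block_fetch)):
        if value is None:
            continue
        # Assumption: values must be positive; the notes only say "even".
        if value <= 0 or value % 2 != 0:
            problems.append(f"-{name} {value}: should be a positive even number")
        elif value & (value - 1) != 0:
            # Even but not a power of 2: allowed, yet not the preferred choice.
            problems.append(f"-{name} {value}: allowed, but a power of 2 is preferred")
    if ffa_block and ffa_block_fetch and ffa_block % ffa_block_fetch != 0:
        problems.append("-ffa_block should be divisible by -ffa_block_fetch")
    return problems


print(check_ffa_settings(8192, 2048))
print(check_ffa_settings(8192, 3000))
```

The first call uses a pair that satisfies every rule; the second uses an even value that is not a power of 2 and does not divide 8192.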

-hp - sets high priority class
-no_cpu_lock - disables affinity setting
-instances_per_device N - will allow running N copies per each supported GPU device (don't forget to set <count> field in app_info to 1/N to instruct BOINC to launch N tasks per GPU).

-unroll N - sets the DATA_CHUNK_UNROLL variable to N. This allows processing N data chunks per FindSinglePulse kernel call, improving performance (in most cases) but increasing GPU memory requirements. On low-end GPUs it may be worth using lower values.
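To keep -instances_per_device N and the <count> field in step, both can be generated from the same number. This is a sketch with a hypothetical helper name; it only produces the two fragments as text and makes no claim about where they belong in app_info.

```python
def build_settings(instances, unroll, ffa_block, ffa_block_fetch,
                   high_priority=True, cpu_lock=True):
    """Build the <cmdline> and <count> fragments for N instances per GPU."""
    flags = [f"-instances_per_device {instances}"]
    if high_priority:
        flags.append("-hp")
    if not cpu_lock:
        flags.append("-no_cpu_lock")
    flags += [f"-unroll {unroll}",
              f"-ffa_block {ffa_block}",
              f"-ffa_block_fetch {ffa_block_fetch}"]
    cmdline = "<cmdline>" + " ".join(flags) + "</cmdline>"
    # <count> must be 1/N so that BOINC launches N tasks per GPU.
    count = "<count>" + format(1 / instances, "g") + "</count>"
    return cmdline, count


cmdline, count = build_settings(2, 10, 8192, 2048, cpu_lock=False)
print(cmdline)
print(count)
```

Generating both values from one number avoids the easy mistake of changing N in one place only.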

I got the hint that there are Astropulse bench tools available on the Lunatics site.


We are in the process of reorganising tools to make them available to a wider public.

Does anyone know where the related threads about these tools are?


Yes. Some two dozen people, I'd guess.
Publicly available is the readme in the bench package, which tells you basically everything you need to know.

Does anyone know how to use these tools?


No, of course not. We have them there, because we find them pretty. No reason at all alpha testers and developers might want to test applications offline.

Maybe someone with the knowledge could make a very easy, fully automatic test tool (for unexperienced users) that tests all possible settings/combis (maybe also 1 or 2+ WUs/GPU) and then tells us the best cmdline settings for max performance?

If this test lasted one or more days, that would be O.K. for me.. ;-)
Then we would be sure to have the best cmdline settings - and the project would get the max performance out of our machines.


Are you kidding?
Ok, apart from the fact that you'd need somebody with the right skills, the time and the inclination to code this tool, mathematically you are talking about finding the absolute maximum in a coupled four-parameter space. Even if the performance curve held only one maximum, it would still take a large number of calculations to find. It is much more likely that the performance curve is bumpy, and then you are looking at something like Monte Carlo to find the best fit.
That isn't to say that some sort of rough approximation can't be done, but even that would take time to calculate [and the above-mentioned programmer to write the tool].
Apart from that, the best parameters almost certainly vary over time (e.g. with system usage, temperatures, the phase of the moon and the silliness of the person in front of the screen).
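A rough count shows how quickly a brute-force test over four parameters grows. Every number below is an assumption made up for illustration: the candidate lists, the 30 minutes per bench run and the three repeats do not come from the release notes or from any measurement.

```python
from itertools import product

# Hypothetical candidate values for each of the four tunable parameters.
unroll_values = [4, 6, 8, 10, 12, 14, 16]
ffa_block_values = [2048, 4096, 8192, 16384]
ffa_block_fetch_values = [512, 1024, 2048, 4096]
instances_values = [1, 2, 3]

# Keep only combinations where ffa_block is divisible by ffa_block_fetch.
grid = [
    (u, b, f, n)
    for u, b, f, n in product(unroll_values, ffa_block_values,
                              ffa_block_fetch_values, instances_values)
    if b % f == 0
]

minutes_per_run = 30  # assumed duration of one bench run
repeats = 3           # assumed repeats per setting to average out noise
total_hours = len(grid) * repeats * minutes_per_run / 60

print(len(grid), "combinations,", total_hours, "hours")
```

With these made-up numbers the grid has 315 entries and would keep one GPU busy for roughly 20 days, and that is still a coarse grid.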

I'm afraid that for the foreseeable future you will have to optimise the parameters manually.

Bear in mind that for AP, runtimes are dependent on blanking %, and only tasks with similar blanking % should be compared when tuning parameters.
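One way to respect the blanking rule when comparing results is to sort them into blanking bins first. This is a sketch only: the tuple layout, the 5 % bin width and the sample numbers are assumptions, not real task data.

```python
from collections import defaultdict


def group_by_blanking(results, bin_width=5.0):
    """Group (cmdline, blanking_percent, seconds) tuples into blanking bins."""
    bins = defaultdict(list)
    for cmdline, blanking, seconds in results:
        # Lower edge of the bin this task falls into, e.g. 22 % -> 20.0.
        bin_start = int(blanking // bin_width) * bin_width
        bins[bin_start].append((cmdline, seconds))
    return dict(bins)


# Made-up sample data, for illustration only.
sample = [
    ("-unroll 10", 1.0, 1500.0),
    ("-unroll 12", 3.0, 1420.0),
    ("-unroll 10", 22.0, 1900.0),
]
for bin_start, runs in sorted(group_by_blanking(sample).items()):
    print(f"blanking {bin_start:.0f}-{bin_start + 5:.0f} %:", runs)
```

Run times are then compared only within one bin, never across bins.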
Carola
-------
I'm multilingual - I can misunderstand people in several languages!
ID: 1130403 · Report as offensive
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34253
Credit: 79,922,639
RAC: 80
Germany
Message 1130406 - Posted: 21 Jul 2011, 15:27:50 UTC


I also gave some info in the beta thread on how to set the params.
You can always ask me; the other testers are always willing to help too.



With each crime and every kindness we birth our future.
ID: 1130406 · Report as offensive
Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1130413 - Posted: 21 Jul 2011, 15:37:50 UTC - in response to Message 1130403.  
Last modified: 21 Jul 2011, 15:53:18 UTC

Miep, why are you so negative?

I asked very kindly.

No one is complaining here.

I read what these cmdline settings are supposed to do, but I don't understand it. I'm not a coder. I'm an unexperienced user.

Yes, I could adjust this value, adjust that value and then.. maybe it will run. But at max performance? No one knows.

I read the ReadMes of the bench tools - I'm none the wiser.


- Best regards! - Sutaru Tsureku, team seti.international founder. - Optimize your PC for higher RAC. - SETI@home needs your help. -
ID: 1130413 · Report as offensive
Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1130424 - Posted: 21 Jul 2011, 15:53:03 UTC - in response to Message 1130413.  
Last modified: 21 Jul 2011, 16:07:42 UTC

Additional..

I as an unexperienced user.. and not a coder..

I thought it could be done like this..

An Astropulse bench WU with 'all in it', or a few bench WUs.
The script/tool chooses the first cmdline settings.
Run of the test WU.
The script/tool chooses other cmdline settings/combis.
Run of the test WU.
The script/tool chooses other cmdline settings/combis.
Run of the test WU.
...
..
.
Afterwards an overview of the cmdline settings and run times is displayed.
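The loop described above could be sketched as follows. This is not one of the existing Lunatics bench tools: the function names are made up, and run_bench assumes that the bench app is started from a directory that already contains the test WU and that it accepts the settings as command-line arguments.

```python
import subprocess
import time


def run_bench(app_path, cmdline):
    """Run the app once with the given settings and return the elapsed seconds.

    Assumption: the app takes the flags as plain command-line arguments.
    """
    start = time.perf_counter()
    subprocess.run([app_path, *cmdline.split()], check=True)
    return time.perf_counter() - start


def benchmark(cmdlines, runner):
    """Time every cmdline with runner(cmdline); return results, fastest first."""
    results = [(runner(cmdline), cmdline) for cmdline in cmdlines]
    results.sort()
    return results


def print_overview(results):
    """Print one line per tested setting: run time, then the cmdline."""
    for seconds, cmdline in results:
        print(f"{seconds:10.1f} s  {cmdline}")


# Demonstration with made-up timings instead of real runs.
fake_times = {
    "-unroll 8 -ffa_block 4096 -ffa_block_fetch 2048": 1610.0,
    "-unroll 10 -ffa_block 8192 -ffa_block_fetch 2048": 1540.0,
    "-unroll 12 -ffa_block 8192 -ffa_block_fetch 4096": 1495.0,
}
print_overview(benchmark(list(fake_times), fake_times.get))
```

For real runs the runner would be something like `lambda c: run_bench("path/to/bench_app", c)`, where the path is a placeholder.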


- Best regards! - Sutaru Tsureku, team seti.international founder. - Optimize your PC for higher RAC. - SETI@home needs your help. -
ID: 1130424 · Report as offensive
Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1130434 - Posted: 21 Jul 2011, 16:07:26 UTC - in response to Message 1130406.  

I also gave some info in the beta thread on how to set the params.
You can always ask me; the other testers are always willing to help too.


Which cmdline settings would you use with a GTX260-216 OC?

This would be the default:
<cmdline>-instances_per_device 1 -hp -no_cpu_lock -unroll 10 -ffa_block 8192 -ffa_block_fetch 2048</cmdline>

(I use -no_cpu_lock because in the past, with the CUDA app, the calculation times increased with CPU lock.)

I tested 2 WUs/GPU, but then each AP app uses ~ 50 % of a CPU core of my Duo CPU, so ~ 50 % of the whole CPU only for GPU support.
If I run only 1 WU/GPU, the CPU usage is very low, like with CUDA.

So I should load the GTX260 as high as possible with one WU.
So should I use a very high -unroll value?

And which other settings/values?


- Best regards! - Sutaru Tsureku, team seti.international founder. - Optimize your PC for higher RAC. - SETI@home needs your help. -
ID: 1130434 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1130435 - Posted: 21 Jul 2011, 16:07:53 UTC - in response to Message 1130424.  

Additional..

I as an unexperienced user...

Why are you still unexperienced after four years and 38 million credits?
ID: 1130435 · Report as offensive
Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1130437 - Posted: 21 Jul 2011, 16:13:40 UTC - in response to Message 1130435.  

Additional..

I as an unexperienced user...

Why are you still unexperienced after four years and 38 million credits?


Because Astropulse OpenCL app for nVIDIA GPU is new, very new (17 Jul 2011 - 20:48:06 UTC).


- Best regards! - Sutaru Tsureku, team seti.international founder. - Optimize your PC for higher RAC. - SETI@home needs your help. -
ID: 1130437 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1130440 - Posted: 21 Jul 2011, 16:16:29 UTC - in response to Message 1130437.  

Additional..

I as an unexperienced user...

Why are you still unexperienced after four years and 38 million credits?

Because Astropulse OpenCL app for nVIDIA GPU is new, very new (17 Jul 2011 - 20:48:06 UTC).

In which case, you're starting from exactly the same position as everyone else, and you can contribute to finding the solution.
ID: 1130440 · Report as offensive
Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1130445 - Posted: 21 Jul 2011, 16:27:49 UTC - in response to Message 1130440.  
Last modified: 21 Jul 2011, 16:28:17 UTC

In which case, you're starting from exactly the same position as everyone else, and you can contribute to finding the solution.


AP OpenCL app for ATI/AMD is not new.

AFAIK, the AP OpenCL apps for ATI/AMD and nVIDIA are very similar.

Other people could have experiences.


- Best regards! - Sutaru Tsureku, team seti.international founder. - Optimize your PC for higher RAC. - SETI@home needs your help. -
ID: 1130445 · Report as offensive
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34253
Credit: 79,922,639
RAC: 80
Germany
Message 1130454 - Posted: 21 Jul 2011, 16:50:15 UTC - in response to Message 1130434.  
Last modified: 21 Jul 2011, 16:51:15 UTC

I also gave some info in the beta thread on how to set the params.
You can always ask me; the other testers are always willing to help too.


Which cmdline settings would you use with a GTX260-216 OC?

This would be the default:
<cmdline>-instances_per_device 1 -hp -no_cpu_lock -unroll 10 -ffa_block 8192 -ffa_block_fetch 2048</cmdline>

(I use -no_cpu_lock because in the past, with the CUDA app, the calculation times increased with CPU lock.)

I tested 2 WUs/GPU, but then each AP app uses ~ 50 % of a CPU core of my Duo CPU, so ~ 50 % of the whole CPU only for GPU support.
If I run only 1 WU/GPU, the CPU usage is very low, like with CUDA.

So I should load the GTX260 as high as possible with one WU.
So should I use a very high -unroll value?

And which other settings/values?


- Best regards! - Sutaru Tsureku, team seti.international founder. - Optimize your PC for higher RAC. - SETI@home needs your help. -


I would start with

<cmdline>-instances_per_device 1 -hp -no_cpu_lock -unroll 12 -ffa_block 8192 -ffa_block_fetch 4096</cmdline>

How many CUs does your 260 have?

Note that your hosts are hidden, so I won't go any further, to avoid problems.


With each crime and every kindness we birth our future.
ID: 1130454 · Report as offensive
Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1130468 - Posted: 21 Jul 2011, 17:38:07 UTC - in response to Message 1130454.  
Last modified: 21 Jul 2011, 17:41:50 UTC

I would start with

<cmdline>-instances_per_device 1 -hp -no_cpu_lock -unroll 12 -ffa_block 8192 -ffa_block_fetch 4096</cmdline>

How many CUs does your 260 have?
(...)


Ohh.. what are CUs?

The GTX260-216 55nm which I have has 216 shader cores (CUDA cores); these are grouped into 27 multiprocessors.

client_state.xml entry:
<coproc_cuda>
<count>1</count>
<name>GeForce GTX 260</name>
<drvVersion>27533</drvVersion>
<cudaVersion>4000</cudaVersion>
<totalGlobalMem>939327488</totalGlobalMem>
<sharedMemPerBlock>16384</sharedMemPerBlock>
<regsPerBlock>16384</regsPerBlock>
<warpSize>32</warpSize>
<memPitch>2147483647</memPitch>
<maxThreadsPerBlock>512</maxThreadsPerBlock>
<maxThreadsDim>512 512 64</maxThreadsDim>
<maxGridSize>65535 65535 1</maxGridSize>
<totalConstMem>65536</totalConstMem>
<major>1</major>
<minor>3</minor>
<clockRate>1500000</clockRate>
<textureAlignment>256</textureAlignment>
<deviceOverlap>1</deviceOverlap>
<multiProcessorCount>27</multiProcessorCount>
</coproc_cuda>
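On nVIDIA hardware an OpenCL compute unit (CU) corresponds to one multiprocessor, so the multiProcessorCount field in the entry above is most likely the number being asked for. A small sketch that reads it from a shortened copy of that entry; the factor of 8 cores per multiprocessor applies to compute capability 1.x GPUs such as this one.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the <coproc_cuda> entry from client_state.xml.
snippet = """<coproc_cuda>
<name>GeForce GTX 260</name>
<major>1</major>
<minor>3</minor>
<multiProcessorCount>27</multiProcessorCount>
</coproc_cuda>"""

gpu = ET.fromstring(snippet)
multiprocessors = int(gpu.findtext("multiProcessorCount"))

# Compute capability 1.x GPUs have 8 shader cores per multiprocessor.
cores_per_multiprocessor = 8
shader_cores = multiprocessors * cores_per_multiprocessor

print(gpu.findtext("name"), "-", multiprocessors, "CUs,", shader_cores, "shader cores")
```

The product matches the 216 in the card's name, which is a quick sanity check that the entry was read correctly.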



- Best regards! - Sutaru Tsureku, team seti.international founder. - Optimize your PC for higher RAC. - SETI@home needs your help. -
ID: 1130468 · Report as offensive
Profile Frizz
Volunteer tester
Joined: 17 May 99
Posts: 271
Credit: 5,852,934
RAC: 0
New Zealand
Message 1130472 - Posted: 21 Jul 2011, 18:00:35 UTC - in response to Message 1130403.  
Last modified: 21 Jul 2011, 18:00:57 UTC


Someone know how to use this tools?


No, of course not. We have them there, because we find them pretty. No reason at all alpha testers and developers might want to test applications offline.


LOL

OK, let's face it: We all have different skills in life. Not everyone is meant to be a tester. So not everyone needs to understand how those tools work.
ID: 1130472 · Report as offensive
Profile TRuEQ & TuVaLu
Volunteer tester
Joined: 4 Oct 99
Posts: 505
Credit: 69,523,653
RAC: 10
Sweden
Message 1130487 - Posted: 21 Jul 2011, 18:58:21 UTC

You can look at the details of every user who runs Astropulse: click on "Tasks", then click a task for its details, and compare the settings that different people use on their computers.

For instance my GeForce 250: http://setiathome.berkeley.edu/results.php?hostid=6031403&offset=0&show_names=0&state=3&appid=5

But it is somewhat hard to find all the tasks that are OpenCL AP based.....

I would also like to have a tool for the best setting....
The best thing would be if the program fine-tuned itself for the best setting, which I'd guess would be very hard to program....


//TQ
TRuEQ & TuVaLu
ID: 1130487 · Report as offensive
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34253
Credit: 79,922,639
RAC: 80
Germany
Message 1130530 - Posted: 21 Jul 2011, 20:25:42 UTC


The basic settings from the installer should run on every setup.

The rest is fine tuning.

Since each config behaves a little differently in its timings, there are no universally optimal params.

What we can provide is help to get close.



With each crime and every kindness we birth our future.
ID: 1130530 · Report as offensive
Profile Miep
Volunteer moderator
Joined: 23 Jul 99
Posts: 2412
Credit: 351,996
RAC: 0
Message 1130571 - Posted: 22 Jul 2011, 17:44:06 UTC - in response to Message 1130413.  
Last modified: 22 Jul 2011, 17:44:35 UTC

I read what these cmdline settings are supposed to do, but I don't understand it. I'm not a coder. I'm an unexperienced user.


Excuse me? If you are an inexperienced user why do you want to run beta and offline benchmarking tools?

Yes, I could adjust this value, adjust that value and then.. maybe it will run. But at max performance? No one knows.


I couldn't finish my post yesterday.
I had a nice graphic explanation of why it is difficult to find 'best' performance and why calculating it takes a lot of crunching power and is not really worth the effort.
A human running a few parameter sets and comparing results is much better at homing in on a 'good' performance setting.
Yes, an automated script like the one you describe a post or so later could work - it's just that choosing meaningful parameter sets is tricky for a machine, and a brute-force approach on 4 parameters is just not workable.
And you still need the coder I was talking about.
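A middle ground between brute force and purely manual tuning is to change one parameter at a time and keep a change only when it makes the run faster. The sketch below shows that idea (a simple coordinate search), not anything the bench tools actually do; the function names, the candidate values and the stand-in timing function are all made up.

```python
def tune_one_at_a_time(start, candidates, measure):
    """Vary one parameter at a time, keeping a change only if measure() drops."""
    best = dict(start)
    best_time = measure(best)
    improved = True
    while improved:
        improved = False
        for name, values in candidates.items():
            for value in values:
                trial = dict(best, **{name: value})
                if trial == best:
                    continue  # same setting as the current best, nothing to test
                trial_time = measure(trial)
                if trial_time < best_time:
                    best, best_time, improved = trial, trial_time, True
    return best, best_time


def fake_measure(settings):
    """Stand-in for a real bench run: fastest at unroll 12, ffa_block 8192."""
    return abs(settings["unroll"] - 12) + abs(settings["ffa_block"] - 8192) / 1024


start = {"unroll": 10, "ffa_block": 4096}
candidates = {"unroll": [8, 10, 12, 14],
              "ffa_block": [2048, 4096, 8192, 16384]}
print(tune_one_at_a_time(start, candidates, fake_measure))
```

On a bumpy performance curve this kind of search can stop at a local optimum, and with real, noisy run times every measurement would need repeating, so it gets close rather than finding the 'best' setting.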

I read the ReadMes of the bench tools - I'm none the wiser.


You have a whole thread on benchmarking different MB apps.
AP works the same.

As for putting the parameters into the script, the newest approach will have an easily adaptable variable line (as Jason's 1.50 version of MB bench already has).
We have no timeline whatsoever on when the tools section will be up to date and complete.
Carola
-------
I'm multilingual - I can misunderstand people in several languages!
ID: 1130571 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1130582 - Posted: 22 Jul 2011, 18:02:09 UTC - in response to Message 1130487.  

...
I would also like to have a tool for the best setting....
The best thing would be if the program fine-tuned itself for the best setting, which I'd guess would be very hard to program....

//TQ

Some approximation of the best settings could probably be done with a modified script, but to do a full genetic algorithm search for the best set would probably take a long time. More to the point, such testing doesn't match the conditions while doing real crunching for the project. For now, the best tool is the human mind's capability for pattern recognition, even though it often imagines patterns where none exist.
                                                                 Joe
ID: 1130582 · Report as offensive
Profile arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 1130610 - Posted: 22 Jul 2011, 18:53:48 UTC - in response to Message 1130445.  

In which case, you're starting from exactly the same position as everyone else, and you can contribute to finding the solution.


AP OpenCL app for ATI/AMD is not new.

AFAIK, the AP OpenCL apps for ATI/AMD and nVIDIA are very similar.

Other people could have experiences.


- Best regards! - Sutaru Tsureku, team seti.international founder. - Optimize your PC for higher RAC. - SETI@home needs your help. -


Astropulse for ATI/AMD is not OpenCL, it is Brook. Completely different animal.

ID: 1130610 · Report as offensive
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1130616 - Posted: 22 Jul 2011, 18:58:21 UTC - in response to Message 1130610.  

In which case, you're starting from exactly the same position as everyone else, and you can contribute to finding the solution.


AP OpenCL app for ATI/AMD is not new.

AFAIK, the AP OpenCL apps for ATI/AMD and nVIDIA are very similar.

Other people could have experiences.


- Best regards! - Sutaru Tsureku, team seti.international founder. - Optimize your PC for higher RAC. - SETI@home needs your help. -


Astropulse for ATI/AMD is not OpenCL, it is Brook. Completely different animal.

I thought the newer ATI AP app for the 4800+ cards was OpenCL, like the MB app, and the Hybrid ATI AP was Brook.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1130616 · Report as offensive
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34253
Credit: 79,922,639
RAC: 80
Germany
Message 1130672 - Posted: 22 Jul 2011, 21:16:04 UTC

Sure it is OpenCL.




With each crime and every kindness we birth our future.
ID: 1130672 · Report as offensive
