Best cmdline settings for OpenCL apps (how to find)?
Author | Message |
---|---|
Sutaru Tsureku Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
Hello community! For a long time there have been Astropulse and S@h Enhanced (MultiBeam) OpenCL apps available for ATI/AMD GPUs (included in the Lunatics Installer, at least V0.38). Recently Raistmer opened the beta test of his Astropulse OpenCL app for nVIDIA GPUs. I guess I'm not the only one who has no knowledge of or experience with the cmdline settings for these OpenCL apps. ;-) For example, for the Astropulse OpenCL app (r521 beta) for nVIDIA GPUs the following cmdline settings are possible: -unroll N -ffa_block N -ffa_block_fetch N -instances_per_device N -hp -no_cpu_lock And now? ;-) I got the hint that there are Astropulse bench tools available on the Lunatics site. Does anyone know where the related threads about these tools are? Does anyone know how to use these tools? Maybe someone with the knowledge could make a very easy, fully automatic test tool (for inexperienced users) that tests all possible settings/combinations (maybe also 1 or 2+ WUs/GPU) and then reports the best cmdline settings for max performance? If this test lasted one or more days, that would be O.K. for me.. ;-) Then we would be sure to have the best cmdline settings, and the project would get the max performance out of our machines. Thanks! - Best regards! - Sutaru Tsureku, team seti.international founder. - Optimize your PC for higher RAC. - SETI@home needs your help. - |
Miep Joined: 23 Jul 99 Posts: 2412 Credit: 351,996 RAC: 0 |
"Recently Raistmer opened the beta test of his Astropulse OpenCL app for nVIDIA GPUs."
Info is buried in various release notes, the readme to the installer and a few threads.
"For example, for the Astropulse OpenCL app (r521 beta) for nVIDIA GPUs the following cmdline settings are possible:"
Complain to Raistmer? For the benefit of the community, quoting from Raistmer's release notes:
-ffa_block - defines how many different periods the GPU will process per single kernel call.
-ffa_block_fetch - defines how many threads will be used in the FFA initial fetch kernel.
Rules for using these values: -ffa_block_fetch <number> can be used only if -ffa_block <number> is already listed in the command line; the numbers should be even, better if they are powers of 2; ffa_block should be divisible by ffa_block_fetch. If you experience lags during application execution, try to decrease these values. [If you are running smoothly, you can try upping these values.]
-hp - sets high priority class.
-no_cpu_lock - disables affinity setting.
-instances_per_device N - will allow running N copies per each supported GPU device (don't forget to set the <count> field in app_info to 1/N to instruct BOINC to launch N tasks per GPU).
-unroll N - sets the DATA_CHUNK_UNROLL variable to N. This allows doing N data chunks per FindSinglePulse kernel call, improving performance (in most cases) but increasing GPU memory requirements. On low-end GPUs it may be worth using lower values.
"I got the hint that there are Astropulse bench tools available on the Lunatics site."
We are in the process of reorganising tools to make them available to a wider public.
"Does anyone know where the related threads about these tools are?"
Yes. Some two dozen people, I'd guess. Publicly available is the readme in the bench package, which tells you basically everything you need to know.
"Does anyone know how to use these tools?"
No, of course not. We have them there because we find them pretty. No reason at all alpha testers and developers might want to test applications offline.
"Maybe someone with the knowledge could make a very easy, fully automatic test tool (for inexperienced users) that tests all possible settings/combinations (maybe also 1 or 2+ WUs/GPU) and then reports the best cmdline settings for max performance?"
Are you kidding? OK, apart from the fact that you'd need somebody with the right skills, the time and the inclination to code this tool, mathematically you are talking about finding the absolute maximum in a coupled four-parameter space. Even if the performance curve held only one maximum, it would still take a large amount of calculation to find. It is much more likely that the performance curve is bumpy, and then you are looking at something like Monte Carlo to find the best fit. That isn't to say that some sort of rough approximation can't be done, but even that would take time to calculate [and the above-mentioned programmer to do the tool]. Apart from that, the best parameters almost certainly vary over time (e.g. with system usage, temperatures, the phase of the moon and the silliness of the person in front of the screen). I'm afraid for the foreseeable future you will have to manually optimise the parameters. Bear in mind that for AP, runtimes are dependent on blanking %, and only tasks with similar blanking % should be compared when tuning parameters.
Carola ------- I'm multilingual - I can misunderstand people in several languages! |
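The rules quoted from Raistmer's release notes can be checked mechanically before a cmdline goes into app_info. The following is only a minimal sketch of those rules as stated in this thread; the function names `check_ffa_settings` and `boinc_count` are made up for illustration and are not part of any Lunatics or BOINC tool.

```python
def is_power_of_two(n):
    """True for 1, 2, 4, 8, ... (positive integers with a single bit set)."""
    return n > 0 and (n & (n - 1)) == 0


def check_ffa_settings(ffa_block=None, ffa_block_fetch=None):
    """Return a list of rule violations and notes; an empty list means no complaints.

    Hypothetical helper: it only encodes the rules quoted in the thread.
    """
    problems = []
    # Rule: -ffa_block_fetch may only be used together with -ffa_block.
    if ffa_block_fetch is not None and ffa_block is None:
        return ["-ffa_block_fetch needs -ffa_block on the same command line"]
    for name, value in (("ffa_block", ffa_block),
                        ("ffa_block_fetch", ffa_block_fetch)):
        if value is None:
            continue
        # Rule: numbers should be even, better if they are powers of 2.
        if value <= 0 or value % 2 != 0:
            problems.append(f"{name}={value} should be a positive even number")
        elif not is_power_of_two(value):
            problems.append(f"{name}={value} is even, but a power of 2 is preferred")
    # Rule: ffa_block should be divisible by ffa_block_fetch.
    if ffa_block and ffa_block_fetch and ffa_block % ffa_block_fetch != 0:
        problems.append("ffa_block should be divisible by ffa_block_fetch")
    return problems


def boinc_count(instances_per_device):
    """Value for the <count> field in app_info: 1/N for N tasks per GPU."""
    return 1 / instances_per_device


print(check_ffa_settings(8192, 2048))  # the default pair quoted in this thread
print(boinc_count(2))
```

A check like this only says whether a combination obeys the stated rules; it says nothing about whether the combination is fast on a given card.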
Mike Joined: 17 Feb 01 Posts: 34253 Credit: 79,922,639 RAC: 80 |
I also gave some info in the beta thread on how to set params. You can always ask me, and the other testers are always willing to help too. With each crime and every kindness we birth our future. |
Sutaru Tsureku Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
Miep, why are you so negative? I asked very kindly. No one is complaining here. I read what this and that cmdline setting should do, but I don't understand it. I'm not a coder. I'm an inexperienced user. Yes, I could adjust this value, adjust that value, and then.. maybe it will run. But at max performance? No one knows. I read the ReadMes of the bench tools - I'm none the wiser. |
Sutaru Tsureku Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
Additional.. As an inexperienced user and not a coder, I thought it could be done like this: An Astropulse bench WU with 'all in it', or a few bench WUs. The script/tool chooses the first cmdline settings. Run of the test WU. The script/tool chooses other cmdline settings/combinations. Run of the test WU. The script/tool chooses other cmdline settings/combinations. Run of the test WU. ... .. . Afterwards an overview of cmdline settings and run times is displayed. |
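The loop described in this post can be sketched in a few lines. This is a rough illustration under stated assumptions, not an existing Lunatics tool: `build_cmdline`, `timed_run`, `make_runner` and `sweep` are hypothetical names, the application path passed to `make_runner` is a placeholder, and it is assumed that the app can be started directly with these switches on a test WU, which the thread does not confirm.

```python
import itertools
import subprocess
import time


def build_cmdline(unroll, ffa_block, ffa_block_fetch, instances=1):
    """Assemble a cmdline string in the same form as the defaults quoted in this thread."""
    return (f"-instances_per_device {instances} -hp -no_cpu_lock "
            f"-unroll {unroll} -ffa_block {ffa_block} "
            f"-ffa_block_fetch {ffa_block_fetch}")


def timed_run(argv):
    """Run one command to completion and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    return time.perf_counter() - start


def make_runner(app_path):
    """Build a benchmark function for a given executable (path is an assumption)."""
    def run_bench(cmdline):
        return timed_run([app_path, *cmdline.split()])
    return run_bench


def sweep(run_bench, unrolls, ffa_blocks, ffa_fetches):
    """Try every valid combination; return (seconds, cmdline) pairs, fastest first."""
    results = []
    for unroll, block, fetch in itertools.product(unrolls, ffa_blocks, ffa_fetches):
        # Skip pairs that break the rule "ffa_block divisible by ffa_block_fetch".
        if block % fetch != 0:
            continue
        cmdline = build_cmdline(unroll, block, fetch)
        results.append((run_bench(cmdline), cmdline))
    return sorted(results)
```

With a real executable one would call something like `sweep(make_runner("path/to/app"), [8, 10, 12], [4096, 8192], [2048, 4096])` and print the sorted list as the overview. Any runner that takes a cmdline string and returns seconds can be plugged in, so the loop itself can be tried out with a stub first.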
Sutaru Tsureku Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
"I also gave some info in the beta thread on how to set params." Which cmdline settings would you use with a GTX260-216 OC? This would be the default: <cmdline>-instances_per_device 1 -hp -no_cpu_lock -unroll 10 -ffa_block 8192 -ffa_block_fetch 2048</cmdline> (I use -no_cpu_lock because in the past, with the CUDA app, the calculation times increased with CPU lock.) I tested 2 WUs/GPU, but then each AP app uses ~ 50 % of a CPU core of my Duo CPU, so ~ 50 % of the whole CPU just for GPU support. If I run only 1 WU/GPU, the CPU usage is very low, like with CUDA. So I should load the GTX260 as heavily as possible with one WU. So should I use a very high -unroll value? And which other settings/values? |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14649 Credit: 200,643,578 RAC: 874 |
Additional.. Why are you still inexperienced after four years and 38 million credits? |
Sutaru Tsureku Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
Additional.. Because the Astropulse OpenCL app for nVIDIA GPUs is new, very new (17 Jul 2011 - 20:48:06 UTC). |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14649 Credit: 200,643,578 RAC: 874 |
Additional.. In which case, you're starting from exactly the same position as everyone else, and you can contribute to finding the solution. |
Sutaru Tsureku Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
"In which case, you're starting from exactly the same position as everyone else, and you can contribute to finding the solution." The AP OpenCL app for ATI/AMD is not new. AFAIK, the AP OpenCL apps for ATI/AMD and nVIDIA are very similar. Other people could already have experience with it. |
Mike Joined: 17 Feb 01 Posts: 34253 Credit: 79,922,639 RAC: 80 |
"I also gave some info in the beta thread on how to set params." I would start with <cmdline>-instances_per_device 1 -hp -no_cpu_lock -unroll 12 -ffa_block 8192 -ffa_block_fetch 4096</cmdline> How many CUs does your 260 have? Note that your hosts are hidden, so I won't go any further, to avoid problems. |
Sutaru Tsureku Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
"I would start with" Ohh.. what are CUs? The GTX260-216 (55nm) that I have has 216 shader cores, grouped into 27 multiprocessors. client_state.xml entry:
<coproc_cuda>
  <count>1</count>
  <name>GeForce GTX 260</name>
  <drvVersion>27533</drvVersion>
  <cudaVersion>4000</cudaVersion>
  <totalGlobalMem>939327488</totalGlobalMem>
  <sharedMemPerBlock>16384</sharedMemPerBlock>
  <regsPerBlock>16384</regsPerBlock>
  <warpSize>32</warpSize>
  <memPitch>2147483647</memPitch>
  <maxThreadsPerBlock>512</maxThreadsPerBlock>
  <maxThreadsDim>512 512 64</maxThreadsDim>
  <maxGridSize>65535 65535 1</maxGridSize>
  <totalConstMem>65536</totalConstMem>
  <major>1</major>
  <minor>3</minor>
  <clockRate>1500000</clockRate>
  <textureAlignment>256</textureAlignment>
  <deviceOverlap>1</deviceOverlap>
  <multiProcessorCount>27</multiProcessorCount>
</coproc_cuda> |
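The number Mike asks for can be read straight out of the client_state.xml entry posted above. A minimal sketch, assuming that the CU count of an nVIDIA card under OpenCL corresponds to the `multiProcessorCount` field of the CUDA description; the XML below is trimmed to three of the posted fields, and `summarize_coproc` is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <coproc_cuda> entry posted in this thread.
COPROC_XML = """
<coproc_cuda>
  <name>GeForce GTX 260</name>
  <totalGlobalMem>939327488</totalGlobalMem>
  <multiProcessorCount>27</multiProcessorCount>
</coproc_cuda>
"""


def summarize_coproc(xml_text):
    """Pull the card name, multiprocessor count and memory size out of the entry."""
    node = ET.fromstring(xml_text.strip())
    return {
        "name": node.findtext("name"),
        # Assumption: multiprocessors here are what OpenCL reports as compute units.
        "compute_units": int(node.findtext("multiProcessorCount")),
        "global_mem_mib": int(node.findtext("totalGlobalMem")) // (1024 * 1024),
    }


print(summarize_coproc(COPROC_XML))
```

For the entry above this gives 27 compute units, which matches 216 shader cores at 8 per multiprocessor.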
Frizz Joined: 17 May 99 Posts: 271 Credit: 5,852,934 RAC: 0 |
LOL OK, let's face it: We all have different skills in life. Not everyone is meant to be a tester. So not everyone needs to understand how those tools work. |
TRuEQ & TuVaLu Joined: 4 Oct 99 Posts: 505 Credit: 69,523,653 RAC: 10 |
You can find the details of every user that runs Astropulse by clicking on "Task" (click for details), so you can compare the settings that different people use on different computers. For instance, my GeForce 250: http://setiathome.berkeley.edu/results.php?hostid=6031403&offset=0&show_names=0&state=3&appid=5 But it is somewhat hard to find all the tasks that are OpenCL AP based..... I would also like to have a tool for the best settings.... The best thing would be if the program fine-tuned itself for the best settings, which I'd guess would be very hard programming.... //TQ TRuEQ & TuVaLu |
Mike Joined: 17 Feb 01 Posts: 34253 Credit: 79,922,639 RAC: 80 |
The basic settings from the installer should run on every setup. The rest is fine tuning. Since each config is a little different in its timings, there are no universally optimal params. What we can provide is help to get close. |
Miep Joined: 23 Jul 99 Posts: 2412 Credit: 351,996 RAC: 0 |
"I read what this and that cmdline setting should do, but I don't understand it. I'm not a coder. I'm an inexperienced user."
Excuse me? If you are an inexperienced user, why do you want to run beta and offline benchmarking tools?
"Yes, I could adjust this value, adjust that value, and then.. maybe it will run. But at max performance? No one knows."
I couldn't finish my post yesterday. I had a nice graphic explanation of why it is difficult to find the 'best' performance and why calculating it takes a lot of crunching power and is not really worth the effort. A human running a few parameter sets and comparing results is much better at homing in on a 'good' performance setting. Yes, an automated script like the one you describe a post or so later could work - it's just that choosing meaningful parameter sets is tricky for a machine, and a brute-force approach on 4 parameters is just not workable. And you still need the coder I was talking about.
"I read the ReadMes of the bench tools - I'm none the wiser."
You have a whole thread on benchmarking different MB apps. AP works the same. As for putting the parameters into the script, the newest approach will have an easily adaptable variable line (as Jason's 1.50 version of MB bench already has). We have no timeline whatsoever on when the tools section will be up to date and complete.
Carola |
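The point about brute force on four parameters comes down to simple multiplication. The sketch below only illustrates the arithmetic: the candidate values in `example_grid` and the 30 minutes per run are invented for the example, not measured, and reading the four parameters as unroll, ffa_block, ffa_block_fetch and instances_per_device is an assumption.

```python
import math


def count_combinations(grid):
    """Number of runs a full brute-force sweep over the grid would need."""
    return math.prod(len(values) for values in grid.values())


def estimated_days(grid, minutes_per_run):
    """Total wall-clock days if every combination is benchmarked once."""
    return count_combinations(grid) * minutes_per_run / (60 * 24)


# Illustrative candidate values only; real ranges depend on the card.
example_grid = {
    "unroll": [4, 6, 8, 10, 12, 14, 16, 18],
    "ffa_block": [1024, 2048, 4096, 8192, 16384],
    "ffa_block_fetch": [512, 1024, 2048, 4096, 8192],
    "instances_per_device": [1, 2, 3],
}

print(count_combinations(example_grid))      # 8 * 5 * 5 * 3 runs
print(estimated_days(example_grid, 30))      # at an assumed 30 minutes per run
```

Even this coarse grid comes to 600 runs, or 12.5 days at the assumed run time. That is an upper bound, since invalid ffa_block/ffa_block_fetch pairs would be skipped, but it also counts each combination only once, while runtimes that depend on blanking % would call for repeats on comparable tasks.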
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
... Some approximation of the best settings could probably be done with a modified script, but to do a full genetic algorithm search for the best set would probably take a long time. More to the point, such testing doesn't match the conditions while doing real crunching for the project. For now, the best tool is the human mind's capability for pattern recognition, even though it often imagines patterns where none exist. Joe |
arkayn Joined: 14 May 99 Posts: 4438 Credit: 55,006,323 RAC: 0 |
In which case, you're starting from exactly the same position as everyone else, and you can contribute to finding the solution. Astropulse for ATI/AMD is not OpenCL, it is Brook. Completely different animal. |
HAL9000 Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
In which case, you're starting from exactly the same position as everyone else, and you can contribute to finding the solution. I thought the newer ATI AP app for the 4800+ cards was OpenCL, like the MB app, and the Hybrid ATI AP was Brook. |
Mike Joined: 17 Feb 01 Posts: 34253 Credit: 79,922,639 RAC: 80 |
Sure it is OpenCL. |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.