ap_cmdline_6.04_windows_intelx86_opencl

Tom M  Volunteer tester  Joined: 28 Nov 02  Posts: 5124  Credit: 276,046,078  RAC: 462
Message 1542350 - Posted: 16 Jul 2014, 1:23:47 UTC  Last modified: 16 Jul 2014, 1:30:53 UTC

This file started showing up when I got the odd Astropulse file to process. I looked high, I looked low. It took a Google search of the "Number Crunching" message area to find anything. Here is some consolidated, possibly useful information.

This goes in that file for a "660":

-hp -unroll 12 -ffa_block 12288 -ffa_block_fetch 6144

--Explanation from a Lunatics text file that doesn't show up on the stock installs--

ap_cmdline_6.04_windows_intelx86_opencl_nvidia.txt is the file to change.

The AstroPulse OpenCL application is currently available in 3 editions: for AMD/ATi, nVidia and Intel GPUs. It is intended to process SETI@home AstroPulse v6 tasks.

Source code repository: https://setisvn.ssl.berkeley.edu/svn/branches/sah_v7_opt
Build revision: 2083
Date of revision commit: 2013/11/22 02:46:12

Available command line switches

-v N : Sets the level of verbosity of the app. N is an integer.
-ffa_block N : Sets how many FFA's different period iterations will be processed per kernel call. N should be an even integer less than 32768.
-ffa_block_fetch N : Sets how many FFA's different period iterations will be processed per "fetch" kernel call (the longest kernel in FFA). N should be a positive integer and a divisor of the -ffa_block value.
-unroll N : Sets the number of data chunks processed per kernel call in the main application loop. N should be an integer; the minimum possible value is 2.
-skip_ffa_precompute : Skips the FFA pre-compute kernel call. Affects performance. Experimentation is required to see whether it increases or decreases performance on a particular GPU/CPU combo.
-exit_check : Checks for exit requests from BOINC more often. Use this option if you experience problems with long app suspend/exit. Can decrease performance, though.
-use_sleep : Adds Sleep() calls to yield the CPU to other processes. Can affect performance. Experimentation required.
-initial_ffa_sleep N M : In PC-FFA, sleeps N ms for the short and M ms for the large one before looking for results. Can decrease CPU usage. Affects performance. Experimentation required for a particular CPU/GPU/GPU driver combo. N and M should be non-negative integers. Approximate useful values can be found by running the app with the -v 2 and -use_sleep switches enabled and analyzing the stderr.txt log file.
-initial_single_pulse_sleep N : In the SingleFind search, sleeps N ms before looking for results. Can decrease CPU usage. Affects performance. Experimentation required for a particular CPU/GPU/GPU driver combo. N should be a positive integer. Approximate useful values can be found by running the app with the -v 2 and -use_sleep switches enabled and analyzing the stderr.txt log file.
-sbs N : Sets the maximum single buffer size for GPU memory allocations. N should be a positive integer; it gives the size in Mbytes. For now, if other options require a bigger buffer than this option allows, a warning will be issued but the memory allocation attempt will still be made.
-hp : Gives the application process a higher priority (normal priority class and above normal thread priority). Can be used to increase GPU load; experimentation required for a particular GPU/CPU/GPU driver combo.
-cpu_lock : Enables the CPUlock feature. Limits the number of CPUs for a particular app instance. An attempt will also be made to bind different instances to different CPU cores. Can increase performance under some specific conditions and decrease it in other cases. Experimentation required. Currently this option allows the GPU app to use only a single logical CPU. Different instances will use different CPUs as long as there are enough CPUs in the system. To use CPUlock in round-robin mode, the GPUlock feature will be enabled.
Use the -instances_per_device N option if a few instances per GPU device are needed.
-cpu_lock_fixed_cpu N : Enables CPUlock too, but binds all app instances to the same N-th CPU (N=0,1,.., number of CPUs-1).
-tune N Mx My Mz : To make the app more tunable, this parameter lets the user fine-tune the kernel launch sizes of the most important kernels. N is the kernel ID (see below); Mx, My, Mz are the workgroup size of the kernel. For 1D workgroups Mx is the size of the first dimension and the other two should be My=Mz=1. N should be one of the values from this list: FFA_FETCH_WG=1, FFA_COMPARE_WG=2. For best tuning results it's recommended to launch the app under a profiler to see how a particular WG size choice affects a particular kernel. This option is mostly for developers and hardcore optimization enthusiasts wanting the absolute max from their setups. No big changes in speed are expected, but if you see a big positive change over the default, please report it. Usage example: -tune 2 32 1 1 (sets a workgroup size of 32 for the 1D FFA comparison kernel).

Some already obsolete options are listed here. They are not tested for proper operation with the latest builds and are only listed for completeness:

-gpu_lock : Enables the old-style GPU lock. Use the -instances_per_device N switch to provide the number of instances to run.
-instances_per_device N : Sets the allowed number of simultaneously executing GPU app instances per GPU device (shared with MultiBeam app instances). N is the integer number of allowed instances.
-disable_slot N : Can be used to exclude the N-th GPU (starting from zero) from usage. Not tested and obsolete; use BOINC's abilities to exclude GPUs instead.

These 2 options used together provide a BOINC-independent way to limit the number of simultaneously executing GPU apps. Each SETI OpenCL GPU application with these switches enabled will create/check global Mutexes and suspend its process execution if the limit is reached. A waiting process will consume zero CPU/GPU and a rather low amount of memory until it can continue execution.
These switches can also be placed into the file called ap_cmdline.txt. For examples of app_info.xml entries, look into the text file with the .aistub extension provided in the corresponding package.

Known issues

- With 12.x Catalyst drivers GPU usage can be low if the CPU is fully used by other loads. The same applies to NV drivers past 267.xx and to Intel SDK drivers. If you see low GPU usage on zero blanked tasks, try to free one or more CPU cores.
- For overflowed tasks the found signal sequence does not always match the CPU version.
- If you experience problems with time to completion estimations from BOINC you could try this advice by Terror Australis (http://setiathome.berkeley.edu/forum_thread.php?id=71301&postid=1354911): for Astropulse the flops entry sometimes has to be in scientific notation format for BOINC to understand it right, i.e. XXXXe0x, where x is the number of zeros after the integer, e.g. 9 for Gigaflops, 8 for 100's of Megaflops, etc. Thus the entry for a GTX470 (1120GF) is <flops>1120e09</flops>. For a GTX550Ti (486GF) it would be <flops>486e09</flops>.

-----------------------------------------------------------------------

I can confirm it shows up on two different Nvidia-based machines I have. I don't have an Intel GPU so I can't tell on that.

The interesting thing is that it does NOT show up on my current Radeon-based machine (a netbook) that is currently processing 2 Astropulse data files. And taking more than 100 hours so far... I read that Astropulse is given something like a 3-week window before it is overdue. Hope it doesn't take quite that long on the netbook. It certainly didn't on my Xeon.

HTH
Tom

A proud member of the OFA (Old Farts Association).
ID: 1542350
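The flops advice quoted above can be illustrated with a small sketch. It only reproduces the XXXXe09 notation from the two examples in the post (whole-number GFLOPS ratings); the function name `flops_entry` is made up for illustration and is not part of BOINC or the Lunatics package.

```python
def flops_entry(gigaflops):
    """Format a whole-number GFLOPS rating as a <flops> tag in XXXXe09 style.

    Hypothetical helper: it mirrors the notation quoted in the post, where
    the exponent 09 stands for the nine zeros of "Giga".
    """
    if gigaflops <= 0 or int(gigaflops) != gigaflops:
        raise ValueError("expected a positive whole number of GFLOPS")
    return "<flops>{}e09</flops>".format(int(gigaflops))


if __name__ == "__main__":
    # The two cards mentioned in the post.
    print(flops_entry(1120))  # GTX470
    print(flops_entry(486))   # GTX550Ti
```

The resulting tag text would still have to be pasted by hand into the relevant app_info.xml entry; the sketch does not touch any files.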

Tom M  Volunteer tester  Joined: 28 Nov 02  Posts: 5124  Credit: 276,046,078  RAC: 462
Message 1542395 - Posted: 16 Jul 2014, 3:33:10 UTC

So now that I have found this information, I am looking for someone to help me apply it to a 2-core Intel running at 3 GHz with 4 GB of memory and a GT210 GPU. It is unlikely that the GPU will change, because the case is a "slimline", which means half height or nothing.

Anyway, this machine is currently my 2nd highest total credit earner. And it has finally started getting Astropulse data to process.

So apparently a GTX660 works with this:

-hp -unroll 12 -ffa_block 12288 -ffa_block_fetch 6144

This might actually be a pretty good general guideline. Any ideas?

Thanks,
Tom

A proud member of the OFA (Old Farts Association).
ID: 1542395
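Before copying a line like the GTX660 one to another machine, it can at least be checked against the numeric limits stated in the Lunatics text quoted earlier in the thread (-ffa_block even and below 32768, -ffa_block_fetch positive and a divisor of -ffa_block, -unroll at least 2). The following is only a sketch of such a check; `parse_cmdline` and `check_cmdline` are hypothetical names, the app itself does not ship such a tool, and passing the check says nothing about whether the values suit a given GPU.

```python
def parse_cmdline(text):
    """Split an ap_cmdline-style string into {switch: [values]}."""
    opts = {}
    current = None
    for token in text.split():
        # A token like "-hp" starts a switch; "-5" would be a (negative) value.
        if token.startswith("-") and not token[1:].isdigit():
            current = token
            opts[current] = []
        elif current is not None:
            opts[current].append(token)
    return opts


def check_cmdline(text):
    """Return a list of problems found against the documented limits."""
    opts = parse_cmdline(text)
    problems = []

    def single_int(name):
        # Return the integer value of a switch, or None if absent/invalid.
        values = opts.get(name)
        if values is None:
            return None
        if len(values) != 1:
            problems.append(name + " expects exactly one value")
            return None
        try:
            return int(values[0])
        except ValueError:
            problems.append(name + " expects an integer")
            return None

    block = single_int("-ffa_block")
    fetch = single_int("-ffa_block_fetch")
    unroll = single_int("-unroll")

    if block is not None and (block % 2 != 0 or block >= 32768):
        problems.append("-ffa_block should be even and less than 32768")
    if fetch is not None:
        if fetch <= 0:
            problems.append("-ffa_block_fetch should be positive")
        elif block is not None and block % fetch != 0:
            problems.append("-ffa_block_fetch should divide -ffa_block")
    if unroll is not None and unroll < 2:
        problems.append("-unroll should be at least 2")
    return problems


if __name__ == "__main__":
    line = "-hp -unroll 12 -ffa_block 12288 -ffa_block_fetch 6144"
    print(check_cmdline(line) or "no documented limit violated")
```

The check is deliberately limited to the three switches whose limits are spelled out in the quoted text; everything else is passed through untouched.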

BilBg  Volunteer tester  Joined: 27 May 07  Posts: 3720  Credit: 9,385,827  RAC: 0
Message 1542504 - Posted: 16 Jul 2014, 8:40:05 UTC - in response to Message 1542395.  Last modified: 16 Jul 2014, 8:52:50 UTC

> Any ideas?

For low-end GPUs (APU, GT210): do not tweak, use defaults.
Run one task per GPU with default settings.

What you see posted for the GTX660 will not work on a GT210 (what works for a Ferrari will hardly work on an Oldsmobile).

P.S. No need to search for or post the docs. They are all in the Lunatics Installers.

* Get the Installer:
http://setiathome.berkeley.edu/forum_thread.php?id=71867&postid=1375943#1375943
http://mikesworldnet.de/download.html

* Get 7-Zip:
http://www.7-zip.org/

Use 7-Zip to unpack everything in the Installer, or only the docs:
...\Lunatics_Win32_v0.41_setup.exe\$_OUTDIR\docs\

Newer versions of the docs are found in files like these (from mikesworldnet.de):
MB7_win_x86_SSE_OpenCL_ATi_HD5_r2170.7z
AP6_win_x86_SSE2_OpenCL_ATI_r2180.7z

- ALF - "Find out what you don't do well ..... then don't do it!" :)
ID: 1542504

Tom M  Volunteer tester  Joined: 28 Nov 02  Posts: 5124  Credit: 276,046,078  RAC: 462
Message 1542565 - Posted: 16 Jul 2014, 13:21:46 UTC

Thank you both for the advice on what parms to set and for reminding me that there is a way to get the doc files out of the installer without running the installer!

A proud member of the OFA (Old Farts Association).
ID: 1542565

