How do I run GPU high priority and other questions.

Message boards : Number crunching : How do I run GPU high priority and other questions.

juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 9764
Credit: 572,710,851
RAC: 8,616
Panama
Message 1469764 - Posted: 28 Jan 2014, 0:37:49 UTC - in response to Message 1469758.  

By the way, your "fly" signature gets me every time

LOL - that "fly" we could literally call a bug!
ID: 1469764 · Report as offensive
Batter Up
Joined: 5 May 99
Posts: 1946
Credit: 24,860,347
RAC: 0
United States
Message 1469763 - Posted: 28 Jan 2014, 0:35:59 UTC

I will be installing and looking at "lunatics" soon on another test machine. Things are running the best they have ever run on my cruncher; I have finally flatlined and want to take one baby step at a time.
ID: 1469763 · Report as offensive
Yanivicious
Joined: 29 Mar 12
Posts: 157
Credit: 15,529,301
RAC: 0
United States
Message 1469758 - Posted: 28 Jan 2014, 0:30:18 UTC - in response to Message 1469747.  

Just curious, @batterup, why you don't want to use the Lunatics installer. I also do not know anything about coding, but the installer for the optimized app also creates an "app_info" file, which you can easily edit to control the number of WUs on each card, in addition to being able to change GPU priority (I think the "avg_ncpus" lines, found above the "<count>x</count>" sections, are responsible for that; I haven't edited the priority myself, though).
You are working with some amazing equipment, and Lunatics definitely increases your RAC a bit.
By the way, your "fly" signature gets me every time

I would grab the .41 installer from http://lunatics.kwsn.net/ and just watch your CPU/GPU temps and utilization if you are worried it is going to over-stress your box.
ID: 1469758 · Report as offensive
Jeff Buck
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1469752 - Posted: 28 Jan 2014, 0:08:36 UTC - in response to Message 1469747.  

When I add -hp do I save it as txt or cfg?

Keep it as .txt
ID: 1469752 · Report as offensive
Batter Up
Joined: 5 May 99
Posts: 1946
Credit: 24,860,347
RAC: 0
United States
Message 1469747 - Posted: 27 Jan 2014, 23:55:15 UTC - in response to Message 1469741.  


I think you actually have to download the Lunatics installer to get that file. Stock users don't normally have it.

I don't have that file; I do have ap_cmdline_6.04_windows_intelx86_opencl_nvidia.txt
When I add -hp do I save it as txt or cfg?
ID: 1469747 · Report as offensive
juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 9764
Credit: 572,710,851
RAC: 8,616
Panama
Message 1469745 - Posted: 27 Jan 2014, 23:53:45 UTC
Last modified: 28 Jan 2014, 0:05:22 UTC

Then here are the contents of the read.me file for r2083. I hope that helps him and others, but please note that, for now, some parameters like -use_sleep are not available in the stock apps, just in beta.

The AstroPulse OpenCL application is currently available in 3 editions: for AMD/ATi, nVidia and Intel GPUs.
It's intended to process SETI@home AstroPulse v6 tasks.

Source code repository: https://setisvn.ssl.berkeley.edu/svn/branches/sah_v7_opt
Build revision:2083
Date of revision commit: 2013/11/22 02:46:12

****Available command line switches****

-v N : sets the verbosity level of the app. N is an integer.

-ffa_block N : sets how many different FFA period iterations are processed per kernel call. N should be an even integer less than 32768.

-ffa_block_fetch N : sets how many different FFA period iterations are processed per "fetch" kernel call (the longest kernel in FFA).
N should be a positive integer and a divisor of the -ffa_block value.

-unroll N : sets the number of data chunks processed per kernel call in the main application loop. N should be an integer; the minimum possible value is 2.

-skip_ffa_precompute : skips the FFA pre-compute kernel call. Affects performance. Experimentation is required to see whether it increases or decreases performance on a particular GPU/CPU combo.

-exit_check : checks more often for exit requests from BOINC. Use this option if you experience problems with long app suspend/exit times.
Can decrease performance, though.

-use_sleep : adds Sleep() calls to yield the CPU to other processes. Can affect performance. Experimentation required.

-initial_ffa_sleep N M : in PC-FFA, sleeps N ms for the short one and M ms for the large one before looking for results. Can decrease CPU usage.
Affects performance. Experimentation required for a particular CPU/GPU/GPU driver combo. N and M should be non-negative integers.
Useful values can be approximated by running the app with the -v 2 and -use_sleep switches enabled and analyzing the stderr.txt log file.

-initial_single_pulse_sleep N : in the SingleFind search, sleeps N ms before looking for results. Can decrease CPU usage.
Affects performance. Experimentation required for a particular CPU/GPU/GPU driver combo. N should be a positive integer.
Useful values can be approximated by running the app with the -v 2 and -use_sleep switches enabled and analyzing the stderr.txt log file.

-sbs N : sets the maximum single buffer size for GPU memory allocations. N should be a positive integer and gives the size in Mbytes.
For now, if other options require a bigger buffer than this option allows, a warning is issued but the memory allocation is still attempted.

-hp : raises the priority of the application process (normal priority class and above normal thread priority).
Can be used to increase GPU load; experimentation required for a particular GPU/CPU/GPU driver combo.

-cpu_lock : enables the CPUlock feature. Limits the number of CPUs available to a particular app instance, and attempts to bind different instances to different CPU cores.
Can increase performance under some specific conditions, but can decrease it in other cases. Experimentation required.
Currently this option lets the GPU app use only a single logical CPU.
Different instances will use different CPUs as long as there are enough CPUs in the system.
To use CPUlock in round-robin mode, the GPUlock feature will be enabled. Use the -instances_per_device N option if several instances per GPU device are needed.

-cpu_lock_fixed_cpu N : also enables CPUlock, but binds all app instances to the same N-th CPU (N=0,1,.., number of CPUs-1).

-tune N Mx My Mz : lets the user fine-tune the kernel launch sizes of the most important kernels.
N - kernel ID (see below)
Mx, My, Mz - workgroup size of the kernel. For 1D workgroups, Mx is the size of the first dimension and the other two should be My=Mz=1.
N should be one of the values from this list:
FFA_FETCH_WG=1,
FFA_COMPARE_WG=2
For best tuning results, it's recommended to launch the app under a profiler to see how a particular WG size choice affects a particular kernel.
This option is mostly for developers and hardcore optimization enthusiasts who want the absolute max from their setups.
No big changes in speed are expected, but if you see a big positive change over the default, please report it.
Usage example: -tune 2 32 1 1 (sets a workgroup size of 32 for the 1D FFA comparison kernel).
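
For anyone who wants to sanity-check values before pasting them into the command line file, here is a minimal Python sketch of the constraints the ReadMe states for -unroll, -ffa_block and -ffa_block_fetch. It is not part of the ReadMe or of the app; the function name is made up for illustration, and the requirement that -ffa_block be positive is an assumption (the ReadMe only says "even" and "less than 32768").

```python
def check_ap_switches(unroll, ffa_block, ffa_block_fetch):
    """Return a list of problems; an empty list means the values fit the ReadMe rules."""
    problems = []
    # ReadMe: minimal possible value for -unroll is 2.
    if unroll < 2:
        problems.append("-unroll must be at least 2")
    # ReadMe: even integer less than 32768 (positivity is an assumption here).
    if ffa_block <= 0 or ffa_block % 2 != 0 or ffa_block >= 32768:
        problems.append("-ffa_block must be a positive even integer below 32768")
    # ReadMe: positive integer that divides the -ffa_block value.
    if ffa_block_fetch <= 0 or (ffa_block > 0 and ffa_block % ffa_block_fetch != 0):
        problems.append("-ffa_block_fetch must be a positive divisor of -ffa_block")
    return problems


# The high end preset quoted later in the ReadMe passes the checks.
print(check_ap_switches(12, 8192, 4096))
```

An empty list only means the numbers are well-formed; whether they are fast on a given card still needs the experimentation the ReadMe keeps asking for.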

Some already obsolete options are listed here. They are not tested for proper operation with the latest builds and are listed only for completeness:

-gpu_lock : enables the old-style GPU lock. Use the -instances_per_device N switch to set the number of instances to run.

-instances_per_device N : sets the allowed number of simultaneously executing GPU app instances per GPU device (shared with MultiBeam app instances).
N - integer number of allowed instances.

-disable_slot N : can be used to exclude the N-th GPU (counting from zero) from usage.
Untested and obsolete feature; use BOINC's own abilities to exclude GPUs instead.

These 2 options used together provide a BOINC-independent way to limit the number of simultaneously
executing GPU apps. Each SETI OpenCL GPU application with these switches enabled will create/check global mutexes and suspend its process
execution if the limit is reached. A waiting process consumes zero CPU/GPU and a rather low amount of memory until it can continue execution.

These switches can also be placed into the file called ap_cmdline.txt.

For examples of app_info.xml entries, look in the text file with the .aistub extension provided in the corresponding package.

****Known issues****
- With 12.x Catalyst drivers, GPU usage can be low if the CPU is fully used by other loads.
The same applies to NV drivers past 267.xx and to Intel SDK drivers.
If you see low GPU usage on zero-blanked tasks, try to free one or more CPU cores. *
- For overflowed tasks, the found signal sequence does not always match the CPU version.
- If you experience problems with BOINC's time-to-completion estimates, you could try this advice by Terror Australis
(http://setiathome.berkeley.edu/forum_thread.php?id=71301&postid=1354911):
for Astropulse, the flops entry sometimes has to be in scientific notation for BOINC to understand it correctly,
i.e. XXXXe0x, where x is the number of zeros after the integer, e.g. 9 for Gigaflops, 8 for 100's of Megaflops, etc.

Thus the entry for a GTX470 (1120GF) is...
<flops>1120e09</flops>

For a GTX550Ti (486GF) it would be
<flops>486e09</flops>

for a GTX 580 (1679GF)
<flops>1679e09</flops>

For a low powered card such as a GTS250 (756MF IIRC) (not recommended for AP work.)
the entry would be something like
<flops>756e08</flops>

and so on.

The flops value can be found at the top of the BOINC messages tab where the boot up details are.
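
As a small illustration of the notation above (a sketch, not something shipped with the app), the following Python helper turns a whole-number GFLOPS rating into the <flops> line in the same form as the GTX470 example. The function name is hypothetical, and it only covers the Gigaflops ("e09") case.

```python
def flops_entry(gigaflops):
    """Format a whole-number GFLOPS rating as an app_info.xml <flops> line."""
    if gigaflops <= 0 or int(gigaflops) != gigaflops:
        raise ValueError("expected a positive whole number of GFLOPS")
    # "e09" means nine zeros after the integer, i.e. Gigaflops.
    return "<flops>{}e09</flops>".format(int(gigaflops))


print(flops_entry(1120))  # prints <flops>1120e09</flops>
```

The integer part is simply the GFLOPS figure, so 486 gives 486e09 and 1679 gives 1679e09, matching the entries listed above.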

****Best usage tips****

For best performance it is important to free 2 CPU cores when running multiple instances.
Freeing at least 1 CPU core is necessary to get enough GPU usage.*

*: As an alternative, try the -cpu_lock / -cpu_lock_fixed_cpu N options.

command line parameters.
_______________________

High end cards (more than 12 compute units)

-unroll 12 -ffa_block 8192 -ffa_block_fetch 4096 -hp

Mid range cards (less than 12 compute units)

-unroll 10 -ffa_block 6144 -ffa_block_fetch 1536 -hp

entry level GPU (less than 6 compute units)

-unroll 4 -ffa_block 2048 -ffa_block_fetch 1024 -hp

Your mileage might vary.
-----------------------------------------------------
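
The three presets above can be written as a simple lookup. This is only a sketch in Python with a made-up function name. The ReadMe does not say what to do with exactly 12 or exactly 6 compute units, so treating both as "mid range" is an assumption.

```python
def suggested_switches(compute_units):
    """Pick the ReadMe's starting command line for a GPU by compute unit count."""
    if compute_units > 12:
        # High end cards (more than 12 compute units).
        return "-unroll 12 -ffa_block 8192 -ffa_block_fetch 4096 -hp"
    if compute_units >= 6:
        # Mid range cards; exactly 6 and exactly 12 land here by assumption.
        return "-unroll 10 -ffa_block 6144 -ffa_block_fetch 1536 -hp"
    # Entry level GPU (less than 6 compute units).
    return "-unroll 4 -ffa_block 2048 -ffa_block_fetch 1024 -hp"


print(suggested_switches(16))
```

These are starting points only; as the ReadMe says, your mileage might vary.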

App instances.
______________

On high end cards (HD 5850/5870, 6950/6970, 7950/7970) you can run 3 instances.

On mid range cards (HD 5770, 6850/6870, 7850/7870) the best performance should come from running 2 instances.

If you experience screen lag, reduce the unroll factor and the ffa_block_fetch value.

Addendum:
_________

Running multiple cards in a system requires freeing another CPU core.


ID: 1469745 · Report as offensive
Jeff Buck
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1469741 - Posted: 27 Jan 2014, 23:44:45 UTC - in response to Message 1469739.  

My suggestion to him is to try to find the AP read.me file; there he will find all the available options and instructions for each parameter, so he can choose what works best for him.

I think you actually have to download the Lunatics installer to get that file. Stock users don't normally have it.
ID: 1469741 · Report as offensive
juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 9764
Credit: 572,710,851
RAC: 8,616
Panama
Message 1469739 - Posted: 27 Jan 2014, 23:37:33 UTC - in response to Message 1469738.  
Last modified: 27 Jan 2014, 23:41:28 UTC

My suggestion to him is to try to find the AP read.me file; there he will find all the available options and instructions for each parameter, so he can choose what works best for him.

Adding to Jeff's post to guide you: as suggested to me by Mike, on my 690 host I use:

-hp -unroll 12 -ffa_block 12288 -ffa_block_fetch 6144

Just copy and paste it into the file mentioned by Jeff.

It works perfectly here and will give you a start. After you confirm it's working, you can modify the parameters to get an even better optimization.
ID: 1469739 · Report as offensive
Jeff Buck
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1469738 - Posted: 27 Jan 2014, 23:33:41 UTC - in response to Message 1469730.  

Were you able to locate the AP file?

Yes but it is empty,0kb

Since you're running stock and not Lunatics, I don't think you'll actually find the file that Juan mentioned. You want a file named ap_cmdline_6.04_windows_intelx86_opencl_nvidia.txt. It should be empty, but just edit it in Notepad or another text editor and create a single line with just '-hp' (without the quotes). As with the mbcuda.cfg change, there are differing opinions as to whether this makes much difference or not. I'll let the experts debate that point!
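
If Notepad is not handy, the same one-line edit can be scripted. This is a minimal Python sketch, not something the project provides: the helper name is made up, and the location of the project directory is left to the caller because it differs between installs. The demo writes into a temporary folder so it is safe to run anywhere.

```python
import tempfile
from pathlib import Path

# File name as given in the post above.
AP_CMDLINE_NAME = "ap_cmdline_6.04_windows_intelx86_opencl_nvidia.txt"


def write_ap_cmdline(project_dir, switches="-hp"):
    """Overwrite the AP command line file with a single line of switches."""
    target = Path(project_dir) / AP_CMDLINE_NAME
    target.write_text(switches, encoding="ascii")
    return target


# Demo in a throwaway directory; point project_dir at the real folder yourself.
with tempfile.TemporaryDirectory() as demo_dir:
    path = write_ap_cmdline(demo_dir)
    print(path.name, "->", path.read_text(encoding="ascii"))
```

The file stays a .txt file; only its contents change.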
ID: 1469738 · Report as offensive
Batter Up
Joined: 5 May 99
Posts: 1946
Credit: 24,860,347
RAC: 0
United States
Message 1469730 - Posted: 27 Jan 2014, 23:07:36 UTC - in response to Message 1469728.  

Were you able to locate the AP file?

Yes but it is empty,0kb
ID: 1469730 · Report as offensive
juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 9764
Credit: 572,710,851
RAC: 8,616
Panama
Message 1469728 - Posted: 27 Jan 2014, 23:06:31 UTC - in response to Message 1469705.  

I crunch 12 CPU MB WU along with 16 GPU MB WU. My GPU usage and temps did not change. Remember this is a crunching only test machine.

Sorry, my mistake; I just looked at your crunched WUs and didn't see any CPU work.

Were you able to locate the AP file?
ID: 1469728 · Report as offensive
Batter Up
Joined: 5 May 99
Posts: 1946
Credit: 24,860,347
RAC: 0
United States
Message 1469705 - Posted: 27 Jan 2014, 22:15:55 UTC - in response to Message 1469698.  
Last modified: 27 Jan 2014, 23:06:31 UTC


You have a top CPU and apparently you don't do any CPU work, so maybe that's why you don't experience any lag. Did you check your GPU usage and temp on each GPU?

I crunch 12 CPU MB WU along with 16 GPU MB WU. My GPU usage and temps did not change. Remember this is a crunching only test machine.

High end used cards are now being grabbed by coin miners. I think they gave up on waiting for the promised mining application-specific chips.

AstroPulse_OpenCL_NV_ReadMe.txt is empty??
ID: 1469705 · Report as offensive
juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 9764
Credit: 572,710,851
RAC: 8,616
Panama
Message 1469698 - Posted: 27 Jan 2014, 22:05:33 UTC
Last modified: 27 Jan 2014, 22:13:49 UTC

Look in the file AstroPulse_OpenCL_NV_ReadMe.txt for the usage of -hp.

You can see the priority with taskmgr.

You have a top CPU and apparently you don't do any CPU work, so maybe that's why you don't experience any lag. Did you check your GPU usage and temp on each GPU?

BTW, if you know of any more cheap (used) 690s, please tell me; I could be interested in buying a few more.
ID: 1469698 · Report as offensive
Batter Up
Joined: 5 May 99
Posts: 1946
Credit: 24,860,347
RAC: 0
United States
Message 1469695 - Posted: 27 Jan 2014, 22:02:12 UTC

Thank you all; I am now crunching MB GPU WUs with high priority. The only way I know is that the Stderr output says so. I can play an HD video with SETI running 16 WUs with no problem, too. SETI is not asking enough from these cards.

How do I do the same with AP WU? Thank you in advance.
ID: 1469695 · Report as offensive
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 8
United Kingdom
Message 1469378 - Posted: 27 Jan 2014, 3:40:10 UTC - in response to Message 1469360.  

Then close the file, and the next MB WU started will run at the new priority. To make it work instantaneously, you need to completely stop BOINC (not just the Manager) and restart.

You don't need to restart Boinc to get the apps to use the new priority level, just suspend GPU usage via the activity menu for a moment.

Claggy
ID: 1469378 · Report as offensive
juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 9764
Credit: 572,710,851
RAC: 8,616
Panama
Message 1469377 - Posted: 27 Jan 2014, 3:30:58 UTC - in response to Message 1469375.  
Last modified: 27 Jan 2014, 3:43:38 UTC

Using this switch will not melt down your CPU, don't worry. If you find the files, just take out the ; and it will work.

The major problem with high priority is video response time (some slow hosts like mine show a noticeable video lag), or interference if you run other jobs on your computer.

In simple language, some programs don't like others running at high priority and almost stop when that happens. That makes no difference on a crunching-only host, but on a general-purpose host it is better to leave the setting unchanged.

You can expect a marginal gain of 1-2% in host performance if you set the priority to abovenormal or high. Of course, YMMV.

<edit> Fast GPUs like the 690, especially on a multi-GPU host like the one you have, do well with the abovenormal and even the high setting.
ID: 1469377 · Report as offensive
Batter Up
Joined: 5 May 99
Posts: 1946
Credit: 24,860,347
RAC: 0
United States
Message 1469375 - Posted: 27 Jan 2014, 3:25:32 UTC
Last modified: 27 Jan 2014, 3:27:13 UTC

OK, thank you all for the replies. I found the files for MB and the sample file; I can see this is not going to be as easy as just changing 1 to .5 to run two WU per GPU. Will I gain enough on a dedicated cruncher going from below normal to above to risk a Chernobyl-like core meltdown?
ID: 1469375 · Report as offensive
arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4436
Credit: 55,006,323
RAC: 0
United States
Message 1469373 - Posted: 27 Jan 2014, 3:19:23 UTC - in response to Message 1469369.  

Let's see if I can help you with my bad English.

One at a time. Since there are no AP WUs available, let's start with MB (I assume you use the Lunatics apps).

One must never ass/u/me. I use the standard apps and wish to continue using them. Your English is way better than my Portuguese. The only Portuguese I know is the pet name the Brazilian women in the Ironbound section of Newark call me; Safado.


You will still have the mbcuda.cfg file, but it will be empty.

;;; This configuration file is for optional control of Cuda Multibeam x41zc
;;; Currently, the available options are for
;;; application process priority control (without external tools), and
;;; per gpu priority control (useful for multiple Cuda GPU systems)
[mbcuda]
;;;;; Global applications settings, to apply to all Cuda devices
;;; You can uncomment the processpriority line below, by removing the ';', to engage machine global priority control of x41x
;;; possible options are 'belownormal' (which is the default), 'normal', 'abovenormal', or 'high'
;;; For dedicated crunching machines, 'abovenormal' is recommended
;;; raising global application priorities above the default
;;;   may have system dependant usability effects, and can have positive or negative effects on overall throughput 
processpriority = high
;;; Pulsefinding: Advanced options for long pulsefinds (affect display usability & long kernel runs)
;;; defaults are conservative.
;;; WARNING: Excessive values may induce display lag, driver timeout & recovery, or errors. 
;;; pulsefinding blocks per multiprocessor (1-16), default is 1 for Pre-Fermi, 4 for Fermi or newer GPUs
pfblockspersm = 15
;;; pulsefinding maximum periods per kernel launch  (1-1000), default is 100, as per 6.09
pfperiodsperlaunch = 200


That is what I run on my 670 & 660
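
For a quick sanity check of the values before saving the file, here is a Python sketch that reads the [mbcuda] section and compares each setting against the ranges given in the file's own comments. It uses Python's configparser, which may not parse the file exactly the way the Cuda app does, so treat it as a rough checker. The function name and the shortened sample text are made up for illustration.

```python
import configparser

# Allowed values as listed in the comments of the sample file.
ALLOWED_PRIORITIES = {"belownormal", "normal", "abovenormal", "high"}

SAMPLE_CFG = """\
;;; comment lines start with ';'
[mbcuda]
processpriority = high
pfblockspersm = 15
pfperiodsperlaunch = 200
"""


def check_mbcuda_cfg(text):
    """Return a list of problems; an empty list means every value is in range."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.has_section("mbcuda"):
        return ["missing [mbcuda] section"]
    section = parser["mbcuda"]
    problems = []

    # Default is 'belownormal' according to the file's comments.
    priority = section.get("processpriority", fallback="belownormal")
    if priority not in ALLOWED_PRIORITIES:
        problems.append("processpriority: unknown value " + repr(priority))

    # Documented range: 1-16.
    blocks = section.getint("pfblockspersm", fallback=None)
    if blocks is not None and not 1 <= blocks <= 16:
        problems.append("pfblockspersm must be between 1 and 16")

    # Documented range: 1-1000.
    periods = section.getint("pfperiodsperlaunch", fallback=None)
    if periods is not None and not 1 <= periods <= 1000:
        problems.append("pfperiodsperlaunch must be between 1 and 1000")
    return problems


print(check_mbcuda_cfg(SAMPLE_CFG))  # prints []
```

Being inside the documented range does not rule out display lag or driver timeouts; the WARNING in the file still applies to high values.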

ID: 1469373 · Report as offensive
Fred E.
Volunteer tester
Joined: 22 Jul 99
Posts: 768
Credit: 24,140,697
RAC: 0
United States
Message 1469372 - Posted: 27 Jan 2014, 3:11:30 UTC
Last modified: 27 Jan 2014, 3:12:53 UTC

I think with the stock versions of the cuda apps, you may not have mbcuda.cfg but have separate files for cuda 42, 50, etc. Don't remember the filenames but they include mbcuda and the 42, 50, etc. These can be edited with notepad.

Edit: Jeff posted first and has the info on file names.
Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.
ID: 1469372 · Report as offensive
Jeff Buck
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1469371 - Posted: 27 Jan 2014, 3:09:55 UTC - in response to Message 1469369.  
Last modified: 27 Jan 2014, 3:12:11 UTC

I run mostly stock apps and the approach is pretty close to what Juan suggests. However, instead of mbcuda.cfg, you'll find separate files in your project directory for each Cuda flavor. So look for mbcuda-7.00-cuda50.cfg, etc. They're probably empty. There's a sample file for each one. Just pick one, change the first
;processpriority = abovenormal
to
processpriority = abovenormal
(just remove the semicolon). Save that file over each flavor of Cuda that you normally receive (probably just cuda50, cuda42, and cuda32). From that point on each GPU task will pick up the higher priority. Does it actually help? Well, frankly I don't think I noticed much difference on my machines. :^)

Edit: Be sure you use a text editor, like Notepad, when you edit the sample file.
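
The edit described above (remove the semicolon, then save the result over each Cuda flavor's .cfg) can also be sketched in Python. Nothing here comes from the project itself: the function names are made up, and the cuda42/cuda32 file names are assumed to follow the same pattern as mbcuda-7.00-cuda50.cfg. The demo writes into a temporary folder rather than a real project directory.

```python
import re
import tempfile
from pathlib import Path

# A single leading ';' before processpriority; the ';;;' comment lines do not match.
_COMMENTED = re.compile(r"^;[ \t]*processpriority[ \t]*=[^\r\n]*", re.MULTILINE)


def enable_priority(sample_text, level="abovenormal"):
    """Uncomment the first ';processpriority = ...' line and set it to level."""
    return _COMMENTED.sub(lambda match: "processpriority = " + level, sample_text, count=1)


def save_for_flavors(project_dir, cfg_text, flavors=("cuda50", "cuda42", "cuda32")):
    """Write the same text to one .cfg file per Cuda flavor and return the paths."""
    paths = []
    for flavor in flavors:
        # File name pattern assumed from the cuda50 example in the post.
        path = Path(project_dir) / "mbcuda-7.00-{}.cfg".format(flavor)
        path.write_text(cfg_text, encoding="utf-8")
        paths.append(path)
    return paths


sample = "[mbcuda]\n;processpriority = abovenormal\n"
with tempfile.TemporaryDirectory() as demo_dir:
    for path in save_for_flavors(demo_dir, enable_priority(sample)):
        print(path.name)
```

Only the flavors you actually receive need a file, so trim the flavors tuple to match your host.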
ID: 1469371 · Report as offensive


 