'eFMer Priority' - (CUDA) app priority change

Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 955284 - Posted: 16 Dec 2009, 22:26:45 UTC
Last modified: 16 Dec 2009, 22:40:39 UTC


Following up on the other thread..
[Message 955214]


Thanks, Fred, for jumping in..

And thank you for still replying to my many PMs..
(I asked him many times about CUDA priority change topics.)


Sure, raising the priority of the CUDA WUs/app from the stock 'below normal' to 'normal' helps to shorten the GPU calculation time.
If you work at the PC or surf the web, or programs like boinc.exe (the BOINC client) use CPU time, only the CPU WUs are disturbed.
(The bigger your WU cache, the more CPU time boinc.exe uses.)

Higher than 'normal' wouldn't be advisable.


So, members, go to the download area and test Fred's nice program.
Install TThrottle V1.73 and then the Special Edition.
[http://www.efmer.eu/boinc/tthrottle_manual.html]
[http://www.efmer.eu/boinc/download.html]

If there are enough downloads, we'll get a stand-alone CUDA priority change program.


You're welcome to share your experiences here.




ID: 955284
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 956037 - Posted: 16 Dec 2009, 23:30:48 UTC

"CUDA priority change prog" could be just any app that can change process priority. I would recommend ProcessLasso for example, no need to write special app for that.
BTW, compute-intensive process with normal priority can noticeable degrade user experience, especially on single-core system.
So, I would recommend such priority change with care (for advanced users at least, who will not blame project in whole if app with changed priority will disturb their HD video ;) )and primarily for dedicated cruncher hosts.
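For illustration only, a minimal sketch of what such a stand-alone tool might do on Windows, using the Toolhelp snapshot API; the target executable name below is a placeholder, not the actual app's file name:

// Minimal sketch, not Fred's actual tool: find a process by name and
// raise it to the NORMAL priority class.
// Build without UNICODE so PROCESSENTRY32::szExeFile is char[].
#include <windows.h>
#include <tlhelp32.h>
#include <cstdio>
#include <cstring>

int main() {
    const char* target = "cuda_app.exe";  // placeholder name (assumption)
    HANDLE snap = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);
    if (snap == INVALID_HANDLE_VALUE) return 1;
    PROCESSENTRY32 pe;
    pe.dwSize = sizeof(pe);
    for (BOOL ok = Process32First(snap, &pe); ok; ok = Process32Next(snap, &pe)) {
        if (_stricmp(pe.szExeFile, target) != 0) continue;
        HANDLE h = OpenProcess(PROCESS_SET_INFORMATION, FALSE, pe.th32ProcessID);
        if (h) {
            // Stock BOINC GPU apps run below normal; lift the process to normal.
            SetPriorityClass(h, NORMAL_PRIORITY_CLASS);
            CloseHandle(h);
            printf("Raised PID %lu to normal priority.\n", pe.th32ProcessID);
        }
    }
    CloseHandle(snap);
    return 0;
}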
ID: 956037
Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 956803 - Posted: 16 Dec 2009, 23:44:15 UTC - in response to Message 956037.  
Last modified: 16 Dec 2009, 23:45:35 UTC


Yes, sure.. everybody should test on his own system whether it runs well.
If you're watching movies it wouldn't be recommended.
You can exit/close 'CUDApriority' during that time.

I did a quick Google search.. Process Lasso decreases the priority of tasks when the CPU is overloaded. Yes, no? Then it's not what 'we' would like to have..

It would be easier if you made a 3rd opt. CUDA app build.. one with and one without VLARkill, and a 3rd with the CUDA priority at 'normal'.





ID: 956803
Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 956808 - Posted: 16 Dec 2009, 23:57:07 UTC - in response to Message 956803.  

[...]
It would be easier if you made a 3rd opt. CUDA app build.. one with and one without VLARkill, and a 3rd with the CUDA priority at 'normal'.
[...]


OTOH.. 'CUDApriority' would be the better way; people who also use the PC for watching movies could profit from it too..




ID: 956808
Profile S@NL - eFMer - efmer.com/boinc
Volunteer tester
Joined: 7 Jun 99
Posts: 512
Credit: 148,746,305
RAC: 0
United States
Message 957384 - Posted: 19 Dec 2009, 12:03:43 UTC - in response to Message 956808.  

Here is a dedicated version of the priority changer: http://www.efmer.eu/boinc/download.html (read the warning first).
It doesn't require installing TThrottle.
ID: 957384
Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 957439 - Posted: 19 Dec 2009, 21:01:13 UTC
Last modified: 19 Dec 2009, 21:22:13 UTC


I changed the title of this thread from..
'Do you use TThrottle incl. CUDA priority change?'
to..
''eFMer Priority' - (CUDA) app priority change'


--------------------------------------------------------------------------


Fred, thanks a lot for this nice prog!

I guess a lot of people will use it to raise the CUDA app priority to 'normal' and increase the performance of the whole PC (CPU + GPU).


--------------------------------------------------------------------------


@ all


[eFMer Priority] download site


Have you downloaded this nice prog, and do you like it?
Would you like to post your experiences here?





ID: 957439
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 957516 - Posted: 20 Dec 2009, 10:58:49 UTC - in response to Message 956803.  


I did a quick Google search.. Process Lasso decreases the priority of tasks when the CPU is overloaded. Yes, no? Then it's not what 'we' would like to have..

It's exactly what we want to have, because it can change priority (in both directions).
If we're talking about freeing the CUDA app from disturbance, then normal priority isn't enough. Other processes running at normal priority will still disturb the CUDA app, simply because they will use up their CPU quantum before releasing the CPU to the next process of the same priority.
That is, if you want to free the CUDA app, you should run it at an elevated priority like "above normal" or "high".
I will post such a build on Lunatics soon.
But such considerations really matter only for single-core hosts.

For multi-core, multi-GPU hosts, that is, the hosts with the best performance, priority is not the only thing that matters. Actually, the CPU affinity issue is more important.
It seems Windows can't reschedule a higher-priority task to another core just to preempt a lower-priority task.
What I mean: when running 2 hybrid AP tasks of previous versions on my quad (with priority elevated relative to the CPU MB app), I saw the following picture: both AP tasks resided on the same core, competing for the CPU (and leaving the GPU idle), while the other 3 cores (!) just took 100% load from idle-priority CPU SETI MB.
Now each hybrid AP process gets its own core. Within a core, priority scheduling works well: the AP process takes precedence over SETI MB, and the GPU stays as busy as possible in the current version.
Perhaps the same approach should be implemented for CUDA MB too (an affinity lock to different cores if many GPUs are installed in the host).

ID: 957516
Profile S@NL - eFMer - efmer.com/boinc
Volunteer tester
Joined: 7 Jun 99
Posts: 512
Credit: 148,746,305
RAC: 0
United States
Message 957518 - Posted: 20 Dec 2009, 11:38:18 UTC - in response to Message 957516.  


[...]
For multi-core, multi-GPU hosts, that is, the hosts with the best performance, priority is not the only thing that matters. Actually, the CPU affinity issue is more important.
[...]
Perhaps the same approach should be implemented for CUDA MB too (an affinity lock to different cores if many GPUs are installed in the host).


But affinity control would be highly complicated to implement.

The GPU application runs best on a separate CPU core.
But then you should also make sure that other applications that use up a lot of CPU time are not on that assigned core.
And above all, BOINC itself should be allowed to run as fast as possible. BOINC runs at normal priority, and it needs to.
If the BOINC client slows down, the GPU task will do the same; they are closely connected.

On a dedicated computer you could get this to work...
But there are so many factors to consider: the number of cores and how fast they are, the number of GPUs and how fast they are.
And the different OSes will behave quite differently.

ID: 957518
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 957520 - Posted: 20 Dec 2009, 12:10:09 UTC - in response to Message 957518.  

But affinity control would be highly complicated to implement.

The GPU application runs best on a separate CPU core.
But then you should also make sure that other applications that use up a lot of CPU time are not on that assigned core.
And above all, BOINC itself should be allowed to run as fast as possible. BOINC runs at normal priority, and it needs to.
If the BOINC client slows down, the GPU task will do the same; they are closely connected.

On a dedicated computer you could get this to work...
But there are so many factors to consider: the number of cores and how fast they are, the number of GPUs and how fast they are.
And the different OSes will behave quite differently.



1) I spoke only about the Windows NT kernel. I don't know whether Linux kernels have this issue or not.
2) This is already implemented in hybrid AP; it's not too complicated, actually.

Why it's not too complex to implement:
The reasoning is based on the assumption that the installed GPU is much faster than a single CPU core (otherwise all these priority and affinity effects have only minor meaning).
If the host can run many instances of the GPU app, 2 things should be done:
1) assign each new GPU app instance to a new core (while there are enough free cores, of course; otherwise assign tasks to cores in a round-robin manner)
2) raise the GPU app's priority above the priority of the other CPU-intensive processes running on the host. In most cases those other processes are BOINC's CPU apps. They run at idle priority, so the GPU app's default priority of 'below normal' is already enough against them.
But Sutaru's experiments showed that a further priority increase for GPU apps has a positive effect on host performance. That is, boinc.exe/boincmgr.exe itself affects GPU app performance (I assume the host is an almost dedicated cruncher; again, if it's not, any priority change can affect the user experience and should be handled with great care). That's why 'above normal' priority is needed in this case. An affinity lock adds nothing new here: the GPU app will compete with boinc just as it would compete with another instance of itself if the affinity lock were missing.
The rule that BOINC's priority should be higher than all of its apps' priorities seems unneeded. No big deal, IMO, if BOINC misses 2-3 of its every-few-seconds scheduler simulations. All that's needed is that at the end of one GPU task, BOINC starts another GPU task without delay. That's all. If it can do that, there will be no performance loss.
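For illustration, a minimal Win32 sketch of steps 1) and 2) above as they might look inside the GPU app; the 'instance' parameter (which GPU slot this process serves) is an assumed input, not an actual option of the real builds:

#include <windows.h>

// Sketch only: pin this GPU app instance to one core (round-robin over
// the available cores) and raise it above BOINC's CPU apps and boinc.exe.
void lock_core_and_raise(int instance) {
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    int cores = (int)si.dwNumberOfProcessors;

    // Step 1: instance 0 -> core 0, instance 1 -> core 1, ...,
    // wrapping around when there are more instances than cores.
    DWORD_PTR mask = (DWORD_PTR)1 << (instance % cores);
    SetProcessAffinityMask(GetCurrentProcess(), mask);

    // Step 2: above the idle-priority CPU apps, and above normal-priority
    // processes such as boinc.exe.
    SetPriorityClass(GetCurrentProcess(), ABOVE_NORMAL_PRIORITY_CLASS);
}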
ID: 957520
Profile S@NL - eFMer - efmer.com/boinc
Volunteer tester
Joined: 7 Jun 99
Posts: 512
Credit: 148,746,305
RAC: 0
United States
Message 957524 - Posted: 20 Dec 2009, 12:49:45 UTC - in response to Message 957520.  

[...]
If the host can run many instances of the GPU app, 2 things should be done:
1) assign each new GPU app instance to a new core (while there are enough free cores, of course; otherwise assign tasks to cores in a round-robin manner)
2) raise the GPU app's priority above the priority of the other CPU-intensive processes running on the host. [...]
The rule that BOINC's priority should be higher than all of its apps' priorities seems unneeded. No big deal, IMO, if BOINC misses 2-3 of its every-few-seconds scheduler simulations. All that's needed is that at the end of one GPU task, BOINC starts another GPU task without delay. That's all. If it can do that, there will be no performance loss.

Ok, you've been thinking about this somewhat longer than I have.
One problem: BOINC can stop and start CUDA tasks (waiting to run), messing up the affinity scheme.
And I've seen the BOINC client go up to 100% of a core. If you assigned that same core to a GPU task..... it may starve.
And you further assume that the OS doesn't switch other tasks from one CPU to another, even to the core that is used as a feeder.
But the best way is always to try.
ID: 957524
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 957527 - Posted: 20 Dec 2009, 13:07:09 UTC - in response to Message 957524.  

Ok, you've been thinking about this somewhat longer than I have.
One problem: BOINC can stop and start CUDA tasks (waiting to run), messing up the affinity scheme.

For some projects, stopping a GPU task = restarting the GPU task from zero progress.
For SETI it = redoing the initial CPU-based initialization.
Both cases bring such a big performance loss that any additional affinity issues will be an order of magnitude smaller, IMO.
Pausing GPU tasks should be avoided by BOINC by any means, IMO.

And I've seen the BOINC client go up to 100% of a core. If you assigned that same core to a GPU task..... it may starve.

Agreed. That's why in this case the CUDA MB priority should be > BOINC's, not just equal to it!


And you further assume that the OS doesn't switch other tasks from one CPU to another, even to the core that is used as a feeder.

No, not quite. Windows definitely can switch threads between cores, if a core is idle, for example. But on a fully loaded (by SETI ;) ) host there will be no idle cores. There will be cores running high-priority GPU processes and cores running low-priority CPU processes. In this case Windows can pair 2 high-priority processes on a single core while the other cores run only low-priority processes. Priority scheduling doesn't seem to work between cores, only inside a core. If some low-priority thread is rescheduled onto a core with a high-priority thread, that's no problem: the high-priority GPU thread will preempt it in the usual way. So an affinity lock plus elevated priority for the GPU thread will handle such a situation.


But the best way is always to try.

Agreed :) A corresponding build is in progress.

ID: 957527
Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 957554 - Posted: 20 Dec 2009, 15:49:44 UTC - in response to Message 957520.  
Last modified: 20 Dec 2009, 16:05:50 UTC

[...]
2) raise the GPU app's priority above the priority of the other CPU-intensive processes running on the host. In most cases those other processes are BOINC's CPU apps. They run at idle priority, so the GPU app's default priority of 'below normal' is already enough against them.
But Sutaru's experiments showed that a further priority increase for GPU apps has a positive effect on host performance. That is, boinc.exe/boincmgr.exe itself affects GPU app performance (I assume the host is an almost dedicated cruncher; again, if it's not, any priority change can affect the user experience and should be handled with great care). That's why 'above normal' priority is needed in this case. An affinity lock adds nothing new here: the GPU app will compete with boinc just as it would compete with another instance of itself if the affinity lock were missing.
The rule that BOINC's priority should be higher than all of its apps' priorities seems unneeded. No big deal, IMO, if BOINC misses 2-3 of its every-few-seconds scheduler simulations. All that's needed is that at the end of one GPU task, BOINC starts another GPU task without delay. That's all. If it can do that, there will be no performance loss.


I ran a test on my GPU cruncher (4x OCed GTX260-216, AMD quad @ 3.0 GHz) with 4x MB (CPU) and 4x CUDA (GPU), and the GPU calculation times increased 2x or 3x. Not on all tasks, but on a lot of them.
And the CPU couldn't compensate for the loss; the whole PC's RAC decreased.
That's why I don't crunch on the CPU of my GPU cruncher.

Now, with 'Priority', maybe I'll run this test again.


On my QX6700 @ 3.14 GHz with an OCed GTX260-216, I see full performance from the CPU and GPU (with 'Priority', or in the past with TThrottle (incl. priority change)). I'm a little (positively) surprised that this PC has a RAC of ~ 19,300. (4x MB + 1x CUDA)
The CPU alone used to have a RAC of 4,500 - 4,800 (now it would maybe be more because of the current mix of AR WUs). The added GPU has ~ 14,650 RAC.
The GPU RAC compares well with my GPU cruncher: 4x GPU = 57,800 / 4 = 14,450 RAC/GPU. The single GPU (in the QX6700 PC) is more highly OCed.



So from what I read, different tasks with the same priority won't get the same CPU time at the same time.

For example, if boinc.exe ('normal' priority) has activity and the CUDA app (changed to 'normal' priority) would also like some CPU time, does the CUDA app need to wait until boinc.exe is finished?

I worry about the CUDA app having a higher priority than boinc.exe.
For example, on my GPU cruncher, if 4 new CUDA WUs start -> 4x 25 % CPU usage, the whole CPU is fully loaded. If boinc.exe then needs to UL/DL or request/report WUs/results and no free CPU time is available -> PC crash?




ID: 957554
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 957577 - Posted: 20 Dec 2009, 18:02:58 UTC - in response to Message 957554.  
Last modified: 20 Dec 2009, 18:03:46 UTC


For example, if boinc.exe ('normal' priority) has activity and the CUDA app (changed to 'normal' priority) would also like some CPU time, does the CUDA app need to wait until boinc.exe is finished?

Until it releases the CPU or its current CPU timeslice is over (~20 ms on Windows).


I worry about the CUDA app having a higher priority than boinc.exe.
For example, on my GPU cruncher, if 4 new CUDA WUs start -> 4x 25 % CPU usage, the whole CPU is fully loaded. If boinc.exe then needs to UL/DL or request/report WUs/results and no free CPU time is available -> PC crash?


A system crash is highly unlikely. But there could be some delay in starting new tasks.

So, one of your hosts has 4 GPUs and the other only 1 GPU?
Well, I currently made a build that manages 2 GPUs per host. It seems you need another one.
On a quad it gives 2 cores per GPU in the affinity lock (this should not degrade performance: if there is some disk activity and an idle core, the thread can migrate between the 2 cores). Locking each thread to a single core could decrease performance.
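For illustration, a sketch of how such a 2-cores-per-GPU lock might be computed on a quad; the 'gpu_slot' parameter (0 or 1) is an assumption, not the actual build's interface:

#include <windows.h>

// Sketch only: give each GPU app instance a pair of cores on a quad.
// Slot 0 -> cores 0+1 (mask 0x3), slot 1 -> cores 2+3 (mask 0xC).
// The thread may migrate within its pair, so one busy core
// doesn't leave the GPU waiting.
void lock_core_pair(int gpu_slot) {
    DWORD_PTR mask = (DWORD_PTR)0x3 << (2 * gpu_slot);
    SetProcessAffinityMask(GetCurrentProcess(), mask);
}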

Could you check in Task Manager how the CPU load of CUDA MB (in its initialization stage) is usually distributed?
Does each core get full load, or ?
BTW, your previous negative experience with combined CPU + GPU load could be linked to the affinity problem. If Windows allocates a single core to 2 or more GPU processes, a performance penalty occurs. When you run nothing but GPU MB, some cores stay idle, so Windows relocates the threads and no affinity lock is needed. But in that case you left the CPU idle a fraction of the time - a sub-optimal variant.
ID: 957577
Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 957618 - Posted: 20 Dec 2009, 21:14:41 UTC - in response to Message 957577.  
Last modified: 20 Dec 2009, 21:28:09 UTC


Would you recommend setting the CUDA app priority to 'above' with 'Priority'?
With one GPU and with four?

Yes, the GPU cruncher has 4 GPUs, the other PC has 1..
You think I need to buy a 2nd one for the QX6700 PC?
But AFAIK the mobo (Intel D975XBX2) drops PCIe slot 1 down to x8 speed if GPUs are inserted in both PCIe slots 1 and 2.
PCIe slot 3 holds a 6200 LE.
So currently I get the maximum x16 GPU performance.
I would also need to buy a new PSU; the current 520 W PSU has enough work already.. ~ 385 W (~ 405 W peak) from the wall plug.
So no 'update' this year, unless Santa Claus makes a stop at my home..

AFAIK/IIRC, with the opt. MB_6.08_CUDA_V12 app, when a new CUDA WU starts, the 25 % CPU usage is distributed over all CPU cores (AMD quad).
Not 100 % on only one core.

The 1st build of your CUDA app was fixed, IIRC, to the first CPU core.
Not good for a multi-GPU cruncher: when a new CUDA WU started, the other GPUs idled.

IIRC, on my GPU cruncher with the CUDA_V12 app, all CPU cores show small activity during pure GPU calculation time.
No CPU core is really idle. AFAIK, every CUDA app gets one CPU core.

I see - this priority, affinity, or whatever else on top..
Maybe every PC system needs a specially tuned CUDA app build?
Maybe it would be better/possible to make a program where I enter my equipment (for example: quad CPU + 2x GPUs, dual CPU + 2x GPUs, quad CPU + 4x GPUs) and the program chooses the best performance settings for the CUDA app?




ID: 957618
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 957667 - Posted: 20 Dec 2009, 23:22:16 UTC

I posted 2 new builds on Lunatics, in this thread:
http://lunatics.kwsn.net/12-gpu-crunching/cuda-mb-v12b-for-multi-gpu-multicore-hosts.msg23666.html;topicseen#msg23666

Try them and see whether there is a performance boost or not.
ID: 957667
Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 958712 - Posted: 25 Dec 2009, 16:14:41 UTC - in response to Message 957667.  
Last modified: 25 Dec 2009, 16:33:46 UTC


O.K., I ran a test of CUDA_V12b_x4, compared to V12.
(AMD quad & 4x OCed GTX260-216 (4x MB & 4x CUDA))
It doesn't look so good.

I also ran eFMer Priority (set to 'high'), because V12b raises the priority only to 'normal'. Why?

The CUDA WU preparation time on the CPU (CPU time in Task Manager) increased from 12 to 13 sec.
It also looks like the CUDA app doesn't get the full 25 % CPU continuously.
It varies (more than before) from ~ 18 - 25 %, so the real wall-clock time of the CUDA WU start on the CPU increases even more.

[I saw/see this with all builds.
If I leave the CPU otherwise idle, the variation is also there. It seems to vary less when CPU WUs run on the CPU simultaneously.
But in both cases (GPU only, or CPU & GPU), none of the builds gets a continuous 25 % CPU.
With eFMer Priority ('high') it seems to vary less.]

The whole GPU calculation time also increased.

Here are only a few examples; I ran only a ~ 1 hour test, because I saw the performance loss.

.._V12b:
AR 0.398837
Run time 676.609375
CPU time 47.15625
Wall-clock time elapsed since last restart: 673.4 seconds
[http://setiathome.berkeley.edu/result.php?resultid=1457691730]

AR 0.398837
Run time 690.671875
CPU time 44.82813
Wall-clock time elapsed since last restart: 674.1 seconds
[http://setiathome.berkeley.edu/result.php?resultid=1457691703]

..V12:
AR 0.398808
Run time 624.015625
CPU time 43.95313
Wall-clock time elapsed since last restart: 620.5 seconds
[http://setiathome.berkeley.edu/result.php?resultid=1457689203]



Additionally..
If I run V12 (& eFMer Priority 'high') I also see some speed variation.

AR 0.398211
Run time 651.109375
CPU time 44.34375
Wall-clock time elapsed since last restart: 647.1 seconds
[http://setiathome.berkeley.edu/result.php?resultid=1457512202]

AR 0.398211
Run time 628.5625
CPU time 45.42188
Wall-clock time elapsed since last restart: 625.7 seconds
[http://setiathome.berkeley.edu/result.php?resultid=1457512198]

Sure, with the help of eFMer Priority I don't see the 2x or 3x GPU calculation times like in the past.
But I guess some CUDA WUs get suspended when another new CUDA WU does its preparation on the CPU (2 CUDA apps on one CPU core)??




ID: 958712
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 958747 - Posted: 25 Dec 2009, 23:12:07 UTC - in response to Message 958712.  

Sure, with the help of eFMer Priority I don't see the 2x or 3x GPU calculation times like in the past.

Does that mean that without the tool you do see such a slowdown with V12b??
Could you do a clean test of the build itself, without any additional tools?

And another question: does the slowdown you see still lead to a bigger RAC drop than the RAC increase from enabling CPU processing? If there is no more 2x slowdown, then you could enable CPU processing too and get a considerable performance boost. Would this boost offset the performance decrease from slightly slower GPU processing on your host?

ID: 958747
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 958750 - Posted: 25 Dec 2009, 23:18:52 UTC - in response to Message 958712.  

because V12b raises the priority only to 'normal'. Why?


Because the priority of the worker thread is increased to NORMAL +2, not just to normal.
Task Manager shows only the process priority (the so-called priority class), while scheduling is performed based on thread priorities.
If you want to see thread priorities, you should use a tool like Process Explorer from SysInternals.
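For illustration, a minimal sketch of the distinction (the function is hypothetical, not the actual V12b code); the base-priority arithmetic in the comments follows the documented Windows scheduling table:

#include <windows.h>

// Process priority class vs. thread priority.
// Base priority = class base + thread offset: the NORMAL class has base 8,
// and THREAD_PRIORITY_HIGHEST is +2, giving base 10 - the "NORMAL +2"
// mentioned above. Task Manager shows only the class ('Normal');
// Process Explorer shows the per-thread result.
void raise_worker_thread() {
    SetPriorityClass(GetCurrentProcess(), NORMAL_PRIORITY_CLASS);
    SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_HIGHEST);
}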
ID: 958750
Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 958862 - Posted: 26 Dec 2009, 21:21:14 UTC - in response to Message 958750.  
Last modified: 26 Dec 2009, 21:27:00 UTC

Does that mean that without the tool you do see such a slowdown with V12b??
Could you do a clean test of the build itself, without any additional tools?

And another question: does the slowdown you see still lead to a bigger RAC drop than the RAC increase from enabling CPU processing? If there is no more 2x slowdown, then you could enable CPU processing too and get a considerable performance boost. Would this boost offset the performance decrease from slightly slower GPU processing on your host?

I didn't run the V12b_x4 test without eFMer Priority.
I saw some testers at Lunatics - do you have enough..? Or should I do this kind of test without eFMer Priority?

I did the math a while ago.. for crunching CPU & GPU simultaneously..
IIRC, if all CUDA WUs take ~ 10 sec. longer and they are normal-AR WUs, I would gain ~ 2,500 RAC.
((GPUs: - ~ 1,000 RAC, CPU: + ~ 3,500 RAC.) The CPU alone would have maybe ~ 4,300 RAC; it's less here because of the big boinc.exe load and the GPU support.)
If they were all shorties, I would have about the same RAC; the CPU couldn't compensate for the loss.


Because the priority of the worker thread is increased to NORMAL +2, not just to normal.
Task Manager shows only the process priority (the so-called priority class), while scheduling is performed based on thread priorities.
If you want to see thread priorities, you should use a tool like Process Explorer from SysInternals.

Ohh.. worker thread.
Does eFMer Priority change the process priority?

I looked around on the web and didn't get any wiser.

What would be the highest settings for the CUDA app?
Both priorities to 'high'?

Hmm.. you are the coder..
I would maybe recommend setting the highest available priorities, so that the CUDA app always gets the most CPU support.
Is this possible, or is it not so easy?


Because.. V12 & eFMer Priority change only the process priority (to 'high') and I don't see a big GPU performance loss.
A variation of ~ 20 sec. between same-AR WUs.

Maybe only this kind of priority change plus a CPU affinity lock?




ID: 958862
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 958875 - Posted: 26 Dec 2009, 22:04:07 UTC - in response to Message 958862.  

I saw some testers at Lunatics - do you have enough..? Or should I do this kind of test without eFMer Priority?

The needed tests still haven't been done, so it would be nice if you carried them out.


I did the math a while ago.. for crunching CPU & GPU simultaneously..
IIRC, if all CUDA WUs take ~ 10 sec. longer and they are normal-AR WUs, I would gain ~ 2,500 RAC.
((GPUs: - ~ 1,000 RAC, CPU: + ~ 3,500 RAC.) The CPU alone would have maybe ~ 4,300 RAC; it's less here because of the big boinc.exe load and the GPU support.)
If they were all shorties, I would have about the same RAC; the CPU couldn't compensate for the loss.

I didn't quite understand your calculation. You wrote:
-1,000 + 3,500 = +2,500; that is, CPU + a slightly slower GPU gives a +2,500 RAC boost.
But later you wrote "The CPU couldn't compensate for the loss." Could you explain that a bit more?


Does eFMer Priority change the process priority?

Fred can answer that, not me.


What would be the highest settings for the CUDA app?
Both priorities to 'high'?

Realtime for the process and time_critical for the thread. And then your system will almost certainly crash.


Is this possible, or is it not so easy?

If one thread has a priority of 10 and another a priority of 15 (for example), Windows will not allocate more CPU time to the priority-15 thread than it would to a thread of priority 11, 12, 13 or 14 (provided there are no other such threads in an active state). That is, once the thread priority is high enough to exceed the priority of the other CPU-consuming threads, a further priority increase will not help.
But if your thread has a higher priority than some system-critical thread, you can end up with a BSoD or just an OS freeze (and lose much more computation time on the reboot than you save by blocking almost every other process in the OS). If you want to give as much CPU as possible to the CUDA app, try disabling as many services as you can, and don't run any crapware like Adobe updaters and so on.
Setting the process priority to 'high' may not be allowed under Vista with UAC enabled; some special security settings could be needed for this.
Does Fred's priority app trigger a UAC prompt under Vista/Win7?

ID: 958875