V10 of modified SETI MB CUDA + opt AP package for full multi-GPU+CPU use

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 885253 - Posted: 14 Apr 2009, 6:06:18 UTC - in response to Message 885142.  
Last modified: 14 Apr 2009, 6:28:54 UTC

...
About CUDA performance: if the core where CUDA MB is executing is occupied by a higher-priority process (one with a priority higher than that of the CUDA MB worker thread), the app can experience substantial performance degradation. That's why the CPU affinity mod was created: to give users with multicore CPUs the ability to bind other processes to higher-numbered cores while leaving the first core mostly for CUDA MB. Other BOINC apps run at an even lower priority and should not interfere with CUDA MB.


Hmm..

It's only a crunching rig.. so nothing else runs on it..

So.. I guess.. it can only be the 25 % CPU / 100 % core peaks from boinc.exe..
Could this explain the lost CUDA performance?
These peaks occur maybe once per minute and last ~ 3 - 5 sec.

For example, if I crunched only on the GPUs, that wouldn't help to get 100 % GPU performance all the time, would it?

If I remember correctly, in the past when I crunched only on the GPUs, boinc.exe also wasn't fixed to one particular CPU core,
because in Task Manager all 4 CPU cores showed usage peaks when boinc.exe took 25 % CPU.

So in the end I think my boinc.exe peaks couldn't reduce my CUDA performance, could they?


What could be the cause?
Everything I don't need is disabled (in BIOS and Windows).. it's a pure crunching rig.

As BOINC itself runs at a higher priority than the worker threads, yes, IMHO it's a possible reason. I checked boinc.exe's current CPU time on my host: it has already taken 11 min of CPU time and continues with some peaks of 9% CPU usage (BOINC 6.6.20). With 11 min of CPU time, it looks like some GPU tasks could be affected by these delays.

ADDON: it can be tested by using the locked-affinity CUDA MB mod and restricting BOINC's affinity to all cores but the first one. That way CUDA MB should be free from boinc.exe interference.
But in this case another problem arises for multi-GPU hosts. When a new task starts, the freshly launched CUDA MB instance will interfere with the other running app instances (the newly started one tries to take a whole core for itself, but all CUDA MB instances are locked to the same single core by the affinity lock mod). This situation was discovered by Bob Mahoney and was the reason I removed the affinity lock mod from recent CUDA MB builds.
Considering possible boinc.exe interference with CUDA MB running times, maybe it's worth restoring that mod for single-GPU, multicore-CPU hosts and restricting boinc.exe from using the first core.
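
The affinity split discussed here (CUDA MB on the first core, boinc.exe on all the others) comes down to two bitmasks. A minimal sketch in Python, assuming the usual convention that bit 0 of an affinity mask stands for the first core. The function names are made up for illustration and are not part of BOINC or the CUDA MB app; the sketch only computes the masks and does not apply them to any process.

```python
def first_core_mask() -> int:
    """Mask that selects only the first core (bit 0)."""
    return 0b1


def all_but_first_mask(n_cores: int) -> int:
    """Mask that selects every core except the first one."""
    if n_cores < 2:
        raise ValueError("need at least 2 cores to keep the first one free")
    return ((1 << n_cores) - 1) & ~first_core_mask()


def cores_in_mask(mask: int) -> list:
    """0-based indices of the cores whose bit is set in the mask."""
    return [i for i in range(mask.bit_length()) if mask & (1 << i)]


if __name__ == "__main__":
    boinc_mask = all_but_first_mask(4)  # quad-core host
    print(hex(first_core_mask()), hex(boinc_mask), cores_in_mask(boinc_mask))
```

On a quad-core host this yields mask 0x1 for CUDA MB and 0xe (cores 1, 2 and 3) for boinc.exe. On a multi-GPU host every CUDA MB instance would share that single first-core mask, which is the start-up collision mentioned in this post.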
ID: 885253

Raistmer
Message 885255 - Posted: 14 Apr 2009, 6:11:47 UTC - in response to Message 885251.  


@ elgar, Borgholio

Why does it work for all the others and for me too? ;-)

If you like, follow my 'instructions' above and, with the help of the app_info.xml thread and using only Raistmer's CUDA V10 app and .dll's, give it a try.. :-)


..maybe it didn't work because you took the complete mod?
I don't know..

If you had made a mistake in your app_info.xml, you wouldn't get CUDA tasks..

The complete pack with Number_of_GPUs and the other needed modifications like cc_config works just OK with BOINC 6.6.20; I still use it this way because I'm doing only MB on that host right now. But a partially modified setup (for example, with only app_info replaced) can of course lead to errors. A new app_info, replacement of the "teamed" app with the ordinary AK_v8, and removal of ncpus from cc_config are needed for "upgrading" to BOINC 6.6.20 CPU/GPU scheduling for SETI MB.

ID: 885255

Dirk Sadowski
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 885256 - Posted: 14 Apr 2009, 6:22:57 UTC - in response to Message 885253.  

As BOINC itself runs at a higher priority than the worker threads, yes, IMHO it's a possible reason. I checked boinc.exe's current CPU time on my host: it has already taken 11 min of CPU time and continues with some peaks of 9% CPU usage (BOINC 6.6.20). With 11 min of CPU time, it looks like some GPU tasks could be affected by these delays.


I'm only an 'amateur'.. ;-)

As I understood it, the operating system [Windows XP Home] (and BOINC) would be intelligent enough to schedule the lower-priority app tasks in order of their priorities,
so that only the CPU tasks would be 'disturbed' by boinc.exe, because the CUDA tasks have higher priority.

Is it possible to make a mod (or how could I tell my OS?) so that the CUDA tasks can't be disturbed and have 100 % CPU support all the time while crunching on the GPU? :-)

ID: 885256

Raistmer
Message 885258 - Posted: 14 Apr 2009, 6:32:52 UTC - in response to Message 885256.  
Last modified: 14 Apr 2009, 6:37:10 UTC

With an RTOS, yes, correct priority settings should ensure that a task will not be interrupted by less important activity... but Windows is in no way an RTOS.
Maybe a port to QNX is required to fully utilize the GPU's potential... :)
ID: 885258

Dirk Sadowski
Message 885259 - Posted: 14 Apr 2009, 6:33:10 UTC - in response to Message 885256.  
Last modified: 14 Apr 2009, 6:41:19 UTC

..oops.. I forgot..

Thinking about the results I posted above..
If boinc.exe also disturbed the CUDA tasks, then as I understand it all the CUDA tasks would be affected. So maybe all results would have ~ 30 sec.* longer crunching time [vs. GPU-only crunching], and not just some with very much longer crunching time..
Or not?


[* EDIT: changed from min. to sec. .. ;-)]

ID: 885259

Dirk Sadowski
Message 885260 - Posted: 14 Apr 2009, 6:37:19 UTC - in response to Message 885258.  

With an RTOS, yes, correct priority settings should ensure that a task will not be interrupted by less important activity... but Windows is in no way an RTOS.
Maybe a port to QNX is required to fully utilize the GPU's potential... :)


What are you talking about..? ;-)
I'm the 'amateur'.. :-D

QNX?

I don't know.. ;-)
If you think it's possible to do.. please do it.. :-)

If you think about it and/or have implemented this function.. please give a hint! :-D

ID: 885260

Raistmer
Message 885261 - Posted: 14 Apr 2009, 6:43:56 UTC - in response to Message 885259.  


Thinking about the results I posted above..
If boinc.exe also disturbed the CUDA tasks, then as I understand it all the CUDA tasks would be affected. So maybe all results would have ~ 30 min. longer crunching time [vs. GPU-only crunching], and not just some with very much longer crunching time..
Or not?

Maybe not. Actually, moving an app from one core to another can be a very costly operation if the two cores don't share the same L2 cache. I hope Windows is clever enough to realize this and not migrate processes too often. That means boinc.exe will run mostly on one core and would interfere with only one GPU app on a multi-GPU host, or from time to time with the single GPU app (only when the following condition becomes true:
boinc.exe uses excessive CPU _and_ boinc.exe and the CUDA MB app are executing on the same core at that moment).
ID: 885261

Raistmer
Message 885264 - Posted: 14 Apr 2009, 6:49:09 UTC - in response to Message 885260.  
Last modified: 14 Apr 2009, 6:50:16 UTC


QNX?

I don't know.. ;-)
If you think it's possible to do.. please do it.. :-)

If you think about it and/or have implemented this function.. please give a hint! :-D

It's not just a function, it's a complete real-time OS. Free for non-commercial use, BTW (see the Wikipedia link about it in my earlier post).
So I'm afraid it's not an option for Windows-bound users anyway.
Maybe some play with priorities _and_ affinity could increase overall performance for hosts with big queues and high boinc.exe CPU time usage.
IMHO BOINC scheduling and bookkeeping have become too complex to be considered negligible overhead...
ID: 885264

Dirk Sadowski
Message 885265 - Posted: 14 Apr 2009, 6:49:24 UTC - in response to Message 884989.  
Last modified: 14 Apr 2009, 6:51:03 UTC


http://setiathome.berkeley.edu/result.php?resultid=1195495666
icfft=86040, PoT_activity=0, PoT_freq_bin=-1SETI@home error -12 Unknown error

http://setiathome.berkeley.edu/result.php?resultid=1190297258
icfft=94365, PoT_activity=0, PoT_freq_bin=-1SETI@home error -12 Unknown error

[I already posted this error in this thread]
http://setiathome.berkeley.edu/result.php?resultid=1195160891
icfft=86665, PoT_activity=0, PoT_freq_bin=-1SETI@home error -12 Unknown error


http://setiathome.berkeley.edu/result.php?resultid=1202303914
icfft=84509, PoT_activity=0, PoT_freq_bin=-1SETI@home error -12 Unknown error


My 5th:
http://setiathome.berkeley.edu/result.php?resultid=1202338717
icfft=98384, PoT_activity=0, PoT_freq_bin=-1SETI@home error -12 Unknown error

---------------------------------------------------------------

If you would like to help Raistmer.. please also post your '-12 Unknown error' results.. *thumb up*

It could look like this:
Exception detected inside cudaAcc_find_triplets, dumping client state
icfft=98384, PoT_activity=0, PoT_freq_bin=-1SETI@home error -12 Unknown error
cudaAcc_find_triplets erroneously found a triplet twice in find_triplets_kernel
File: ..\analyzePoT.cpp
Line: 348


And only the [bolded] line is needed.

ID: 885265

Dirk Sadowski
Message 885269 - Posted: 14 Apr 2009, 7:14:37 UTC - in response to Message 885261.  
Last modified: 14 Apr 2009, 7:18:46 UTC

Maybe not. Actually, moving an app from one core to another can be a very costly operation if the two cores don't share the same L2 cache. I hope Windows is clever enough to realize this and not migrate processes too often. That means boinc.exe will run mostly on one core and would interfere with only one GPU app on a multi-GPU host, or from time to time with the single GPU app (only when the following condition becomes true:
boinc.exe uses excessive CPU _and_ boinc.exe and the CUDA MB app are executing on the same core at that moment).


Ohh.. I see - it's not so easy to be an 'optimizer/code cutter'.. ;-)
It's good that I'm not you.. ;-D


..I'm now an AMD Phenom II user.. AFAIK, this CPU has no shared L2 cache; every CPU core has its own L2 cache.
[4 x L2 cache]

So this performance loss would be higher than for Core 2 Quads, which have 2 x L2 cache.
[2 CPU cores share one L2 cache]


Maybe it would be possible to fix boinc.exe to one CPU core?
[Or make boinc.exe spread its 25 % CPU evenly over all CPU cores? [4 x 6.25 % CPU usage]*]
But this would mean making a BOINC mod, wouldn't it?
Would this also eliminate the L2 cache loss?
(Which version would be better: only one CPU core for boinc.exe, or evenly shared over all 4?)
[For example a quad-core CPU with 4 GPUs.. ;-)]

For rigs with many GPUs it would help to raise the performance..


[* If I remember my GPU-only crunching time correctly..
In Task Manager, when boinc.exe took 25 % CPU, all the CPU cores had (different) higher usage peaks..
So boinc.exe splits its 25 % CPU across all 4 CPU cores, but not as 6.25 % CPU for each..]
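
The arithmetic behind these figures can be checked with a few lines of Python. This is only a sketch, assuming Task Manager reports a process's CPU usage as a percentage of all cores combined; the function names are hypothetical.

```python
def one_core_load(total_pct: float, n_cores: int) -> float:
    """Load on a single core if the whole usage lands on that one core.

    total_pct is the Task Manager figure (percent of all cores combined).
    """
    return total_pct * n_cores


def even_share_of_total(total_pct: float, n_cores: int) -> float:
    """Share of the total CPU that each core carries if usage is spread evenly."""
    return total_pct / n_cores


if __name__ == "__main__":
    print(one_core_load(25.0, 4))        # boinc.exe peak on a quad-core
    print(even_share_of_total(25.0, 4))  # the "4 x 6.25" case
```

For a quad-core, a 25 % peak is one fully loaded core if it stays on a single core, or 6.25 % of the total CPU per core if it is spread evenly.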

ID: 885269

Raistmer
Message 885358 - Posted: 14 Apr 2009, 21:20:50 UTC - in response to Message 885269.  

Maybe it would be possible to fix boinc.exe to one CPU core?

Yes, by using Task Manager's affinity settings or a program like Process Lasso.

ID: 885358

piper69
Joined: 25 Sep 08
Posts: 49
Credit: 3,042,244
RAC: 0
Romania
Message 885360 - Posted: 14 Apr 2009, 21:27:59 UTC - in response to Message 885336.  

A question for you guys.

Does it affect the overall performance of the CUDA app if I lower
these two values, <avg_ncpus>0.127970</avg_ncpus> and <max_ncpus>0.127970</max_ncpus>, to something like 0.03-0.04?

Any answer/opinion is appreciated.

thx
ID: 885360

Raistmer
Message 885361 - Posted: 14 Apr 2009, 21:29:33 UTC - in response to Message 885360.  

It's just a hint to the BOINC scheduler, not the real CPU usage or a CPU usage limit.
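
To see what is currently set before editing anything, the two values can be read with Python's standard library. A minimal sketch: the fragment is cut down to the two tags from the question, and wrapping them in an `<app_version>` element is an assumption about where they sit in app_info.xml. `read_ncpus` is a hypothetical helper name, not a BOINC tool.

```python
import xml.etree.ElementTree as ET

# Cut-down stand-in for an app_info.xml entry (assumed layout, not a full file).
FRAGMENT = """
<app_version>
    <avg_ncpus>0.127970</avg_ncpus>
    <max_ncpus>0.127970</max_ncpus>
</app_version>
""".strip()


def read_ncpus(xml_text: str) -> dict:
    """Return the avg_ncpus / max_ncpus scheduler hints as floats."""
    root = ET.fromstring(xml_text)
    return {tag: float(root.findtext(tag)) for tag in ("avg_ncpus", "max_ncpus")}


if __name__ == "__main__":
    print(read_ncpus(FRAGMENT))
```

Since the values are only scheduler hints, changing them affects how BOINC plans its work, not how much CPU time the CUDA app actually uses.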
ID: 885361

piper69
Message 885362 - Posted: 14 Apr 2009, 21:34:31 UTC
Last modified: 14 Apr 2009, 21:37:20 UTC

The problem is that when I look in Task Manager, the CUDA app is constantly at around 12% CPU usage.


Is that normal?
ID: 885362

Raistmer
Message 885485 - Posted: 15 Apr 2009, 8:04:40 UTC - in response to Message 885362.  

It depends on CPU speed. I looked at your Athlon 64 X2 host's results, and the CUDA part seems to be working OK. You could see high CPU usage from the CUDA app when it can't initialize properly (low GPU memory or another error) and falls back to CPU processing. This mode should be avoided because CPU processing by CUDA MB is much slower than by AK_v8. But your GPU works OK.
ID: 885485

Dirk Sadowski
Message 885506 - Posted: 15 Apr 2009, 9:01:12 UTC - in response to Message 884987.  
Last modified: 15 Apr 2009, 9:01:52 UTC

...
Since BOINC V6.6.20 your nice team mod isn't needed anymore.
Maybe you could open a new thread with the latest available CUDA versions? [V10/11]
[incl. dll's and the needed app_info.xml]

Or maybe add a pointer to the app_info.xml thread?
app_info for AP500, AP503, MB603 and MB608


Which are your .dll versions, and where are they?
Could you please post where to find them?
Or are these the .dll's from your V7 and V10 mods? Are these the only different .dll versions from you?
[in the 1st posts of the threads @ lunatics.kwsn.net?]

Could I also use the stock Berkeley .dll's with your mod?
Or what's different?
I don't know what the .dll's are needed for.. I'm not a pro.. I'm an 'amateur'.. ;-)


Didn't you see my questions? ;-)


If you don't have time for a new thread..
could you please post the URLs of the recommended CUDA Vx apps and .dll's?

Then someone from the community could open a new thread for this.. :-)

ID: 885506

Raistmer
Message 885517 - Posted: 15 Apr 2009, 10:02:00 UTC - in response to Message 885506.  

ID: 885517

Dirk Sadowski
Message 885627 - Posted: 15 Apr 2009, 19:16:32 UTC - in response to Message 885517.  

Could you please post the URLs of the recommended CUDA Vx apps and .dll's?

http://lunatics.kwsn.net/12-gpu-crunching/v10-of-modified-seti-mb-cuda-opt-ap-package-for-full-multi-gpucpu-use.msg16715.html#msg16715


Thanks a lot!


Hmm.. but.. what about rigs with > 1 GPU in them..?

Will there soon be a new app release for these rigs too?

ID: 885627

Dirk Sadowski
Message 885831 - Posted: 16 Apr 2009, 16:44:59 UTC - in response to Message 885627.  

..hmm.. did I misread your post at lunatics.kwsn.net?
I understood it to mean that this app is ONLY for rigs with 1 GPU.

No?

ID: 885831

Raistmer
Message 885851 - Posted: 16 Apr 2009, 18:14:52 UTC - in response to Message 885831.  

You read it correctly.
I updated that post and included builds without the affinity lock, based on the current codebase. There should be no actual performance difference from the last V10/11 update buried in the corresponding Lunatics thread. I did a fresh build just for convenience.

ID: 885851