AK V8 + CUDA MB team work mod


Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3368
Credit: 46,024,068
RAC: 37,209
Russia
Message 857579 - Posted: 25 Jan 2009, 11:06:52 UTC
Last modified: 25 Jan 2009, 11:42:19 UTC

This version is intended to allow simultaneous SETI MB processing on the CPU and GPU of the same host.

It's just a workaround for BOINC's inability to maintain such a config. So, like any workaround, it has its own advantages, disadvantages and limitations. Peruse the known issues and use this package wisely. Please report all unknown issues in this thread.

Let's begin:
1) This first "proof of concept" version will work only on SSSE3-capable hosts (sorry AMD fans; if this approach works, SSE3 will of course be supported too). Only Intel Core 2 and up for now.
2) This package can work ONLY on SETI main. Don't even try to use it on SETI beta - you can lose tasks for nothing.
3) This CPU-GPU team will not play nicely with other GPU-related projects like GPU-grid, because no BOINC GPU-management mechanism is used in this version.
This fact has a positive side too - you do not need a GPU-aware BOINC at all. You just need a host with a CUDA-supported GPU. It should work even with BOINC 5.xx.
4) <ncpus>NUMBER_OF_CORES+1</ncpus> is REQUIRED for productive work. If you let BOINC manage the number of CPU cores you will end up with one idle core, trust me ;)
5) This AK V8 build was not PGOed, so it will show worse performance than the current CPU-only AK V8 SSSE3x app (this will be fixed if the approach proves useful).
6) It probably will not use the second GPU on dual-GPU hosts.

How it works:
To BOINC it looks as if a usual optimized CPU app is installed. BOINC will call the CPU app (AK_v8b_win_SSSE3x_GPU_CPU_team.exe in our case) and assign one of the SETI MB tasks to it. But this app is aware of the possibility of using the GPU for computations. It will check whether another instance (it knows only itself and its clones, so no other GPU-related projects please) already uses the GPU and, if not, will start the GPU-related app (MB_6.08_mod_CPU_team_CUDA.exe in our case) and suspend itself until the GPU app finishes. This CUDA app will do all the work as usual but will do it on the GPU, leaving the CPU almost free.
That's why you should increase the number of cores. BOINC should run NUMBER_OF_CORES+1 apps, thinking they are all CPU-related (some cheating of poor old BOINC here ;) )
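
The hand-off described above can be sketched roughly as follows. This is only an illustration in Python, not the code of the real apps: the thread does not say how one instance detects that another already uses the GPU, so the lock file (GPU_LOCK) and the function names here are assumptions made for the example.

```python
import os
import subprocess
import tempfile

# Hypothetical lock-file path. How the real apps signal "GPU busy" is not
# described in the thread; an exclusive lock file is just one simple option.
GPU_LOCK = os.path.join(tempfile.gettempdir(), "seti_gpu_team.lock")


def try_claim_gpu(lock_path=GPU_LOCK):
    """Atomically create the lock file; return True if we now own the GPU."""
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # another instance already claimed the GPU
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))
    return True


def release_gpu(lock_path=GPU_LOCK):
    """Remove the lock file so the next instance can claim the GPU."""
    try:
        os.remove(lock_path)
    except FileNotFoundError:
        pass


def run_task(gpu_cmd, cpu_work, lock_path=GPU_LOCK):
    """Run one task on the GPU if it is free, otherwise on the CPU.

    Returns "gpu" or "cpu" to show which path was taken.
    """
    if try_claim_gpu(lock_path):
        try:
            # The calling process just waits here (using almost no CPU)
            # until the GPU helper process exits.
            subprocess.run(gpu_cmd, check=True)
        finally:
            release_gpu(lock_path)
        return "gpu"
    cpu_work()
    return "cpu"
```

From BOINC's point of view every instance is an ordinary CPU task, which is why one extra slot is needed: the instance that owns the GPU occupies a slot while mostly sleeping. A lock file left behind by a crashed instance would block the GPU path in this sketch; a real implementation would need to handle that case.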

After installation, try to keep an eye on the first few results - this is a pretty new approach and I can't give any guarantees that it will work for your config. If something goes wrong, please revert to the variant you used before and describe your issue in this thread.

P.S. Now you can easily see how fast CUDA really is (on non-VLAR tasks). The CPU apps had completed <20% of their tasks when the CUDA app finished its first task on my Q9450+9600GSO host ;)
Enjoy!

http://lunatics.kwsn.net/gpu-crunching/ak-v8-cuda-mb-team-work-mod.msg13268.html#msg13268

Those who have no access to the Lunatics site can download the package from this link:
http://files.mail.ru/LFJSNC

Profile Toppie
Joined: 3 Apr 99
Posts: 31
Credit: 48,710,104
RAC: 78
South Africa
Message 857691 - Posted: 25 Jan 2009, 16:52:40 UTC - in response to Message 857579.

[quote]This version is intended to allow simultaneous SETI MB processing on the CPU and GPU of the same host.

It's just a workaround for BOINC's inability to maintain such a config. So, like any workaround, it has its own advantages, disadvantages and limitations. Peruse the known issues and use this package wisely. Please report all unknown issues in this thread.

P.S. Now you can easily see how fast CUDA really is (on non-VLAR tasks). The CPU apps had completed <20% of their tasks when the CUDA app finished its first task on my Q9450+9600GSO host ;)
Enjoy!

Those who have no access to the Lunatics site can download the package from this link:
http://files.mail.ru/LFJSNC[/quote]

Hi,
Presumably this is 32-bit only? Is any 64-bit version available?

Toppie.



Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3368
Credit: 46,024,068
RAC: 37,209
Russia
Message 857724 - Posted: 25 Jan 2009, 18:52:06 UTC - in response to Message 857691.

It will work on x64 too (at least the CPU part).
The x64-x86 performance difference is not big enough to think about at this stage of development.
Redoing the PGO optimization for this build will give a bigger speed increase.

Profile Toppie
Joined: 3 Apr 99
Posts: 31
Credit: 48,710,104
RAC: 78
South Africa
Message 857793 - Posted: 25 Jan 2009, 21:37:53 UTC - in response to Message 857724.

[quote]It will work on x64 too (at least the CPU part).
The x64-x86 performance difference is not big enough to think about at this stage of development.
Redoing the PGO optimization for this build will give a bigger speed increase.[/quote]

Thanks!


The Naja
Joined: 20 Apr 08
Posts: 18
Credit: 1,940,239
RAC: 0
Switzerland
Message 858039 - Posted: 26 Jan 2009, 10:29:06 UTC - in response to Message 857724.
Last modified: 26 Jan 2009, 10:37:44 UTC

Thanks, will give it a try!

- stopped receiving new work just now
- backing up the current folder
- waiting for the WUs in the queue to be processed (approx. 12h from now)

The test will be performed on http://setiathome.berkeley.edu/show_host_detail.php?hostid=4317460 (BOINC 6.4.5)

If I understood correctly, this fulfils the conditions for your version...

Will report how it behaves...

Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3368
Credit: 46,024,068
RAC: 37,209
Russia
Message 858119 - Posted: 26 Jan 2009, 16:11:05 UTC - in response to Message 858039.

V8a update available.
Changes:

- VLAR autokill mod enabled for CUDA app
- PGO redone for CPU app
- Wall-clock elapsed time since last restart (or since start of task) is added to stderr for both apps.

Profile perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 14,828,815
RAC: 11,926
United States
Message 858123 - Posted: 26 Jan 2009, 16:27:41 UTC - in response to Message 858119.
Last modified: 26 Jan 2009, 16:53:40 UTC

So this will run SSSE3 and use the r103 mod? If I have this straight, if I run out of APs it will run MB on the CPU, right? Or does it run the MBs in line with the APs? I mean, does it just take either an AP or an MB, whichever is FIFO?

edit: One more question. Have you given any thought to maybe flagging the VLARs to be done by the CPU rather than killing them? I hate shoving them off on someone else to do them but they really mess up my graphics when I run them. They seem to slow everything down.
____________


PROUD MEMBER OF Team Starfire World BOINC

Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3368
Credit: 46,024,068
RAC: 37,209
Russia
Message 858133 - Posted: 26 Jan 2009, 17:13:40 UTC - in response to Message 858123.

1) This "team" will do AP tasks, MB tasks and will utilize both CPU and GPU for MB tasks. AP tasks still only for CPU.

2) When I figure out how to avoid doing VLARs on the GPU w/o aborting them, I will add that ability.

For now I see this way: the CPU app pre-parses the task, takes the AR and doesn't call the CUDA app if AR=VLAR.
But this will be OK only for the first 4 VLARs in a row (for a quad, and even fewer for a duo or single core). The 5th VLAR will cause CPU app swapping on one of the cores (on all cores actually, because no affinity is set now). And the GPU will just be idle. And because VLARs, like any other AR range, come in pretty big groups, surely more than 4-5 in a row, the net result of such an "improvement" will be an idle GPU.

Another modification: take VLARs to the CPU until NUMBER_OF_CORES CPU apps are running, then go to the GPU. But because VLARs come in bunches, this is the state the current build ends up in after a few completed tasks anyway.
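
To make the two variants above concrete, here is a small Python sketch of the placement decision. It is only a model of the argument, not code from the package: the function name, the overflow_to_gpu switch and the vlar_threshold value used in the example are all assumptions, since the thread does not give the AR value that counts as VLAR.

```python
def place_tasks(angle_ranges, n_cores, vlar_threshold, overflow_to_gpu=False):
    """Decide where each currently running task ends up.

    angle_ranges    -- AR of each task BOINC has started, in start order
    n_cores         -- real number of CPU cores
    vlar_threshold  -- hypothetical AR below which a task counts as VLAR
    overflow_to_gpu -- False: VLARs never go to the GPU (first variant)
                       True:  VLARs go to the GPU once n_cores CPU apps
                              are already running (second variant)

    Returns (cpu_tasks, gpu_task); gpu_task is None if the GPU is idle.
    """
    cpu_tasks = []
    gpu_task = None
    for ar in angle_ranges:
        is_vlar = ar < vlar_threshold
        vlar_may_use_gpu = (
            overflow_to_gpu and len(cpu_tasks) >= n_cores and gpu_task is None
        )
        if is_vlar and not vlar_may_use_gpu:
            cpu_tasks.append(ar)   # keep the VLAR away from the GPU
        elif gpu_task is None:
            gpu_task = ar          # first eligible task claims the free GPU
        else:
            cpu_tasks.append(ar)   # GPU already busy, run on the CPU
    return cpu_tasks, gpu_task


# Quad core with ncpus=5 and five VLARs in a row (threshold is made up):
cpu, gpu = place_tasks([0.01] * 5, n_cores=4, vlar_threshold=0.05)
print(len(cpu), gpu)   # five CPU apps compete for four cores, GPU idle
```

With overflow_to_gpu=True the same five VLARs give four CPU apps plus one VLAR on the GPU, which is what the current build does anyway once a run of VLARs arrives.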

The only "true" solution is to mark the VLAR task as "suspended" in BOINC and resume it when a CPU core becomes available. But this requires interaction with BOINC manager... If someone could provide relevant sources (for example, the BOINCview sources) containing an example of task suspending/resuming, well, maybe I could implement such interaction here too.

This solution is temporary both from the BOINC side (BOINC should be able to run CPU and GPU apps together by itself) and from the VLAR CUDA side (the CUDA app is in development - I hope it will be able to do VLARs much faster than it does now), so I see no big sense in putting too much effort into this build and receiving diminishing returns. Now the performance of this "team" is at the possible maximum. Only SSE3 and maybe x64 variants of the CPU app are worth adding.

JPP
Joined: 31 May 99
Posts: 15
Credit: 16,016,410
RAC: 12,511
France
Message 858140 - Posted: 26 Jan 2009, 17:30:31 UTC - in response to Message 858133.

Hi,
I have a dual proc with a ge8K GPU.
I have been using CUDA for a while and honestly saw a significant change in performance in the middle of last week, after loading a new CUDA image.
But only until yesterday...
Yesterday I suddenly noticed that only 1 active WU was being worked on, the one paired with the GPU; nothing is being worked on by the second core anymore.
I tried many things; still the same.
Any idea?
Cheers
jeanpierre@jpp


Profile perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 14,828,815
RAC: 11,926
United States
Message 858147 - Posted: 26 Jan 2009, 17:36:37 UTC - in response to Message 858133.

Thanks Raistmer, that was just what I wanted to know. I think I will give it a try.
____________


PROUD MEMBER OF Team Starfire World BOINC

Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3368
Credit: 46,024,068
RAC: 37,209
Russia
Message 858156 - Posted: 26 Jan 2009, 17:50:06 UTC - in response to Message 858140.

[quote]Hi,
I have a dual proc with a ge8K GPU.
I have been using CUDA for a while and honestly saw a significant change in performance in the middle of last week, after loading a new CUDA image.
But only until yesterday...
Yesterday I suddenly noticed that only 1 active WU was being worked on, the one paired with the GPU; nothing is being worked on by the second core anymore.
I tried many things; still the same.
Any idea?
Cheers
jeanpierre@jpp[/quote]


Read the first message: http://setiathome.berkeley.edu/forum_thread.php?id=50829
You need to increase the ncpus field.

Profile perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 14,828,815
RAC: 11,926
United States
Message 858157 - Posted: 26 Jan 2009, 17:51:18 UTC

Hmmmm, interesting. I loaded it up and as soon as I restarted BM it paused the two AP units I had been running and went to two MBs to run in high priority. Nothing wrong with this just thought it was interesting.

If anything else happens I will let you know.
____________


PROUD MEMBER OF Team Starfire World BOINC

Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3368
Credit: 46,024,068
RAC: 37,209
Russia
Message 858162 - Posted: 26 Jan 2009, 17:56:01 UTC - in response to Message 858157.
Last modified: 26 Jan 2009, 17:56:51 UTC

[quote]Hmmmm, interesting. I loaded it up and as soon as I restarted BM it paused the two AP units I had been running and went to two MBs to run in high priority. Nothing wrong with this just thought it was interesting.

If anything else happens I will let you know.[/quote]


Check with Task Manager whether it really uses all available cores and whether it runs both the CUDA and CPU apps for MB.
I need to redo the dyn-data collection for the SSE3 PGO, so it will take some time to post the SSE3 build.
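
For those who prefer a script to Task Manager, the same check can be sketched in Python by counting the two executables in the process list. The executable names are the ones from the first message; the function name count_apps is made up, and the sketch assumes the CSV output of the Windows tasklist command (tasklist /FO CSV /NH), with the image name in the first column.

```python
import csv
import io

# Executable names as given in the first message of this thread.
CPU_APP = "AK_v8b_win_SSSE3x_GPU_CPU_team.exe"
GPU_APP = "MB_6.08_mod_CPU_team_CUDA.exe"


def count_apps(tasklist_csv, names=(CPU_APP, GPU_APP)):
    """Count running instances of each app in `tasklist /FO CSV /NH` output."""
    counts = {name: 0 for name in names}
    # Windows image names are case-insensitive, so compare in lower case.
    lookup = {name.lower(): name for name in names}
    for row in csv.reader(io.StringIO(tasklist_csv)):
        if row and row[0].lower() in lookup:
            counts[lookup[row[0].lower()]] += 1
    return counts
```

On a Windows host the input text could be obtained with subprocess.run(["tasklist", "/FO", "CSV", "/NH"], capture_output=True, text=True).stdout. If the package works as described in the first message, a quad core with ncpus=5 would be expected to show one CUDA app instance while MB tasks are running; treat this as a rough check, not a guarantee.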

Profile perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 14,828,815
RAC: 11,926
United States
Message 858166 - Posted: 26 Jan 2009, 18:10:53 UTC - in response to Message 858162.
Last modified: 26 Jan 2009, 18:12:41 UTC

My fault, I forgot to mention it was doing a cuda too. :) Oh, and also I exited BM again and when I restarted it went back to doing the two APs and a cuda. :)
____________


PROUD MEMBER OF Team Starfire World BOINC

JPP
Joined: 31 May 99
Posts: 15
Credit: 16,016,410
RAC: 12,511
France
Message 858175 - Posted: 26 Jan 2009, 18:24:39 UTC - in response to Message 858166.

Well, for me Task Manager shows the same as boincmgr:
1 CPU doing nothing at all.
In summary, 5 or 8% load on 1 core (CUDA); the remaining 95% is free.
So the second core is not used at all, and boincmgr likewise confirms only 1 active task.
How should I troubleshoot that (where to start)?
Thanks
jeanpierre@jpp

Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3368
Credit: 46,024,068
RAC: 37,209
Russia
Message 858259 - Posted: 26 Jan 2009, 21:27:54 UTC - in response to Message 858175.
Last modified: 26 Jan 2009, 21:37:12 UTC

[quote]Well, for me Task Manager shows the same as boincmgr:
1 CPU doing nothing at all.
In summary, 5 or 8% load on 1 core (CUDA); the remaining 95% is free.
So the second core is not used at all, and boincmgr likewise confirms only 1 active task.
How should I troubleshoot that (where to start)?
Thanks
jeanpierre@jpp[/quote]


Hm... I answered already; please peruse the thread, don't just glance at it.
You will save your own time and the time of the people who try to help you.
Again:

Read the first message: http://setiathome.berkeley.edu/forum_thread.php?id=50829
You need to increase the ncpus field.

6) For best CPU and GPU usage I recommend setting the number of processors available to BOINC to real_number_of_cores+1. This will mitigate the current BOINC bug with CPU+CUDA scheduling and will allow the CPU and GPU to be fully loaded.
Here is an example of the minimal cc_config.xml file you need:
<cc_config>
<options>
<ncpus>enter_number_of_cores+1_value_here</ncpus>
</options>
</cc_config>

You should put it in the BOINC data folder.
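
If you would rather generate the file than type it, a minimal Python sketch could look like this. The function names are made up for the example, and the caller has to pass the real number of cores and the path of the BOINC data folder, because both differ from host to host.

```python
import os
import xml.etree.ElementTree as ET


def build_cc_config(real_cores):
    """Return the minimal cc_config.xml text with ncpus = real_cores + 1."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    ET.SubElement(options, "ncpus").text = str(real_cores + 1)
    return ET.tostring(root, encoding="unicode")


def write_cc_config(data_dir, real_cores):
    """Write cc_config.xml into the given BOINC data folder; return its path.

    Note: this overwrites an existing cc_config.xml, so back up any file
    that already holds other options.
    """
    path = os.path.join(data_dir, "cc_config.xml")
    with open(path, "w", encoding="utf-8") as f:
        f.write(build_cc_config(real_cores))
    return path


print(build_cc_config(4))
```

For a quad core this produces an <ncpus> value of 5. Pass the number of physical cores yourself rather than relying on os.cpu_count(), which reports logical processors.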

Morten Ross
Volunteer tester
Joined: 30 Apr 01
Posts: 183
Credit: 378,289,433
RAC: 142
Norway
Message 858535 - Posted: 27 Jan 2009, 9:37:57 UTC - in response to Message 858133.

[quote]1) This "team" will do AP tasks, MB tasks and will utilize both CPU and GPU for MB tasks. AP tasks still only for CPU.[/quote]


Hi,

I have now run V8 until it emptied my MB cache (not receiving MBs), but when the MB WU-cache is empty the GPU is switched for a CPU and 5 CPUs are used for AP - not 4. I know I can manually change ncpus, but that will have to be changed back when I receive MB WUs - not really optimal - or intentional...?

Morten

____________
Morten Ross

Profile MarkJ
Volunteer tester
Joined: 17 Feb 08
Posts: 936
Credit: 19,427,386
RAC: 7,435
Australia
Message 858542 - Posted: 27 Jan 2009, 10:18:32 UTC - in response to Message 858535.

[quote][quote]1) This "team" will do AP tasks, MB tasks and will utilize both CPU and GPU for MB tasks. AP tasks still only for CPU.[/quote]

Hi,

I have now run V8 until it emptied my MB cache (not receiving MBs), but when the MB WU-cache is empty the GPU is switched for a CPU and 5 CPUs are used for AP - not 4. I know I can manually change ncpus, but that will have to be changed back when I receive MB WUs - not really optimal - or intentional...?

Morten[/quote]


That's what happens when you tell BOINC 6.4.5 it's got 1 more CPU than it really has.

You might want to try BOINC 6.5.0 without a cc_config, but be warned that it doesn't shut down apps on exit. You can shut them down first (advanced -> shutdown connected client -> click okay) and then exit.
____________
BOINC blog

Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3368
Credit: 46,024,068
RAC: 37,209
Russia
Message 858543 - Posted: 27 Jan 2009, 10:25:11 UTC - in response to Message 858535.

[quote][quote]1) This "team" will do AP tasks, MB tasks and will utilize both CPU and GPU for MB tasks. AP tasks still only for CPU.[/quote]

Hi,

I have now run V8 until it emptied my MB cache (not receiving MBs), but when the MB WU-cache is empty the GPU is switched for a CPU and 5 CPUs are used for AP - not 4. I know I can manually change ncpus, but that will have to be changed back when I receive MB WUs - not really optimal - or intentional...?

Morten[/quote]


Sure, it's not optimal. But could you suggest something better? That's the question. The optimum is not to let the MB queue drain, but that's not always possible of course. Still, running 5 CPU apps on 4 cores is more effective than running 3 CPU apps on 3 cores most of the time.

Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3368
Credit: 46,024,068
RAC: 37,209
Russia
Message 858546 - Posted: 27 Jan 2009, 10:26:42 UTC - in response to Message 858542.
Last modified: 27 Jan 2009, 10:27:05 UTC

[quote][quote][quote]1) This "team" will do AP tasks, MB tasks and will utilize both CPU and GPU for MB tasks. AP tasks still only for CPU.[/quote]

Hi,

I have now run V8 until it emptied my MB cache (not receiving MBs), but when the MB WU-cache is empty the GPU is switched for a CPU and 5 CPUs are used for AP - not 4. I know I can manually change ncpus, but that will have to be changed back when I receive MB WUs - not really optimal - or intentional...?

Morten[/quote]

That's what happens when you tell BOINC 6.4.5 it's got 1 more CPU than it really has.

You might want to try BOINC 6.5.0 without a cc_config, but be warned that it doesn't shut down apps on exit. You can shut them down first (advanced -> shutdown connected client -> click okay) and then exit.[/quote]

That will not work at all.
The V8x package doesn't use BOINC's GPU management at all. So you will end up with an idle CPU core; look at the first message again.


Copyright © 2014 University of California