AK V8 + CUDA MB team work mod

Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 857579 - Posted: 25 Jan 2009, 11:06:52 UTC Last modified: 25 Jan 2009, 11:42:19 UTC
This version is intended to allow simultaneous SETI MB processing on the CPU and GPU of the same host. It's just a workaround for BOINC's inability to maintain such a configuration, so, like any workaround, it has its own advantages, disadvantages and limitations. Read the known issues carefully and use this package wisely. Please report all unknown issues in this thread.
Let's begin:
1) This first "proof of concept" version will work only on SSSE3-capable hosts, i.e. Intel Core and up for now (sorry, AMD fans; if this works, SSE3 will of course be supported too).
2) This package can work ONLY on SETI main. Don't even try to use it on SETI Beta; you can lose tasks for nothing.
3) This CPU-GPU team will not play nicely with other GPU-related projects like GPU-grid, because this version doesn't use any BOINC GPU-management mechanism. This has a positive side too: you don't need a GPU-aware BOINC at all, just a host with a CUDA-supported GPU. It should work even with BOINC 5.xx.
4) <ncpus>NUMBER_OF_CORES+1</ncpus> is REQUIRED for productive work. If you let BOINC manage the number of CPU cores, you will end up with one idle core, trust me ;)
5) This AK V8 build was not PGO'd (profile-guided optimization), so it will show worse performance than the current CPU-only AK V8 SSSE3x app (this will be fixed if the approach proves useful).
6) It will probably not use the second GPU on dual-GPU hosts.
How it works: to BOINC it looks like a usual optimized CPU app. BOINC calls the CPU app (AK_v8b_win_SSSE3x_GPU_CPU_team.exe in our case) and assigns one of the SETI MB tasks to it. But this app knows it can use the GPU for computations. It checks whether another instance is already using the GPU (it knows only itself and its clones, so no other GPU-related projects, please). If not, it starts the GPU app (MB_6.08_mod_CPU_team_CUDA.exe in our case) and suspends itself until the GPU app finishes. The CUDA app does all the work as usual, but on the GPU, leaving the CPU almost free. That's why you should increase the number of cores: BOINC should run NUMBER_OF_CORES+1 apps, thinking they are all CPU apps (a bit of cheating of poor old BOINC here ;) ).
After installation, try to keep an eye on the first few results. This is a pretty new approach and I can't guarantee it will work with your configuration. If something goes wrong, please revert to the variant you used before and describe your issue in this thread.
P.S. Now you can easily see how fast CUDA really is (on non-VLAR tasks). On my Q9450 + 9600GSO host, the CPU apps had completed less than 20% of their tasks when the CUDA app finished its first task ;)
Enjoy!
http://lunatics.kwsn.net/gpu-crunching/ak-v8-cuda-mb-team-work-mod.msg13268.html#msg13268
Those who have no access to the Lunatics site can download the package from this link: http://files.mail.ru/LFJSNC
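The GPU hand-off described in the first post can be sketched roughly as follows. This is a hypothetical Python illustration, not the AK V8 source. It assumes an exclusive lock file (`gpu_team.lock`, a name invented here) is how an instance learns that a clone already owns the GPU. It uses `subprocess.run` to start the GPU app and block until that app exits.

```python
import os
import subprocess
import sys

# Hypothetical lock-file name: the thread does not say how the real app
# detects that a clone is already using the GPU.
LOCK_NAME = "gpu_team.lock"


def try_claim_gpu(lock_dir: str) -> bool:
    """Atomically create the lock file; fail if another instance already holds it."""
    path = os.path.join(lock_dir, LOCK_NAME)
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # a clone is already driving the GPU
    os.write(fd, str(os.getpid()).encode())
    os.close(fd)
    return True


def release_gpu(lock_dir: str) -> None:
    """Delete the lock so the next instance can hand its task to the GPU."""
    try:
        os.remove(os.path.join(lock_dir, LOCK_NAME))
    except FileNotFoundError:
        pass


def run_task(lock_dir: str, gpu_cmd: list, cpu_work) -> str:
    """Run one MB task on the GPU if it is free, otherwise on this CPU core."""
    if try_claim_gpu(lock_dir):
        try:
            # Block until the GPU app exits; the waiting process uses almost no CPU.
            subprocess.run(gpu_cmd, check=True)
        finally:
            release_gpu(lock_dir)
        return "gpu"
    cpu_work()  # GPU taken by a clone: crunch the task on the CPU as usual
    return "cpu"


if __name__ == "__main__":
    import tempfile
    # Stand-in for the CUDA app: a trivial child Python process.
    fake_gpu_app = [sys.executable, "-c", "print('GPU app running')"]
    where = run_task(tempfile.mkdtemp(), fake_gpu_app, lambda: print("CPU path"))
    print("task ran on:", where)
```

`O_CREAT | O_EXCL` makes "check and claim" a single atomic step, so two instances starting at once cannot both grab the GPU. A weakness of this sketch is that a crashed instance leaves a stale lock behind, which a real implementation would have to detect.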

Toppie Joined: 3 Apr 99 Posts: 31 Credit: 50,287,619 RAC: 0	Message 857691 - Posted: 25 Jan 2009, 16:52:40 UTC - in response to Message 857579.
[quote]This version is intended to allow simultaneous SETI MB processing on the CPU and GPU of the same host. It's just a workaround for BOINC's inability to maintain such a configuration, so, like any workaround, it has its own advantages, disadvantages and limitations. Read the known issues carefully and use this package wisely. Please report all unknown issues in this thread.
P.S. Now you can easily see how fast CUDA really is (on non-VLAR tasks). On my Q9450 + 9600GSO host, the CPU apps had completed less than 20% of their tasks when the CUDA app finished its first task ;) Enjoy!
Those who have no access to the Lunatics site can download the package from this link: http://files.mail.ru/LFJSNC[/quote]
Hi,
Presumably this is 32-bit only? Is there a 64-bit version available?
Toppie.

Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 857724 - Posted: 25 Jan 2009, 18:52:06 UTC - in response to Message 857691.
It will work on x64 too (at least the CPU part). The x64/x86 performance difference is not big enough to worry about at this stage of development. Redoing the PGO optimization for this build will give a bigger speed increase.

Toppie Joined: 3 Apr 99 Posts: 31 Credit: 50,287,619 RAC: 0	Message 857793 - Posted: 25 Jan 2009, 21:37:53 UTC - in response to Message 857724.
[quote]It will work on x64 too (at least the CPU part). The x64/x86 performance difference is not big enough to worry about at this stage of development. Redoing the PGO optimization for this build will give a bigger speed increase.[/quote]
Thanks!

The Naja Joined: 20 Apr 08 Posts: 18 Credit: 1,940,239 RAC: 0	Message 858039 - Posted: 26 Jan 2009, 10:29:06 UTC - in response to Message 857724. Last modified: 26 Jan 2009, 10:37:44 UTC
Thanks, I will give it a try!
- stopping receiving new work right now
- backing up the current folder
- waiting for the WUs in the queue to be processed (approx. 12 h from now)
The test will be performed on http://setiathome.berkeley.edu/show_host_detail.php?hostid=4317460 (BOINC 6.4.5). If I understood correctly, this host fulfils the conditions for your version... I will report how it behaves...

Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 858119 - Posted: 26 Jan 2009, 16:11:05 UTC - in response to Message 858039.
V8a update available. Changes:
- VLAR autokill mod enabled for the CUDA app
- PGO redone for the CPU app
- Wall-clock time elapsed since the last restart (or since the start of the task) is added to stderr for both apps.
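The stderr addition in V8a amounts to measuring real elapsed time from the moment the process (re)starts. Below is a minimal Python sketch of the idea. The message wording and the H:MM:SS format are invented here and are not copied from the real apps' stderr.

```python
import sys
import time

# Reference point taken when the process (re)starts; the monotonic clock
# measures real elapsed time and is immune to system clock changes.
_START = time.monotonic()


def format_elapsed(seconds: float) -> str:
    """Render elapsed seconds as H:MM:SS.s."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{int(hours)}:{int(minutes):02d}:{secs:04.1f}"


def report_elapsed(stream=sys.stderr) -> None:
    """Append the wall-clock time elapsed since start/restart to stderr."""
    elapsed = time.monotonic() - _START
    print(f"Wall-clock time elapsed since last restart: {format_elapsed(elapsed)}",
          file=stream)


if __name__ == "__main__":
    time.sleep(0.2)  # stand-in for crunching
    report_elapsed()
```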

perryjay Volunteer tester Joined: 20 Aug 02 Posts: 3377 Credit: 20,676,751 RAC: 0	Message 858123 - Posted: 26 Jan 2009, 16:27:41 UTC - in response to Message 858119. Last modified: 26 Jan 2009, 16:53:40 UTC
So this will run SSSE3 and uses the r103 mod? If I have this straight, if I run out of APs it will run MB on the CPU, right? Or does it run the MBs in line with the APs? I mean, does it just take either an AP or an MB, whichever is next in FIFO order?
edit: One more question. Have you given any thought to flagging the VLARs to be done by the CPU rather than killing them? I hate shoving them off on someone else, but they really mess up my graphics when I run them. They seem to slow everything down.
PROUD MEMBER OF Team Starfire World BOINC

Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 858133 - Posted: 26 Jan 2009, 17:13:40 UTC - in response to Message 858123.
1) This "team" will do AP tasks and MB tasks, and will use both the CPU and the GPU for MB tasks. AP tasks still run only on the CPU.
2) When I figure out how to avoid doing VLARs on the GPU without aborting them, I will add that ability. For now I see this approach: the CPU app pre-parses the task, reads the AR, and doesn't call the CUDA app if the AR is in the VLAR range. But this works only for the first 4 VLARs in a row on a quad (even fewer on a dual or single core). The 5th VLAR will make the CPU apps swap on one of the cores (on all cores, actually, because no affinity is set now), and the GPU will just sit idle. VLARs, like any other AR range, come in pretty big groups, surely more than 4-5 in a row, so the net result of such an "improvement" would be an idle GPU.
Another modification: send VLARs to the CPU until NUMBER_OF_CORES CPU apps are running, then go to the GPU. But because VLARs come in bunches, after a few completed tasks this ends up behaving like the current build anyway.
The only "true" solution is to mark the VLAR task as "suspended" in BOINC and resume it when a CPU core becomes available. But this requires interaction with the BOINC manager... If someone could provide relevant sources containing an example of task suspending/resuming (for example, the BOINCview sources), maybe I could implement such interaction here too.
This solution is temporary both on the BOINC side (BOINC should be able to run CPU and GPU apps together by itself) and on the VLAR CUDA side (the CUDA app is in development, and I hope it will do VLARs much faster than it does now). So I see little sense in putting too much effort into this build for diminishing returns. The performance of this "team" is now at its practical maximum. Only SSE3 and maybe x64 variants of the CPU app are worth adding.
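Raistmer's second variant (send VLARs to the CPU until NUMBER_OF_CORES CPU apps are running, then fall back to the GPU) can be sketched as a small routing function. This is a hypothetical illustration built on assumptions. The angle range is read from a `<true_angle_range>` element in the task header, which is assumed here to be where it lives. `VLAR_THRESHOLD = 0.12` is only a placeholder, because the thread does not state the real cut-off.

```python
import re

# Placeholder VLAR cut-off (assumption): the thread does not give the real value.
VLAR_THRESHOLD = 0.12

# Assumed header element holding the task's angle range.
_AR_RE = re.compile(r"<true_angle_range>\s*([0-9.eE+-]+)\s*</true_angle_range>")


def parse_angle_range(header: str) -> float:
    """Pre-parse the task header and return its angle range (AR)."""
    match = _AR_RE.search(header)
    if match is None:
        raise ValueError("no angle range found in task header")
    return float(match.group(1))


def choose_device(angle_range: float, running_cpu_apps: int, cores: int,
                  gpu_busy: bool, threshold: float = VLAR_THRESHOLD) -> str:
    """Return 'cpu' or 'gpu' for one MB task under the 'VLARs to CPU first' rule."""
    is_vlar = angle_range < threshold
    if is_vlar and running_cpu_apps < cores:
        return "cpu"   # keep VLARs off the GPU while a core is still free
    if not gpu_busy:
        return "gpu"   # non-VLAR task, or every core already taken
    return "cpu"       # GPU owned by a clone: this instance crunches on the CPU


if __name__ == "__main__":
    hdr = "<workunit_header><true_angle_range>0.0089</true_angle_range></workunit_header>"
    ar = parse_angle_range(hdr)
    print(ar, choose_device(ar, running_cpu_apps=4, cores=4, gpu_busy=False))
```

The last branch shows the weakness Raistmer points out: once all cores hold VLARs, the next VLAR goes to the GPU anyway. Because VLARs arrive in long runs, that state is reached quickly.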

JPP Joined: 31 May 99 Posts: 18 Credit: 59,436,360 RAC: 47	Message 858140 - Posted: 26 Jan 2009, 17:30:31 UTC - in response to Message 858133.
Hi, I have a dual-core processor with a GeForce 8-series GPU, so I have been using CUDA for a while. I honestly saw a significant change in performance from mid last week (after loading a new CUDA image) until yesterday. Yesterday I suddenly noticed that only 1 active WU was being worked, the one paired with the GPU; nothing is being worked on the second core anymore. I have tried many things, but it is still the same. Any idea?
Cheers,
jeanpierre@jpp

perryjay Volunteer tester Joined: 20 Aug 02 Posts: 3377 Credit: 20,676,751 RAC: 0	Message 858147 - Posted: 26 Jan 2009, 17:36:37 UTC - in response to Message 858133.
Thanks, Raistmer, that was just what I wanted to know. I think I will give it a try.
PROUD MEMBER OF Team Starfire World BOINC

Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 858156 - Posted: 26 Jan 2009, 17:50:06 UTC - in response to Message 858140.
[quote]Hi, I have a dual-core processor with a GeForce 8-series GPU, so I have been using CUDA for a while. I honestly saw a significant change in performance from mid last week (after loading a new CUDA image) until yesterday. Yesterday I suddenly noticed that only 1 active WU was being worked, the one paired with the GPU; nothing is being worked on the second core anymore. I have tried many things, but it is still the same. Any idea?
Cheers,
jeanpierre@jpp[/quote]
Read the first message: http://setiathome.berkeley.edu/forum_thread.php?id=50829
You need to increase the ncpus field.

perryjay Volunteer tester Joined: 20 Aug 02 Posts: 3377 Credit: 20,676,751 RAC: 0	Message 858157 - Posted: 26 Jan 2009, 17:51:18 UTC
Hmmmm, interesting. I loaded it up, and as soon as I restarted BM (BOINC Manager) it paused the two AP units I had been running and switched to running two MBs in high priority. Nothing wrong with this, I just thought it was interesting. If anything else happens I will let you know.
PROUD MEMBER OF Team Starfire World BOINC

Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 858162 - Posted: 26 Jan 2009, 17:56:01 UTC - in response to Message 858157. Last modified: 26 Jan 2009, 17:56:51 UTC
[quote]Hmmmm, interesting. I loaded it up, and as soon as I restarted BM (BOINC Manager) it paused the two AP units I had been running and switched to running two MBs in high priority. Nothing wrong with this, I just thought it was interesting. If anything else happens I will let you know.[/quote]
Check with Task Manager whether it really uses all available cores and whether it runs both the CUDA app and the CPU app for MB.
I need to redo the dynamic data collection for the SSE3 PGO build, so it will take some time to post the SSE3 build...
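To run the same check from a script instead of Task Manager, a hypothetical Windows-only Python helper can parse the output of `tasklist /FO CSV /NH` and count the team processes. The executable names below are the ones given in the first post. Later builds such as V8a may use different names, so treat them as assumptions and adjust as needed.

```python
import csv
import io
import os
import subprocess

# Executable names from the first post; later builds (e.g. V8a) may be named differently.
CPU_APP = "AK_v8b_win_SSSE3x_GPU_CPU_team.exe"
CUDA_APP = "MB_6.08_mod_CPU_team_CUDA.exe"


def count_team_processes(tasklist_csv: str) -> dict:
    """Count CPU-app and CUDA-app instances in `tasklist /FO CSV /NH` output."""
    wanted = {CPU_APP.lower(): CPU_APP, CUDA_APP.lower(): CUDA_APP}
    counts = {CPU_APP: 0, CUDA_APP: 0}
    for row in csv.reader(io.StringIO(tasklist_csv)):
        # First CSV column is the image name; Windows names are case-insensitive.
        if row and row[0].lower() in wanted:
            counts[wanted[row[0].lower()]] += 1
    return counts


def check_team(real_cores: int) -> None:
    """Windows only: list running processes and report what the team is doing."""
    out = subprocess.run(["tasklist", "/FO", "CSV", "/NH"],
                         capture_output=True, text=True, check=True).stdout
    counts = count_team_processes(out)
    print(f"CPU app instances: {counts[CPU_APP]} "
          f"(at most {real_cores + 1} with ncpus = cores+1)")
    print(f"CUDA app running: {counts[CUDA_APP] > 0}")


if __name__ == "__main__":
    check_team(os.cpu_count() or 1)
```

The count is an upper bound rather than an exact target, because AP tasks run under a different executable and are not counted here.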

perryjay Volunteer tester Joined: 20 Aug 02 Posts: 3377 Credit: 20,676,751 RAC: 0	Message 858166 - Posted: 26 Jan 2009, 18:10:53 UTC - in response to Message 858162. Last modified: 26 Jan 2009, 18:12:41 UTC
My fault, I forgot to mention it was doing a CUDA task too. :) Oh, and I also exited BM again, and when I restarted it, it went back to doing the two APs and a CUDA task. :)
PROUD MEMBER OF Team Starfire World BOINC

JPP Joined: 31 May 99 Posts: 18 Credit: 59,436,360 RAC: 47	Message 858175 - Posted: 26 Jan 2009, 18:24:39 UTC - in response to Message 858166.
Well, for me Task Manager shows the same as BOINC Manager: 1 CPU doing nothing at all. In total there is a 5-8% load on 1 core (CUDA) and the remaining 95% is free, so the second core is not used at all. BOINC Manager confirms this again with only 1 active task. How should I troubleshoot this (where do I start)?
Thanks,
jeanpierr€@jpp

Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 858259 - Posted: 26 Jan 2009, 21:27:54 UTC - in response to Message 858175. Last modified: 26 Jan 2009, 21:37:12 UTC
[quote]Well, for me Task Manager shows the same as BOINC Manager: 1 CPU doing nothing at all. In total there is a 5-8% load on 1 core (CUDA) and the remaining 95% is free, so the second core is not used at all. BOINC Manager confirms this again with only 1 active task. How should I troubleshoot this (where do I start)?
Thanks,
jeanpierr€@jpp[/quote]
Hm... I have already answered. Please read the thread carefully, don't just glance at it. You will save your own time and the time of the people who are trying to help you.
Again: read the first message http://setiathome.berkeley.edu/forum_thread.php?id=50829
You need to increase the ncpus field.
6) For the best CPU and GPU usage, I recommend setting the number of processors available to BOINC to real_number_of_cores+1. This will mitigate the current BOINC bug with CPU+CUDA scheduling and allow the CPU and GPU to be fully loaded. Here is an example of the minimal cc_config.xml file you need:
```xml
<cc_config>
  <options>
    <ncpus>enter_number_of_cores+1_value_here</ncpus>
  </options>
</cc_config>
```
Put it in the BOINC data folder.
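Instead of typing the value by hand, the minimal cc_config.xml shown above can be generated from the core count. This is a small Python sketch. The BOINC data folder location is not given in the thread, so it is left as a parameter. Note that `write_cc_config` overwrites any existing cc_config.xml, so merge by hand if you already have other options set.

```python
import os
import xml.etree.ElementTree as ET


def build_cc_config(real_cores: int) -> str:
    """Return a minimal cc_config.xml telling BOINC there is one extra CPU."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    ET.SubElement(options, "ncpus").text = str(real_cores + 1)
    return ET.tostring(root, encoding="unicode")


def write_cc_config(boinc_data_dir: str, real_cores: int) -> str:
    """Write cc_config.xml into the given BOINC data folder and return its path."""
    path = os.path.join(boinc_data_dir, "cc_config.xml")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(build_cc_config(real_cores) + "\n")
    return path


if __name__ == "__main__":
    # os.cpu_count() reports logical processors (Hyper-Threading included) and may be
    # None, so pass the real number of cores explicitly if that differs.
    print(build_cc_config(os.cpu_count() or 1))
```

Restart BOINC after writing the file so the client rereads it.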

Morten Ross Volunteer tester Joined: 30 Apr 01 Posts: 183 Credit: 385,664,915 RAC: 0	Message 858535 - Posted: 27 Jan 2009, 9:37:57 UTC - in response to Message 858133.
[quote]1) This "team" will do AP tasks and MB tasks, and will use both the CPU and the GPU for MB tasks. AP tasks still run only on the CPU.[/quote]
Hi,
I have now run V8 until it emptied my MB cache (I'm not receiving MBs). But when the MB WU cache is empty, the GPU slot is switched to a CPU slot and 5 CPU apps are used for AP, not 4. I know I can change ncpus manually, but I would have to change it back when I receive MB WUs again. That's not really optimal. Or is it intentional...?
Morten

MarkJ Volunteer tester Joined: 17 Feb 08 Posts: 1139 Credit: 80,854,192 RAC: 5	Message 858542 - Posted: 27 Jan 2009, 10:18:32 UTC - in response to Message 858535.
[quote]I have now run V8 until it emptied my MB cache (I'm not receiving MBs). But when the MB WU cache is empty, the GPU slot is switched to a CPU slot and 5 CPU apps are used for AP, not 4. I know I can change ncpus manually, but I would have to change it back when I receive MB WUs again. That's not really optimal. Or is it intentional...?
Morten[/quote]
That's what happens when you tell BOINC 6.4.5 it has one more CPU than it really has. You might want to try BOINC 6.5.0 without a cc_config, but be warned that it doesn't shut down apps on exit. You can shut them down first (Advanced -> Shutdown connected client -> click OK) and then exit.
BOINC blog

Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 858543 - Posted: 27 Jan 2009, 10:25:11 UTC - in response to Message 858535.
[quote]I have now run V8 until it emptied my MB cache (I'm not receiving MBs). But when the MB WU cache is empty, the GPU slot is switched to a CPU slot and 5 CPU apps are used for AP, not 4. I know I can change ncpus manually, but I would have to change it back when I receive MB WUs again. That's not really optimal. Or is it intentional...?
Morten[/quote]
Sure, it's not optimal. But could you suggest something better? That's the question. The optimal approach is not to let the MB queue drain, but of course that's not always possible. Running 5 CPU apps on 4 cores while the MB queue is empty is still more effective than running 3 CPU apps on 3 cores most of the time, which is what you get without the extra CPU.
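The trade-off in the reply above is simple arithmetic. Below is a rough Python model under stated simplifications: every CPU task is fully CPU-bound, the task that handed its work to the GPU uses essentially no CPU, and context-switch overhead is ignored when tasks outnumber cores. Under those assumptions, busy cores = min(CPU-crunching tasks, real cores).

```python
def busy_cores(real_cores: int, ncpus: int, gpu_task_running: bool) -> int:
    """Cores doing useful CPU work when BOINC runs `ncpus` team tasks.

    Simplified model: the one task that handed its work to the GPU sleeps
    and uses (almost) no CPU; every other task is fully CPU-bound.
    """
    cpu_tasks = ncpus - 1 if gpu_task_running else ncpus
    return min(cpu_tasks, real_cores)


if __name__ == "__main__":
    for ncpus in (4, 5):
        with_mb = busy_cores(4, ncpus, gpu_task_running=True)
        drained = busy_cores(4, ncpus, gpu_task_running=False)
        print(f"quad core, ncpus={ncpus}: MB present -> {with_mb} busy cores, "
              f"MB queue empty -> {drained} busy cores")
```

On a quad core, ncpus=4 leaves only 3 busy cores whenever the GPU is working, which is most of the time. With ncpus=5 there are 4 busy cores in both cases; oversubscription only costs some task-switching overhead that this model ignores.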

Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 858546 - Posted: 27 Jan 2009, 10:26:42 UTC - in response to Message 858542. Last modified: 27 Jan 2009, 10:27:05 UTC
[quote]That's what happens when you tell BOINC 6.4.5 it has one more CPU than it really has. You might want to try BOINC 6.5.0 without a cc_config, but be warned that it doesn't shut down apps on exit. You can shut them down first (Advanced -> Shutdown connected client -> click OK) and then exit.[/quote]
It will not work at all. The V8x package doesn't use BOINC's GPU management, so you would end up with an idle CPU core. Read the first message again.

©2024 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.