Modified SETI MB CUDA + opt AP package for full GPU utilization

Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 855830 - Posted: 20 Jan 2009, 23:30:33 UTC - in response to Message 855822. Last modified: 20 Jan 2009, 23:31:35 UTC
Quote: Raistmer, when this new 6.6.2 comes out, are you going to have to do a new package for it, or can we still use V6?
Is the new BOINC version expected to be incompatible with all previous builds? I think not, so I see no reason not to use the same science apps with the new BOINC version. What GPU-related changes are planned in the new BOINC? A better scheduler?

Jord Volunteer tester Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3	Message 855833 - Posted: 20 Jan 2009, 23:40:49 UTC - in response to Message 855830.
Quote: What GPU-related changes are planned in new BOINC? Better scheduler?
It has separate work-fetch modules for CPU and GPU, as well as a new client-side CPU scheduler, among other things. It also has a very nasty bug: if you have a CUDA card and set the project to No New Tasks, it will send you work anyway. ;-)

Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 856224 - Posted: 21 Jan 2009, 23:18:45 UTC - in response to Message 855833. Last modified: 21 Jan 2009, 23:55:33 UTC
There is a new package version available:
http://lunatics.kwsn.net/gpu-crunching/modified-seti-mb-cuda-opt-ap-package-for-full-gpu-utilize.msg12177.html#msg12177
http://lunatics.kwsn.net/gpu-crunching/modified-seti-mb-cuda-opt-ap-package-for-full-gpu-utilize.msg13120.html#msg13120
or (if you can't download from Lunatics) http://files.mail.ru/K2AFN0
It contains the same CUDA app (equivalent to stock 6.08) and the latest SSE3 opt AP release (r103), which has improved performance.

ghost17 Joined: 15 Jun 04 Posts: 2 Credit: 3,590 RAC: 0	Message 856362 - Posted: 22 Jan 2009, 10:29:48 UTC
First, thanks for the package. I tried to run SETI CUDA on my laptop; it ran GPUGRID without any problems. The first thing that went wrong was a validation error yesterday, which resulted in a loop of client errors. By the way, that should be fixed in my view: 90 client errors in 2 hours should stop BOINC from downloading anything more, but that isn't a point to discuss here. Today, after my quota was re-enabled, I installed this package and everything worked fine for... 19 seconds. After that it aborts with a compute error. The graphics card is an nVidia 9500M GS with 512 MB and the newest notebook drivers from nVidia itself.
BTW: the standard CUDA application works, BUT I can't use my computer while it runs because it slows everything down, from moving windows to drawing anything on the screen. That's a little bit crappy :-/

Byron S Goodgame Volunteer tester Joined: 16 Jan 06 Posts: 1145 Credit: 3,936,993 RAC: 0	Message 856369 - Posted: 22 Jan 2009, 10:47:30 UTC
I have the new modified + AP app installed. I had an AP task already running and it made the transition just fine, as I'm sure was expected. I haven't had a chance to run the modified CUDA app yet; that will take a few hours, until I finish the Beta tasks I'm running. What performance increase should be expected from the new AP app?

Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 856379 - Posted: 22 Jan 2009, 11:33:36 UTC - in response to Message 856369. Last modified: 22 Jan 2009, 11:35:52 UTC
Quote: What is to be expected as far as a performance increase from the new AP app?
Well, as Jason already wrote in the AP-dedicated thread, there are a few changes in the new AP release. They fall into 3 categories:
1) Attempts to work around BOINC's instant-termination bug/feature (a zero-length state file when the app is terminated during a checkpoint). Unfortunately, we can only reduce the probability of this bug manifesting. As long as BOINC kills the science app's process without mercy, the bug is still possible.
2) Overall performance improvements: these speed up the processing of any AP task.
3) More efficient handling of tasks with overflows: some unneeded computations for these tasks are now skipped. This can give a very nice performance boost for some tasks, but the speedup is data-dependent and differs from task to task.
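For readers wondering how a zero-length state file can happen: if the process is killed after the file has been truncated but before the new data is written, nothing valid is left on disk. A common generic mitigation is to write to a temporary file and then rename it over the old one. The sketch below shows that technique in Python. It is NOT the code used in AP r103, whose workaround isn't described here, and the function name is hypothetical. On POSIX filesystems the rename is atomic. On Windows, os.replace maps to a replace-existing move, which is close to atomic but not strictly guaranteed.

```python
import os
import tempfile


def write_checkpoint_atomically(path: str, data: bytes) -> None:
    """Replace the file at `path` with `data` without ever leaving it truncated.

    The new contents go to a temporary file in the same directory first.
    Only after the bytes are flushed and fsync'ed is the temp file renamed
    over the old checkpoint, so a restarted app sees either the complete
    old file or the complete new one.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".ckpt-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes reach the disk
        os.replace(tmp_path, path)  # swap in the new file in one step
    except BaseException:
        # On an ordinary error before the rename, drop the partial temp file.
        # The previous checkpoint at `path` is left untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the process is killed outright, the except branch never runs and a stray temp file may remain. The old checkpoint is still intact, which is the property that matters here.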

Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 856381 - Posted: 22 Jan 2009, 11:42:31 UTC - in response to Message 856362.
Quote: ...everything worked fine for... 19 seconds. After that it aborts with a compute error. Graphics card is a nVidia 9500m GS with 512 MB and newest notebook drivers from nVidia itself. ...
I looked at one of your results: http://setiathome.berkeley.edu/result.php?resultid=1129701614 This error can be attributed to a driver incompatibility. Unfortunately, my build works only with driver version 180.48 and higher (I don't know why; it's just an experimental fact). If you can't install an 18x.xx driver on your notebook, you could try another (maybe beta) driver. If that doesn't help, you can either stop using the CUDA app on this host or modify app_info.xml to use stock 6.08 together with opt AP r103. The stock app seems to have no driver limit.
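The driver requirement above comes down to a numeric version comparison. The following minimal sketch shows the check a user or a helper script might run before choosing between the modded build and stock 6.08. The post doesn't say how the build itself detects the driver, so the function names and the idea of gating in a script are assumptions. Only the 180.48 minimum comes from the thread.

```python
MIN_DRIVER = "180.48"  # minimum version reported in this thread for the modded build


def parse_driver_version(version: str) -> tuple[int, ...]:
    """Turn an NVIDIA-style version string such as '181.20' into (181, 20)."""
    return tuple(int(part) for part in version.strip().split("."))


def driver_supported(installed: str, minimum: str = MIN_DRIVER) -> bool:
    """Return True if `installed` is at least `minimum`, compared numerically.

    Comparing tuples of ints instead of raw strings keeps the ordering
    numeric part by part, e.g. (180, 60) > (180, 48) and (181, 20) > (180, 48).
    """
    return parse_driver_version(installed) >= parse_driver_version(minimum)
```

With this check, the 179.28 notebook driver mentioned in the next post fails, while 181.20 passes.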

ghost17 Joined: 15 Jun 04 Posts: 2 Credit: 3,590 RAC: 0	Message 856384 - Posted: 22 Jan 2009, 11:49:35 UTC - in response to Message 856381.
OK, I'll try that later. I'm running the 179.28 notebook drivers.

Dawn Raison Joined: 6 Jun 99 Posts: 3 Credit: 4,317,068 RAC: 0	Message 856387 - Posted: 22 Jan 2009, 11:56:01 UTC
So what exactly do I need to do to get a cache of CUDA workunits loaded? I've had two batches sent so far, each of about 15 or so. They crunched fine and in a few cases went on to become the canonical result. However, unless I go into BOINC and fiddle (e.g. suspend tasks), I can't get any CUDA WUs sent; instead I keep getting loads of Astropulse WUs. Surely I can have 3 or 4 Astropulses and SETI CUDA at the same time?
Rgds. DavidR.
Boinc 6.4.5, Astropulse v5.00, Seti CUDA 6.0.8
Card: 9800GTX+ oc, Driver: v181.20
PC: XP x64, 8GiB ECC RAM, 790FX, Phenom II X4 940
Also running ClimatePrediction, Spinhenge

Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 856389 - Posted: 22 Jan 2009, 12:23:14 UTC - in response to Message 856387.
1) Try the opt package described in this thread (you listed the stock apps, which are not the subject of this thread).
2) Try increasing the SETI project's resource share.
3) Try increasing the cache size.
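Point 2 works because BOINC resource shares are relative. A project's entitlement is its share divided by the sum of all attached projects' shares. The arithmetic is sketched below with example values only; the actual shares on this host aren't stated. This is a simplification: the 6.4.x client's work fetch also depends on debts and cache settings, so the fraction alone doesn't determine how many CUDA tasks arrive.

```python
def share_fractions(shares: dict[str, float]) -> dict[str, float]:
    """Fraction of the host each project is entitled to under BOINC's relative shares."""
    total = sum(shares.values())
    return {project: share / total for project, share in shares.items()}


# Example values: three projects at equal share, then SETI's share doubled.
equal = share_fractions({"SETI": 100, "ClimatePrediction": 100, "Spinhenge": 100})
boosted = share_fractions({"SETI": 200, "ClimatePrediction": 100, "Spinhenge": 100})
```

Doubling SETI's share in this example raises its entitlement from one third of the host to one half.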

Morten Ross Volunteer tester Joined: 30 Apr 01 Posts: 183 Credit: 385,664,915 RAC: 0	Message 856397 - Posted: 22 Jan 2009, 12:48:45 UTC - in response to Message 856389. Last modified: 22 Jan 2009, 13:08:54 UTC
Hi, I have been monitoring this for some time now and still see no change in the following behaviour: CUDA crunching works like a charm for maybe one or two hours. Then the WU simply stops progressing and the CUDA app consumes a full CPU. Sometimes the WU completes OK if I close BOINC, make sure all its executables are gone from the process list and then restart BOINC. I sometimes have to kill boinc.exe at this step, because its processes are not released. Other times BOINC has to be restarted more than once before the WU is processed. What happens next is one of two things: either the next WU is processed normally, or it is processed for a few seconds and then completes with success(!). The only option is to restart the computer, which starts the cycle once again.
OS is Windows Server 2008 x64, BOINC 6.6.0 x64. Video drivers tested: 185.20, which I downgraded to 181.22 as part of the ap_5.00r103_SSE3 and MB_6.08_mod_VLAR_kill_CUDA upgrade. To no avail, it seems.
Below are the WUs from the above-mentioned scenario: 1129308489, 1129308493, 1129308763
Perhaps this is an x64 issue? The following hosts are 32-bit and seem to be working very well: 2650128, 4242105
Morten Ross

Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 856578 - Posted: 22 Jan 2009, 22:55:40 UTC - in response to Message 856397.
Quote: ...the WU simply does not progress, and the Cuda app consumes a full CPU. ... Only option is to restart computer to start the cycle once again. OS is Windows Server 2008 x64, BOINC 6.6.0 x64. ... Perhaps this is an x64-issue? The following hosts are 32-bit and are seemingly working very well: 2650128 4242105
Well, most probably it is an x64 issue indeed. I have a second CUDA-enabled host with dual GPU cards (8500GT + 9400GT), OS: Windows Server 2003 x64. It showed the same behavior. There was a memory leak. My build clearly demonstrated it, because it dumps the available GPU memory at the beginning of each task. 1-2 tasks on the 8500 and a few more on the 9400 could be completed (though they usually ended with a computation error). After that the app fell back to CPU processing because of the low GPU memory condition. Only an OS restart helped. But when I upgraded to the 181.20 x64 drivers, the problem disappeared. Now it seems this host can crunch OK on both GPUs.
Link to host on beta: http://setiweb.ssl.berkeley.edu/beta/results.php?hostid=18439
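The failure pattern described above is easier to see as a small model: a few tasks run on the GPU, and then every task falls back to the CPU. The Python sketch below exercises only the decision logic. In a real CUDA app the free-memory figure would come from the CUDA runtime's cudaMemGetInfo(); here it is passed in. The required amount, the leak size and the function names are hypothetical, not values taken from the build.

```python
MIB = 1024 * 1024


def choose_device(free_gpu_bytes: int, required_bytes: int) -> str:
    """Pick 'GPU' if enough device memory is free for the task, else 'CPU'."""
    return "GPU" if free_gpu_bytes >= required_bytes else "CPU"


def simulate_leak(total_free: int, required: int, leak_per_task: int, tasks: int) -> list[str]:
    """Return the device chosen for each of `tasks` consecutive tasks.

    Every GPU task leaks `leak_per_task` bytes that are never returned, which
    models the leak described above (only an OS restart recovered the memory).
    """
    choices = []
    free = total_free
    for _ in range(tasks):
        device = choose_device(free, required)
        choices.append(device)
        if device == "GPU":
            free -= leak_per_task  # leaked memory stays allocated
    return choices


# Hypothetical numbers: 400 MiB free, 250 MiB needed per task, 100 MiB leaked per task.
history = simulate_leak(400 * MIB, 250 * MIB, 100 * MIB, 4)
```

With these numbers, the first two tasks run on the GPU and the rest fall back to the CPU. This mirrors the "1-2 tasks, then CPU fallback" pattern in the post.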

Morten Ross Volunteer tester Joined: 30 Apr 01 Posts: 183 Credit: 385,664,915 RAC: 0	Message 856579 - Posted: 22 Jan 2009, 22:55:56 UTC - in response to Message 856397.
... Additional note: boincmgr.exe also always uses 100% CPU when the above-mentioned scenario occurs, so it seems the interaction between CUDA and boincmgr is a problem.
Morten

Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 856580 - Posted: 22 Jan 2009, 22:58:05 UTC - in response to Message 856579.
Quote: ... Additional note: boincmgr.exe is also allways using 100% CPU when the abovementioned scenario occurs, so it seems the interaction between cuda and boincmgr is a problem.
I haven't tried BOINC 6.6.0; I use 6.4.5. Maybe you could downgrade to it too, to rule BOINC out of this "equation"?

Morten Ross Volunteer tester Joined: 30 Apr 01 Posts: 183 Credit: 385,664,915 RAC: 0	Message 856582 - Posted: 22 Jan 2009, 23:05:02 UTC - in response to Message 856580.
I will try a downgrade and see what happens, although that might bring back other issues that were fixed in 6.6.0. There is also the difference in OS versions: you have Windows 2003 and I've got Windows 2008.
Morten Ross

Morten Ross Volunteer tester Joined: 30 Apr 01 Posts: 183 Credit: 385,664,915 RAC: 0	Message 856745 - Posted: 23 Jan 2009, 9:19:56 UTC - in response to Message 856582.
Hi, I did the downgrade, but unfortunately it made no difference: the WU still hangs after some time of crunching. There is one change: boincmgr.exe no longer uses 100% CPU when this happens, and all WUs are released properly when I exit BOINC Manager.
Morten Ross

Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 856762 - Posted: 23 Jan 2009, 11:23:22 UTC - in response to Message 856745.
Well, some positive progress at last :) What happens if you use the 181.20 driver now?

Morten Ross Volunteer tester Joined: 30 Apr 01 Posts: 183 Credit: 385,664,915 RAC: 0	Message 856826 - Posted: 23 Jan 2009, 16:06:49 UTC - in response to Message 856762. Last modified: 23 Jan 2009, 16:13:17 UTC
Hi, I have downgraded from 181.22 to 181.20 (file version 7.15.11.8122 to 7.15.11.8120), but there is no change: boincmgr must still be restarted for the hanging WU to be processed. The WU hangs always occur at the very beginning of processing. They happen after about 17 seconds of CPU time (not elapsed time), at between 0.000% and 0.500% progress.
Morten

Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 856877 - Posted: 23 Jan 2009, 18:10:15 UTC - in response to Message 856826.
Well, it seems your host is a good candidate for Maik's script. AFAIK it handles this kind of hang. Look for it here or on the Lunatics site.
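The symptom reported above has two parts: the app keeps burning CPU time while its progress stays flat near 0%. That makes it fairly easy to detect mechanically. Maik's script's actual logic isn't described in this thread, so the Python sketch below is only a hypothetical heuristic. The sample structure, the thresholds and the function name are all assumptions. What a watchdog does once a hang is detected (e.g. restarting BOINC) is left out.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    cpu_seconds: float    # CPU time consumed by the science app so far
    fraction_done: float  # progress reported to BOINC, from 0.0 to 1.0


def looks_hung(samples: list[Sample], min_cpu_window: float = 60.0,
               min_progress: float = 1e-4) -> bool:
    """Heuristic hang check over a window of progress samples.

    Flags the task when, between the oldest and newest sample, the app used
    at least `min_cpu_window` CPU seconds while progress advanced by less
    than `min_progress`, i.e. "full CPU, no progress".
    """
    if len(samples) < 2:
        return False
    first, last = samples[0], samples[-1]
    cpu_spent = last.cpu_seconds - first.cpu_seconds
    progress = last.fraction_done - first.fraction_done
    return cpu_spent >= min_cpu_window and progress < min_progress
```

Measuring CPU time rather than wall-clock time follows the observation above that the hangs show up as CPU time spent without progress.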

BMaytum Volunteer tester Joined: 3 Apr 99 Posts: 104 Credit: 4,382,041 RAC: 2	Message 856884 - Posted: 23 Jan 2009, 18:27:23 UTC - in response to Message 856762. Last modified: 23 Jan 2009, 19:07:03 UTC
Quote: Well, some positive progress at last :) What will be if you will use 181.20 driver now ?
@ Raistmer: Well, your V7 optimized "MB_6.08-mod_kill_VLAR_CUDA.exe" client is working well for me. I'm using WinXP Pro SP3 32-bit with nVidia 181.20 drivers on a single 8800GT/512MB (G92 GPU) video card, under BOINC 6.6.0 on the S@H main (non-Beta) project. My tasks are listed here: Tasks for User ID 7782289
Aside: as I reported in the "Boinc 6.6.2 just released" thread, my two attempts to use BOINC 6.6.2 with your V7 CUDA client gave me BOINC Manager problems, but that's for others to work out (I guess?). I haven't tried 6.6.2 with the stock Berkeley 6.08-CUDA client. I was really hoping 6.6.2 would work, because it reportedly allows concurrent GPU MB crunching and CPU MB/AP crunching. Have you tried BOINC 6.6.2 yet?
I am wondering whether the present VLAR kill feature in your V7 application is actually necessary. Earlier I was running BOINC 6.4.5 with the stock Berkeley 6.08-CUDA app and nVidia 180.60 drivers, and that app successfully crunched MB WUs with a very low angle range (VLAR). (Aside: as so many others reported, VLAR WUs crunched with stock Berkeley 6.06-CUDA and 6.07-CUDA produced compute errors.)
Another question: when your V7 application detects that a WU's angle range is below your VLAR limit, it terminates that WU with exit code -6, so the WU gets listed as a Client Error/Compute Error. For example, see: http://setiathome.berkeley.edu/result.php?resultid=1130457910 Could you instead terminate VLAR WUs with an error code that reports them as "Client Aborted"? That would help distinguish such WUs from WUs that terminate with error code -9 overflow (e.g. #spikes=30, or #spikes=15 + #peaks=15). Just a thought....
Sabertooth Z77, i7-3770K@4.2GHz, GTX680, W8.1Pro x64 P5N32-E SLI, C2D E8400@3Ghz, GTX580, Win7SP1Pro x64 & PCLinuxOS2015 x64
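The request above amounts to giving VLAR kills and overflows clearly separate exit statuses. The Python sketch below shows that classification. The exit codes -6 and -9 come from the post. The overflow rule is an assumption: the combined signal count reaching 30 fits both examples given, but the real app may also count other signal types. The VLAR cutoff value is hypothetical, because the actual limit isn't stated here.

```python
EXIT_VLAR_KILL = -6   # exit code the V7 build uses for VLAR tasks (per the post)
EXIT_OVERFLOW = -9    # result-overflow exit code mentioned in the post
OVERFLOW_LIMIT = 30   # assumed: overflow once the combined signal count reaches 30
VLAR_LIMIT = 0.13     # hypothetical angle-range cutoff; the real value isn't given


def exit_status(angle_range: float, spikes: int, peaks: int) -> int:
    """Return the exit code a task would end with under these assumptions.

    0 means the task would be processed and reported normally.
    """
    if angle_range < VLAR_LIMIT:
        return EXIT_VLAR_KILL      # rejected before any real processing
    if spikes + peaks >= OVERFLOW_LIMIT:
        return EXIT_OVERFLOW       # too many signals: result overflow
    return 0
```

Because the two cases map to different codes, a user or script reading task results can tell a deliberate VLAR kill from an overflow without opening each task's stderr output.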

©2025 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.