Modified SETI MB CUDA + opt AP package for full GPU utilization

Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 855830 - Posted: 20 Jan 2009, 23:30:33 UTC - in response to Message 855822.  
Last modified: 20 Jan 2009, 23:31:35 UTC

Raistmer,
When this new 6.6.2 comes out are you going to have to do a new package for it or can we use the V6 still?

Is the new BOINC version expected to be incompatible with all previous builds? I think not, so I see no reason not to use the same science apps with the new BOINC version.
What GPU-related changes are planned in the new BOINC? A better scheduler?
ID: 855830 · Report as offensive
Profile Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 855833 - Posted: 20 Jan 2009, 23:40:49 UTC - in response to Message 855830.  

What GPU-related changes are planned in the new BOINC? A better scheduler?

It's got separate work-fetch modules for CPU and GPU as well as a new client-side CPU scheduler. Amongst other things, at least. Also a very nasty bug where when you have a CUDA card and set the project to no new tasks, it'll send you work anyway. ;-)
ID: 855833 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 856224 - Posted: 21 Jan 2009, 23:18:45 UTC - in response to Message 855833.  
Last modified: 21 Jan 2009, 23:55:33 UTC

There is a new package version available:
http://lunatics.kwsn.net/gpu-crunching/modified-seti-mb-cuda-opt-ap-package-for-full-gpu-utilize.msg12177.html#msg12177
http://lunatics.kwsn.net/gpu-crunching/modified-seti-mb-cuda-opt-ap-package-for-full-gpu-utilize.msg13120.html#msg13120
or (if you can't download from Lunatics)
http://files.mail.ru/K2AFN0

It contains the same CUDA app (equivalent to stock 6.08) and the latest SSE3 opt AP release (r103), which has increased performance.
ID: 856224 · Report as offensive
ghost17

Joined: 15 Jun 04
Posts: 2
Credit: 3,590
RAC: 0
Germany
Message 856362 - Posted: 22 Jan 2009, 10:29:48 UTC

First of all, thanks for the package.

I tried to run SETI CUDA on my laptop; it ran GPUGRID without any problems.

The first thing that went wrong was yesterday's validation error, which resulted in a loop of client errors. That, by the way, should be fixed in my opinion: 90 client errors in 2 hours should stop BOINC from downloading anything more, but okay, that isn't a point to discuss here.

Today, after my quota was re-enabled, I installed this package and everything worked fine for... 19 seconds. After that it aborted with a compute error.

The graphics card is an nVidia 9500M GS with 512 MB and the newest notebook drivers from nVidia itself.

BTW: The standard CUDA application works, BUT I can't use my computer then because it slows down everything, from moving windows to showing anything on the screen. That's a little bit crappy :-/
ID: 856362 · Report as offensive
Profile Byron S Goodgame
Volunteer tester
Joined: 16 Jan 06
Posts: 1145
Credit: 3,936,993
RAC: 0
United States
Message 856369 - Posted: 22 Jan 2009, 10:47:30 UTC

I have the new modified + AP app installed. I already had an AP task running and it made the transition just fine, as I'm sure was expected. I haven't had a chance to run the modified CUDA app yet; it will be a few hours before I finish the Beta tasks that I'm running.

What is to be expected as far as a performance increase from the new AP app?
ID: 856369 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 856379 - Posted: 22 Jan 2009, 11:33:36 UTC - in response to Message 856369.  
Last modified: 22 Jan 2009, 11:35:52 UTC

What is to be expected as far as a performance increase from the new AP app?


Well, as Jason already wrote in the AP-dedicated thread, there are a few changes in the new AP release. They can be classified into 3 categories:
1) Attempts to work around BOINC's instant termination bug/feature (a zero-length state file when the app is terminated during a checkpoint). Unfortunately, we can only try to reduce the probability of this bug manifesting. As long as BOINC kills the science app's process without mercy, this bug remains possible.

2) Overall performance improvements. These will speed up the processing of any AP task.

3) More effective handling of tasks with overflows. Some unneeded computations for these tasks are now skipped. This mod can give a very nice performance boost for some tasks, but the size of the speedup is data-dependent and differs from task to task.
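
For readers wondering what a workaround for the empty state file in point 1 can look like in general: one common technique is to write the checkpoint to a temporary file first and then rename it over the old one. The Python sketch below illustrates that idea only. It is not the code used in the AP release, and the function name `write_checkpoint` is made up for illustration.

```python
import os
import tempfile


def write_checkpoint(path, data):
    """Write `data` (bytes) to `path` via a temporary file and a rename.

    If the process is killed while writing, only the temporary file is
    left incomplete; the previous checkpoint at `path` stays untouched.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the final
    # rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".ckpt-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)  # swap the new file in place of the old one
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A kill signal can still arrive at any moment, so this kind of approach only narrows the window in which a state file can be damaged, which matches the "reduce probability" wording above.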
ID: 856379 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 856381 - Posted: 22 Jan 2009, 11:42:31 UTC - in response to Message 856362.  

First of all, thanks for the package.

I tried to run SETI CUDA on my laptop; it ran GPUGRID without any problems.

The first thing that went wrong was yesterday's validation error, which resulted in a loop of client errors. That, by the way, should be fixed in my opinion: 90 client errors in 2 hours should stop BOINC from downloading anything more, but okay, that isn't a point to discuss here.

Today, after my quota was re-enabled, I installed this package and everything worked fine for... 19 seconds. After that it aborted with a compute error.

The graphics card is an nVidia 9500M GS with 512 MB and the newest notebook drivers from nVidia itself.

BTW: The standard CUDA application works, BUT I can't use my computer then because it slows down everything, from moving windows to showing anything on the screen. That's a little bit crappy :-/


I looked at one of your results:
http://setiathome.berkeley.edu/result.php?resultid=1129701614
This error can be attributed to a driver incompatibility.
Unfortunately, my build works only with driver version 180.48 and higher (I don't know why; it's just an experimental fact).
If you can't install an 18x.xx driver on your notebook, you could try another (maybe beta) driver. If that doesn't help, you can either stop using the CUDA app on this host or modify app_info.xml to use stock 6.08 along with opt AP r103. The stock app seems to have no driver limit.
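
The driver requirement above can be expressed as a small check. This is only a sketch in Python, assuming driver versions are written as major.minor (as in 180.48). The function names are hypothetical and are not part of BOINC or of the package.

```python
def parse_driver_version(text):
    """Turn a driver version such as '179.28' into a comparable tuple (179, 28)."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


# Minimum driver reported in this thread for the modified CUDA build.
MIN_MODIFIED_BUILD_DRIVER = parse_driver_version("180.48")


def modified_build_supported(driver_version):
    """Return True if the driver is at or above the 180.48 minimum."""
    return parse_driver_version(driver_version) >= MIN_MODIFIED_BUILD_DRIVER


# Versions mentioned in this thread.
for version in ("179.28", "180.48", "181.20"):
    print(version, modified_build_supported(version))
```

Comparing the parts as integers avoids the pitfalls of comparing version strings character by character.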
ID: 856381 · Report as offensive
ghost17

Joined: 15 Jun 04
Posts: 2
Credit: 3,590
RAC: 0
Germany
Message 856384 - Posted: 22 Jan 2009, 11:49:35 UTC - in response to Message 856381.  

OK, I'll try that later. I'm running on 179.28 notebook drivers.
ID: 856384 · Report as offensive
Dawn Raison

Joined: 6 Jun 99
Posts: 3
Credit: 4,317,068
RAC: 0
United Kingdom
Message 856387 - Posted: 22 Jan 2009, 11:56:01 UTC

So just what do I need to do to get a cache of CUDA workunits loaded?
I've had two batches sent so far, each of about 15 or so; they crunched fine and in a few cases went on to become the canonical result.
However unless I go into BOINC and fiddle (e.g. suspend tasks) I can't get any CUDA WUs sent - instead I keep getting loads of Astropulse WUs.
Surely I can have 3/4 Astropulses and Seti CUDA at the same time?

Rgds.
DavidR.

Boinc 6.4.5, Astropulse v5.00, Seti CUDA 6.0.8
Card: 9800GTX+ oc Driver: v181.20, PC: XP x64, 8GiB ECC RAM, 790FX, Phenom II X4 940
Also running ClimatePrediction, Spinhenge
ID: 856387 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 856389 - Posted: 22 Jan 2009, 12:23:14 UTC - in response to Message 856387.  

1) Try the opt package described in this thread (you listed stock apps, which are not the subject of this thread).
2) Try increasing the SETI project share.
3) Try increasing the cache size.
ID: 856389 · Report as offensive
Morten Ross
Volunteer tester
Joined: 30 Apr 01
Posts: 183
Credit: 385,664,915
RAC: 0
Norway
Message 856397 - Posted: 22 Jan 2009, 12:48:45 UTC - in response to Message 856389.  
Last modified: 22 Jan 2009, 13:08:54 UTC

Hi,

I have now been monitoring for some time and still do not see any change in the following behaviour:

CUDA crunching works like a charm for maybe one or two hours; then the WU simply stops progressing while the CUDA app consumes a full CPU.
Closing BOINC, making sure no executables remain in the process list (sometimes I must kill boinc.exe, as processes are not released), and then restarting BOINC sometimes resolves it, and the WU completes OK.
Other times more than one restart of BOINC is needed before the WU is processed.

After that, one of two things happens: either the next WU is processed normally, or it is processed for a few seconds and then completes with success(!).
The only option is to restart the computer to start the cycle once again.

The OS is Windows Server 2008 x64 with BOINC 6.6.0 x64. Video drivers tested: 185.20, which I downgraded to 181.22 as part of the ap_5.00r103_SSE3 and MB_6.08_mod_VLAR_kill_CUDA upgrade. To no avail, it seems.

Below are the WUs from the above-mentioned scenario:

1129308489
1129308493
1129308763

Perhaps this is an x64-issue? The following hosts are 32-bit and are seemingly working very well:

2650128
4242105

Morten Ross
Morten Ross
ID: 856397 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 856578 - Posted: 22 Jan 2009, 22:55:40 UTC - in response to Message 856397.  

Hi,

I have now been monitoring for some time and still do not see any change in the following behaviour:

CUDA crunching works like a charm for maybe one or two hours; then the WU simply stops progressing while the CUDA app consumes a full CPU.
Closing BOINC, making sure no executables remain in the process list (sometimes I must kill boinc.exe, as processes are not released), and then restarting BOINC sometimes resolves it, and the WU completes OK.
Other times more than one restart of BOINC is needed before the WU is processed.

After that, one of two things happens: either the next WU is processed normally, or it is processed for a few seconds and then completes with success(!).
The only option is to restart the computer to start the cycle once again.

The OS is Windows Server 2008 x64 with BOINC 6.6.0 x64. Video drivers tested: 185.20, which I downgraded to 181.22 as part of the ap_5.00r103_SSE3 and MB_6.08_mod_VLAR_kill_CUDA upgrade. To no avail, it seems.

Below are the WUs from the above-mentioned scenario:

1129308489
1129308493
1129308763

Perhaps this is an x64-issue? The following hosts are 32-bit and are seemingly working very well:

2650128
4242105

Morten Ross

Well, most probably it is indeed an x64 issue.
I have a second CUDA-enabled host with dual GPU cards (8500GT + 9400GT). OS: Windows Server 2003 x64.
It showed the same behavior.

There was a memory leak (my build clearly demonstrated that, because it dumps the available GPU memory at the beginning of each task). 1-2 tasks on the 8500 and a few more on the 9400 could be completed (though they usually ended with a computation error), and after that the app fell back to CPU processing due to the low GPU memory condition.
Only an OS restart helped.

But when I upgraded to the 181.20 x64 drivers, this problem disappeared. Now it seems this host can crunch OK on both GPUs.
Link to the host on beta:
http://setiweb.ssl.berkeley.edu/beta/results.php?hostid=18439
ID: 856578 · Report as offensive
Morten Ross
Volunteer tester
Joined: 30 Apr 01
Posts: 183
Credit: 385,664,915
RAC: 0
Norway
Message 856579 - Posted: 22 Jan 2009, 22:55:56 UTC - in response to Message 856397.  

... Additional note: boincmgr.exe is also always using 100% CPU when the above-mentioned scenario occurs, so it seems the interaction between CUDA and boincmgr is a problem.

Morten
Morten Ross
ID: 856579 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 856580 - Posted: 22 Jan 2009, 22:58:05 UTC - in response to Message 856579.  

... Additional note: boincmgr.exe is also always using 100% CPU when the above-mentioned scenario occurs, so it seems the interaction between CUDA and boincmgr is a problem.

Morten

I haven't tried BOINC 6.6.0.
I use 6.4.5. Maybe you could downgrade to it too, to rule BOINC out of this "equation"?
ID: 856580 · Report as offensive
Morten Ross
Volunteer tester
Joined: 30 Apr 01
Posts: 183
Credit: 385,664,915
RAC: 0
Norway
Message 856582 - Posted: 22 Jan 2009, 23:05:02 UTC - in response to Message 856580.  

I will try a downgrade and see what happens, although that might reintroduce other issues fixed in 6.6.0.

There is also the difference in OS versions - you have Windows 2003 and I've got Windows 2008.

Morten Ross
Morten Ross
ID: 856582 · Report as offensive
Morten Ross
Volunteer tester
Joined: 30 Apr 01
Posts: 183
Credit: 385,664,915
RAC: 0
Norway
Message 856745 - Posted: 23 Jan 2009, 9:19:56 UTC - in response to Message 856582.  

Hi,

I did a downgrade, but unfortunately that did not make any difference - WU will still hang after some time of crunching. There is one change - boincmgr.exe is not using 100% CPU when this happens, and all WUs are released properly when I exit BOINC manager.

Morten Ross
Morten Ross
ID: 856745 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 856762 - Posted: 23 Jan 2009, 11:23:22 UTC - in response to Message 856745.  

Well, some positive progress at last :)

What happens if you use the 181.20 driver now?
ID: 856762 · Report as offensive
Morten Ross
Volunteer tester
Joined: 30 Apr 01
Posts: 183
Credit: 385,664,915
RAC: 0
Norway
Message 856826 - Posted: 23 Jan 2009, 16:06:49 UTC - in response to Message 856762.  
Last modified: 23 Jan 2009, 16:13:17 UTC

Hi,

I have downgraded from 181.22 to 181.20 (from file version 7.15.11.8122 to 7.15.11.8120) - but no change, boincmgr must be restarted for the hanging WU to be processed.....
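
The two pairs above suggest how the Windows file version maps to the driver version. A minimal Python sketch, assuming the driver number is the last five digits of the final two fields joined together (an assumption that fits both pairs quoted here, not a documented rule); `driver_from_file_version` is a hypothetical helper:

```python
def driver_from_file_version(file_version):
    """Map a file version like '7.15.11.8122' to a driver version like '181.22'.

    Assumption: join the last two dot-separated fields ('11' + '8122'),
    keep the last five digits ('18122') and put a dot after the third.
    """
    fields = file_version.split(".")
    digits = (fields[-2] + fields[-1])[-5:]
    return digits[:3] + "." + digits[3:]


# The two file versions quoted in this post.
for file_version in ("7.15.11.8122", "7.15.11.8120"):
    print(file_version, "->", driver_from_file_version(file_version))
```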

The WU hangs always occur at the very beginning of processing: after about 17 seconds (CPU time, not elapsed time) and between 0.000% and 0.500% progress.

Morten
Morten Ross
ID: 856826 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 856877 - Posted: 23 Jan 2009, 18:10:15 UTC - in response to Message 856826.  

Well, it seems your host is a good candidate for Maik's script. It handles this kind of hang, AFAIK. Look for it here or on the Lunatics site.
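
For illustration, the kind of hang described in this thread (CPU time keeps growing while progress stays flat) could be detected roughly like this. This is a Python sketch only, not Maik's script. The sample format, the sample values and the `looks_hung` name are assumptions, and how the samples are collected is left out.

```python
def looks_hung(samples, window=3):
    """Return True if progress stayed flat over the last `window` samples
    while CPU time kept increasing.

    samples: list of (cpu_seconds, progress_percent) tuples, oldest first.
    """
    if len(samples) < window:
        return False  # not enough history to judge
    recent = samples[-window:]
    progress_flat = all(progress == recent[0][1] for _, progress in recent)
    cpu_rising = recent[-1][0] > recent[0][0]
    return progress_flat and cpu_rising


# Made-up sample data: one task that advances, one that is stuck early on.
healthy = [(10.0, 0.2), (20.0, 1.5), (30.0, 3.1)]
stuck = [(17.0, 0.4), (77.0, 0.4), (137.0, 0.4)]
print(looks_hung(healthy))
print(looks_hung(stuck))
```

What to do once a hang is detected (for example, restarting BOINC as described earlier in the thread) is a separate decision and is not shown here.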
ID: 856877 · Report as offensive
Profile BMaytum
Volunteer tester
Joined: 3 Apr 99
Posts: 104
Credit: 4,382,041
RAC: 2
United States
Message 856884 - Posted: 23 Jan 2009, 18:27:23 UTC - in response to Message 856762.  
Last modified: 23 Jan 2009, 19:07:03 UTC

Well, some positive progress at last :)

What happens if you use the 181.20 driver now?


@ Raistmer:
Well your V7 optimized "MB_6.08-mod_kill_VLAR_CUDA.exe" client is working well for me, using WinXP Pro SP3 32bit with nVidia 181.20 drivers on my single 8800GT/512mb (G92 GPU) video card under BOINC 6.6.0 on S@H main (non-Beta) project. My Tasks are listed here:
Tasks for User ID 7782289

Aside: As I reported in the "Boinc 6.6.2 just released" thread, my two attempts to use Boinc 6.6.2 with your V7 CUDA client gave me Boinc Manager problems, but that's for others to work out (I guess?). I haven't tried 6.6.2 with the stock Berkeley 6.08 CUDA client. I was really hoping 6.6.2 would work, because it reportedly should allow concurrent GPU MB crunching and CPU MB/AP crunching. Have you tried Boinc 6.6.2 yet?

I am wondering if the present VLAR Kill feature in your V7 application is actually necessary, because when I earlier was running Boinc 6.4.5 with stock Berkeley 6.08-Cuda and nVidia 180.60 drivers, that stock 6.08-Cuda would successfully crunch MB WUs that had very low angle range (VLAR).
(Aside: Like so many others reported, VLAR WUs crunched with stock Berkeley 6.06-Cuda and 6.07-Cuda would produce Compute Errors).

Another question: When your V7 application detects that a WU's angle range is below your VLAR limit, it terminates that WU with exit code -6, so the WU gets listed as a Client Error/Compute Error. For example, see:
http://setiathome.berkeley.edu/result.php?resultid=1130457910
Could you instead terminate VLAR WUs with an error code that reports it as "Client Aborted"? That would help differentiate such WUs from WUs that terminate with error code -9 overflow (e.g. #spikes=30, or #spikes=15+#peaks=15). Just a thought....
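
To illustrate the distinction being asked for, results could be sorted by exit code like this. A Python sketch using only the two codes mentioned in this post (-6 for the VLAR kill, -9 for overflow); the list of exit codes is made-up sample data and the function names are hypothetical.

```python
# Exit codes as described in this post; anything else is lumped together.
EXIT_CODE_MEANING = {
    -6: "VLAR kill",
    -9: "overflow",
}


def describe_exit(code):
    """Return a short label for an exit code, or 'other' if it is not listed."""
    return EXIT_CODE_MEANING.get(code, "other")


def tally(exit_codes):
    """Count how many results fall under each label."""
    counts = {}
    for code in exit_codes:
        label = describe_exit(code)
        counts[label] = counts.get(label, 0) + 1
    return counts


# Made-up sample: two VLAR kills, one overflow, one unrelated code.
print(tally([-6, -9, -6, 0]))
```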
Sabertooth Z77, i7-3770K@4.2GHz, GTX680, W8.1Pro x64
P5N32-E SLI, C2D E8400@3Ghz, GTX580, Win7SP1Pro x64 & PCLinuxOS2015 x64
ID: 856884 · Report as offensive


 
©2025 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.