Seti cuda not suspending, PC extremely sluggish

Questions and Answers : GPU applications : Seti cuda not suspending, PC extremely sluggish
Duncan Blues
Joined: 20 Jan 02
Posts: 5
Credit: 236,021
RAC: 0
Germany
Message 882682 - Posted: 6 Apr 2009, 7:50:45 UTC

For some reason, my running SETI CUDA work units often fail to suspend when I resume using the PC (more often than not), even though the BOINC manager says that work is suspended due to user activity. The progress percentage of the CUDA work unit shown in the BOINC manager continues to rise, while the others I have running, such as Climate Prediction or Rosetta, stop as they should.
Whenever this happens, the PC becomes extremely sluggish. Switching between application windows takes several seconds, the mouse pointer gets jumpy, and even typing text is slow.
I've configured the BOINC manager not to keep the calculation apps in memory while the user is active, but whenever the problem occurs, the SETI process, and often the other apps too, remain loaded (as shown in Task Manager).
I have the latest BOINC manager and the latest nVidia driver, 182.50. The video card is a GeForce 9800 GT with 512 MB RAM, and the OS is XP Pro SP3 with all patches applied. The PC is an Intel Q8200 with 4 GB RAM. The system is properly cooled and not overclocked.
BOINC is configured to use 50% of the CPUs (i.e. 2 cores) and 50% of CPU time.
The system becomes responsive again a couple of seconds after I exit the BOINC manager and tell it to stop the running apps as well, or as soon as the current CUDA work unit has finished.
Even if I select the CUDA task that is running despite user activity and suspend it manually, it keeps running.
I never had this kind of problem with the non-CUDA SETI apps.
ID: 882682 · Report as offensive
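One way to confirm the symptom described above, a task that keeps advancing while the manager reports it as suspended, is to take two task listings a minute or so apart and compare the progress figures. The sketch below is a minimal Python example under stated assumptions: it assumes the listings come from the BOINC command-line tool (for example `boinccmd --get_tasks`) and that each task block contains a `name:` line and a `fraction done:` line. The exact layout differs between client versions, so the field labels, the helper names `parse_progress` and `still_advancing`, and any task names you feed in are illustrative only.

```python
import re


def parse_progress(report):
    """Map task name -> fraction done from a task listing.

    Assumption: each task block has a line starting with 'name:' followed
    later by a line starting with 'fraction done:'. Adjust the patterns if
    your client prints different labels.
    """
    progress = {}
    current = None
    for raw in report.splitlines():
        line = raw.strip()
        name_match = re.match(r"name:\s*(\S+)", line)
        if name_match:
            current = name_match.group(1)
            continue
        done_match = re.match(r"fraction done:\s*([0-9.]+)", line)
        if done_match and current is not None:
            progress[current] = float(done_match.group(1))
    return progress


def still_advancing(before, after, tolerance=1e-6):
    """Return the names of tasks whose progress grew between two listings."""
    first = parse_progress(before)
    second = parse_progress(after)
    return sorted(
        name
        for name in first
        if name in second and second[name] - first[name] > tolerance
    )
```

If both listings are captured while the manager shows "suspended due to user activity", any name returned by `still_advancing` is a task that did not actually stop.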
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 882687 - Posted: 6 Apr 2009, 8:35:01 UTC - in response to Message 882682.  

You've given us a lot of information, but unfortunately you've missed out one item which could help.

You say you have "the latest BOINC manager", but you don't give its actual version number: there have been a lot of "latest" managers recently! And we can't check the exact version number because your computers are hidden from us on this website.

My guess is that you are using v6.4.5 or v6.4.7, the latest "stable", "recommended" BOINC releases. They have fairly rudimentary support for CUDA. There is much better CUDA support in the v6.6.xx range (currently in the final stages of testing with v6.6.20). You should find that these test versions do suspend the CUDA application, and remove it from graphics memory, when your preferences require it.

The other problem, the sluggish response when typing or switching application windows, is a well-known flaw in the current SETI application. It tends to affect only a limited number of SETI tasks (the ones we describe as 'Very Low Angle Range' or VLAR). This problem will be with us until the application developers find a new way of doing the necessary calculations that fits the CUDA environment better, or the hardware suppliers release new tools to fix the problem. Unfortunately, little progress seems to have been made on either front so far.

The best advice I can give at the moment is to install BOINC v6.6.20 (which is very close to being released as a 'recommended' version), and suspend any VLAR tasks which cause problems on your screen until you can finish them off at a less inconvenient time.

[Someone will probably dive in and say that there are all sorts of third-party modified versions which try to alleviate the VLAR problem, but I'm deliberately holding back from giving that response to your very first post on these boards. If you've been reading the other threads, you'll have found that advice anyway]
ID: 882687 · Report as offensive
Duncan Blues
Joined: 20 Jan 02
Posts: 5
Credit: 236,021
RAC: 0
Germany
Message 882700 - Posted: 6 Apr 2009, 10:53:34 UTC

Thank you for your reply.

My bad: I was talking about v6.4.7, the latest recommended version of the manager. Sorry for not including the version number.
I've just installed v6.6.20 and will let you know whether the problem persists.

ID: 882700 · Report as offensive
SMR
Volunteer tester
Joined: 29 Jun 99
Posts: 13
Credit: 2,648,825
RAC: 0
United States
Message 883371 - Posted: 8 Apr 2009, 13:41:18 UTC - in response to Message 882700.  

I have a similar problem. I have a Mac mini (PPC 1.25 GHz, 1 GB memory, 95% time to BOINC) running BOINC 6.2.18 on OS X 10.4.11, and any time I have the BOINC manager open, it takes around 2 seconds to respond to any typing or mouse clicks. When I click away from or close the BOINC window, the computer responds normally. My other Mac mini (also a 1.25 GHz PPC with 1 GB RAM), set up the same way and with the same optimized app (I think), works fine. (That one has BOINC 5.10.45 and has been running for quite some time.) If anyone looks at this, mini3 is the one with problems and our_mini is the one that works. The PC versions all seem fine.
I will try to look at the BOINC FAQs to see if anyone else has had this problem, and will probably try installing the same version of BOINC as on the other (working) mini after running SETI out of work.
ID: 883371 · Report as offensive
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 883406 - Posted: 8 Apr 2009, 15:10:19 UTC - in response to Message 883371.  

"I have a similar problem"

Perhaps so, but it is definitely not CUDA related: a) you need BOINC 6.4 for CUDA support to be enabled, and b) a CUDA application is not yet available for the Macintosh.

So you're posting it in the wrong forum and the wrong thread. Best post about it in the Macintosh forum, starting a new thread.
ID: 883406 · Report as offensive
Duncan Blues
Joined: 20 Jan 02
Posts: 5
Credit: 236,021
RAC: 0
Germany
Message 883911 - Posted: 10 Apr 2009, 12:17:48 UTC

I've been running BOINC manager 6.6.20 for a few days now and I have to report that the situation has not significantly improved. Sometimes the work units are suspended properly, but sometimes the CUDA app is totally unfazed and continues to crunch numbers after I come back to the PC. When that happens, the PC is still downright unusable until I manually exit the BOINC manager along with all apps.
ID: 883911 · Report as offensive
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 883916 - Posted: 10 Apr 2009, 12:30:43 UTC - in response to Message 883911.  

"If that happens, the PC is still downright unusable until I manually end the Boinc manager along with all apps."

That is most probably due to the very low angle range tasks. We hope a newer SETI application will eventually arrive that dumps those unceremoniously on the CPU instead of trying to crunch them on the GPU. There is not much that BOINC can do about it.
ID: 883916 · Report as offensive
Duncan Blues
Joined: 20 Jan 02
Posts: 5
Credit: 236,021
RAC: 0
Germany
Message 885330 - Posted: 14 Apr 2009, 16:00:17 UTC

Any clue why some of the CUDA WUs still defy the order to suspend themselves?

ID: 885330 · Report as offensive
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 885338 - Posted: 14 Apr 2009, 16:22:24 UTC - in response to Message 885330.  

If you are on BOINC 6.4.x, then that's the problem. Try updating to 6.6.20, which includes a lot of fixes for CUDA, including unloading the application from video memory when you suspend BOINC (something 6.4 doesn't do).
ID: 885338 · Report as offensive
zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65745
Credit: 55,293,173
RAC: 49
United States
Message 885734 - Posted: 16 Apr 2009, 5:22:20 UTC - in response to Message 885338.  

"If BOINC 6.4.x, then that's the problem. Try updating to 6.6.20, which has a lot of fixes for CUDA included, including the unloading out of video memory when you suspend BOINC (something 6.4 doesn't do)."

Well, I wish they'd hurry up and fix this VLAR problem, as one or two of my two GPU cores (a GTX295) have one of the little buggers, and right now it looks like the CPUs will finish their WUs long before the GPUs will. It almost sounds impossible, but the screen redraws, menus, etc. are all being affected. :o And yes, I'm using BOINC 6.6.20, and both the Q9300 and the 295 are fully engaged while overclocked. The GPU temps are holding, of course.
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 885734 · Report as offensive
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 885749 - Posted: 16 Apr 2009, 6:51:16 UTC - in response to Message 885734.  

"Well I wish they'd hurry up and fix this vlar problem"

We all do, but I don't think it's going to happen any time soon. Eric Korpela has stated in this post that the official stance on this is to only use CUDA when you're not using the computer.

Since you have that option and a fix for VLAR is difficult or far off, best start using it. The default preference is now also set to use the GPU only when the computer is idle.
ID: 885749 · Report as offensive
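For anyone who prefers to set this locally rather than through the web preferences, the settings mentioned in this thread (GPU only when the computer is idle, 50% of the CPUs, 50% of CPU time, apps not kept in memory) can be written to the client's `global_prefs_override.xml` file. The Python sketch below only builds the XML text. The element names are the ones used in BOINC's preferences override file as far as I know, but whether a particular client version (6.6.20 included) honours every one of them is an assumption, and the helper name `build_override` is purely illustrative.

```python
import xml.etree.ElementTree as ET

# Values taken from the settings discussed in this thread.
# Assumption: the installed client honours all of these elements.
SETTINGS = {
    "run_gpu_if_user_active": 0,  # 0 = use the GPU only when the PC is idle
    "leave_apps_in_memory": 0,    # 0 = unload suspended apps from memory
    "max_ncpus_pct": 50,          # use at most 50% of the processors
    "cpu_usage_limit": 50,        # use at most 50% of CPU time
}


def build_override(settings):
    """Return the text of a global_prefs_override.xml document."""
    root = ET.Element("global_preferences")
    for tag, value in settings.items():
        ET.SubElement(root, tag).text = str(value)
    return ET.tostring(root, encoding="unicode")


override_xml = build_override(SETTINGS)
```

The resulting text would go into `global_prefs_override.xml` in the BOINC data directory, after which the client has to re-read its local preferences (or be restarted) before the change takes effect.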
zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65745
Credit: 55,293,173
RAC: 49
United States
Message 885761 - Posted: 16 Apr 2009, 7:38:42 UTC - in response to Message 885749.  

"Well I wish they'd hurry up and fix this vlar problem"

"We all do, but I don't think it's going to happen any time soon. Eric Korpela has stated in this post that the official stance on this is to only use CUDA when you're not using the computer.

Since you have that option and a fix for VLAR is difficult or far off, best start using it. The default preference is now also set to use the GPU only when the computer is idle."

Thankfully, it seems like I only have 3 at the moment (I hope, at least). It would be nice if tasks could be flagged by angle range to show whether they're VLARs or not, but that may be wishful thinking on my part. I only wish I knew beforehand how to tell a VLAR from a non-VLAR. Oh well.
ID: 885761 · Report as offensive
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 885762 - Posted: 16 Apr 2009, 7:44:55 UTC - in response to Message 885761.  

"I only wish I knew before hand how to tell a VLAR from a non VLAR, Oh well."

Use the Lunatics modified package with VLAR killer.
ID: 885762 · Report as offensive
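For those who would rather check by hand, the angle range of a task can in principle be read from the workunit file itself. The Python sketch below is an illustration under stated assumptions, not a description of how the Lunatics VLAR killer works: it assumes the workunit header contains a `<true_angle_range>` element, and the default cut-off of 0.05 is a placeholder rather than an official definition of "very low". The function names `angle_range` and `is_vlar` are hypothetical.

```python
import re

# Assumption: the workunit header carries the angle range in this element.
ANGLE_RANGE_TAG = re.compile(
    r"<true_angle_range>\s*([0-9.eE+-]+)\s*</true_angle_range>"
)


def angle_range(header_text):
    """Return the angle range found in a workunit header, or None."""
    match = ANGLE_RANGE_TAG.search(header_text)
    return float(match.group(1)) if match else None


def is_vlar(header_text, threshold=0.05):
    """Return True if the angle range is below the chosen cut-off.

    The threshold is a placeholder value; pick whatever cut-off the
    project or application developers actually document.
    """
    value = angle_range(header_text)
    return value is not None and value < threshold
```

Reading the first few kilobytes of a workunit file from the project directory and passing the text to `is_vlar` would be enough to decide which tasks to suspend until the PC is idle.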
zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65745
Credit: 55,293,173
RAC: 49
United States
Message 885799 - Posted: 16 Apr 2009, 14:55:35 UTC - in response to Message 885762.  

"I only wish I knew before hand how to tell a VLAR from a non VLAR, Oh well."

"Use the Lunatics modified package with VLAR killer."

I just get computation errors with those. I've tried V9 to V11; they all produce error -5 or -6, and I've been told they're not interested in anything other than -12 right now. My video driver is 182.08 and I get no errors from stock CUDA, and I'm running a 64-bit AK_v8 for the CPUs as well, under XP x64 SP2. So I'll just abort them instead.
ID: 885799 · Report as offensive
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 885806 - Posted: 16 Apr 2009, 15:31:13 UTC - in response to Message 885799.  

"I just get computation errors with those, I've tried V9-V11 they all produce error -5 or -6 and I've been told they're not interested in anything other than -12 right now."

As far as I understand how the VLAR killer works, the -5 and -6 are normal errors for what you're doing: you are scanning the tasks as they come in and immediately killing them. So they will be returned as errors, to be sent out to someone else.

I'm not saying this is the right thing to do, though. Others call it 'cherry picking': favoring only those tasks you like.
ID: 885806 · Report as offensive
zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65745
Credit: 55,293,173
RAC: 49
United States
Message 885812 - Posted: 16 Apr 2009, 16:00:18 UTC - in response to Message 885806.  
Last modified: 16 Apr 2009, 16:04:45 UTC

"I just get computation errors with those, I've tried V9-V11 they all produce error -5 or -6 and I've been told they're not interested in anything other than -12 right now."

"As far as I understand the working of the VLAR killer, the -5 and -6 are normal errors for what you're doing: You are scanning the tasks as they come in and immediately killing them. So they will be returned as an error, to be sent out to someone else.

I don't say this is the right thing to do though, others call it 'cherry picking', favoring only those tasks you like."

No, the -5 and -6 errors did not happen because I killed them; they happened by themselves. Killing VLARs is different, as the tasks that produced the -5 and -6 errors were not VLARs at all. And they've happened by the hundreds, so I don't consider them normal like a -9 error. They aren't normal to me.
ID: 885812 · Report as offensive
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 885815 - Posted: 16 Apr 2009, 16:10:55 UTC - in response to Message 885812.  

LOL, OK. :-)
ID: 885815 · Report as offensive
Duncan Blues
Joined: 20 Jan 02
Posts: 5
Credit: 236,021
RAC: 0
Germany
Message 887551 - Posted: 23 Apr 2009, 11:36:13 UTC - in response to Message 885749.  


"the official stance on this is to only use CUDA when you're not using the computer.

Since you have that option and a fix for VLAR is difficult or far off, best start using it. The default preference is now also set to use the GPU only when the computer is idle."

The problem I am having is that the PC starts working on a CUDA work unit while it is idle. Then I come back to the PC and the BOINC manager claims that all work is suspended, but often the CUDA WU is in fact not suspended. The progress percentage keeps going up and the PC is awfully sluggish.

The only possible solutions for me right now are:

a) manually shut down BOINC when I come back to the PC and notice it's being sluggish (and hopefully not forget to restart it when I leave the computer again)

b) disable CUDA altogether

(I'm currently using Boinc 6.6.23, no improvement compared to 6.6.20)

Right now, I'm hanging in there and using option a), hoping for a solution to pop up.
ID: 887551 · Report as offensive
zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65745
Credit: 55,293,173
RAC: 49
United States
Message 887621 - Posted: 23 Apr 2009, 16:00:08 UTC - in response to Message 887551.  
Last modified: 23 Apr 2009, 16:08:38 UTC

Good luck with that. From what I've read, Nvidia has left the SETI closet and departed for other tasks, leaving SETI hanging in the wind. VLAR will remain a problem, and there are 3 potential solutions:

1. One that some HATE: the VLAR killer, which generates an error -6, along with the communication delays that come with -6 errors.

2. One that has to be run while BOINC is shut down and only works when you have fewer than 500 WUs in your cache. It has to be done every day and is a Perl script (no automation or integration yet). Lovely.

3. The fpops idea somebody put forward to keep VLARs out of 6.08. So far I think the server guys don't want to do that either, even though they could, and it would be ideal, as the VLAR killer would then no longer be needed.
ID: 887621 · Report as offensive
popandbob
Volunteer tester
Joined: 19 Mar 05
Posts: 551
Credit: 4,673,015
RAC: 0
Canada
Message 887669 - Posted: 23 Apr 2009, 18:10:23 UTC - in response to Message 885812.  

"No the -5 and -6 errors were not cause I killed them, they happened by themselves, Killing VLARs is different as the -5 and -6 errors were not VLARs at all. And they've happened by the hundreds, So I don't consider them normal like a -9 error as they ain't normal to Me."

How do you know they were not VLARs?
The VLAR kill mod kills VLAR WUs automatically by outputting an error of -5 or -6.
It does the killing for you.
If it did not error out like that, it would be possible for 2 CUDA machines to validate killed VLAR WUs and put invalid data into the science database.

Bob



Do you Good Search for Seti@Home? http://www.goodsearch.com/?charityid=888957
Or Good Shop? http://www.goodshop.com/?charityid=888957
ID: 887669 · Report as offensive
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.