Seti cuda not suspending, PC extremely sluggish
Author | Message |
---|---|
Duncan Blues Joined: 20 Jan 02 Posts: 5 Credit: 236,021 RAC: 0 |
My running SETI CUDA workunits often fail to suspend when I resume using the PC (more often than not), even though the BOINC manager says that work is suspended due to user activity. The progress percentage of the CUDA workunit shown in the BOINC manager continues to rise, while the others I have running, like Climate Prediction or Rosetta, stop as they should. Whenever this happens, the PC is extremely sluggish: switching between application windows takes several seconds, the mouse pointer gets jumpy, and even typing text is slow. I've configured the BOINC manager not to keep the calculation apps in memory while the user is active, but whenever the problem occurs, the SETI process, and often the other apps too, remain loaded (as shown in Task Manager). I have the latest BOINC manager and the latest nVidia driver, 182.50. The video card is a GeForce 9800 GT with 512 MB RAM, and the OS is XP Pro SP3 with all patches applied. The PC is an Intel Q8200 with 4 GB RAM. The system is properly cooled and not overclocked. BOINC is configured to use 50% of the CPUs (i.e. 2 cores) and 50% of CPU time. The system becomes responsive again a couple of seconds after I exit the BOINC manager and tell it to also stop running apps, or as soon as the current CUDA workunit has finished. Even if I select the CUDA task that is running despite user activity and suspend it manually, it keeps running. I never had this kind of problem with the non-CUDA SETI apps. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
You've given us a lot of information, but unfortunately you've missed out one item which could help. You say you have "the latest Boinc manager", but you don't give its actual version number: there have been a lot of "latest" managers recently! And we can't check the exact version number because your computers are hidden from us on this website. My guess is that you are using v6.4.5 or v6.4.7, the latest "stable", "recommended" BOINC releases. They have fairly rudimentary support for CUDA. There is much better CUDA support in the v6.6.xx range (currently in the final stages of testing with v6.6.20). You should find that these test versions do suspend the CUDA application, and remove it from graphics memory, when your preferences require it. The other problem, the sluggish response when typing or switching application windows, is a well-known flaw with the current SETI application. It tends to affect only a limited number of SETI tasks (the ones we describe as 'Very Low Angle Range' or VLAR). This problem will be with us until the application developers find a new way of doing the necessary calculations that fits the CUDA environment better, or new tools are released by the hardware suppliers to fix the problem. Unfortunately, not much progress seems to be being made on either of these fronts at the moment. The best advice I can give at the moment is to install BOINC v6.6.20 (which is very close to being released as a 'recommended' version), and suspend any VLAR tasks which cause problems on your screen until you can finish them off at a less inconvenient time. [Someone will probably dive in and say that there are all sorts of third-party modified versions which try to alleviate the VLAR problem, but I'm deliberately holding back from giving that response to your very first post on these boards. If you've been reading the other threads, you'll have found that advice anyway.] |
Duncan Blues Joined: 20 Jan 02 Posts: 5 Credit: 236,021 RAC: 0 |
Thank you for your reply. My bad, I was talking about the latest recommended version of the manager, v6.4.7. Sorry for not including the version number. I've just installed v6.6.20 and will keep you updated on whether the problem persists. |
SMR Joined: 29 Jun 99 Posts: 13 Credit: 2,648,825 RAC: 0 |
I have a similar problem. I have a Mac mini (PPC 1.25 GHz, 1 GB memory, 95% time to BOINC) running BOINC 6.2.18 on OS X 10.4.11, and any time I have the BOINC manager open, it takes around 2 seconds to respond to any typing or mouse clicks. When I click away from or close the BOINC window, the computer responds normally. My other Mac mini (also 1.25 GHz PPC with 1 GB RAM), set up the same way and with the same optimized app (I think), works fine. (That one has BOINC 5.10.45 and has been running for quite some time.) If anyone looks at this, mini3 is the one with problems and our_mini is the one that works. The PC versions all seem fine. I will try to look at the BOINC FAQs to see if anyone else has had this problem, and will probably try installing the same version of BOINC as on the other (working) mini after running SETI out of work. |
Jord Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
"I have a similar problem" Perhaps so, but it's definitely not CUDA-related, as a) you need BOINC 6.4 for CUDA support to be enabled, and b) a CUDA application is not yet available for the Macintosh. So you're posting in the wrong forum and the wrong thread. It's best to post about it in the Macintosh forum, starting a new thread. |
Duncan Blues Joined: 20 Jan 02 Posts: 5 Credit: 236,021 RAC: 0 |
I've been running BOINC manager 6.6.20 for a few days now and I have to report that the situation has not significantly improved. Sometimes the work units do get suspended properly, but sometimes the CUDA app is totally unfazed and continues to number-crunch along after I come back to the PC. If that happens, the PC is still downright unusable until I manually end the BOINC manager along with all apps. |
Jord Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
"If that happens, the PC is still downright unusable until I manually end the Boinc manager along with all apps." That is most probably due to the very low angle range tasks, for which we hope a newer SETI application will eventually come that dumps them unceremoniously onto the CPU instead of trying to crunch them on the GPU. There's not much that BOINC can do about it. |
Duncan Blues Joined: 20 Jan 02 Posts: 5 Credit: 236,021 RAC: 0 |
Any clue why some of the CUDA WU's still defy the order to suspend themselves? |
Jord Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
If BOINC 6.4.x, then that's the problem. Try updating to 6.6.20, which has a lot of fixes for CUDA included, including the unloading out of video memory when you suspend BOINC (something 6.4 doesn't do). |
zoom3+1=4 Joined: 30 Nov 03 Posts: 65745 Credit: 55,293,173 RAC: 49 |
"If BOINC 6.4.x, then that's the problem. Try updating to 6.6.20, which has a lot of fixes for CUDA included, including the unloading out of video memory when you suspend BOINC (something 6.4 doesn't do)." Well, I wish they'd hurry up and fix this VLAR problem, as one or both of my two GPU cores (a GTX 295) have one of the little buggers, and right now it looks like the CPUs will finish their WUs long before the GPUs will. And yeah, it almost sounds impossible, but the screen redraws, menus, etc. are all being affected. :o And yes, I'm using BOINC 6.6.20, and both the Q9300 and the 295 are fully engaged while overclocked. The GPU temps are holding, of course. The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's |
Jord Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
"Well I wish they'd hurry up and fix this vlar problem" We all do, but I don't think it's going to happen any time soon. Eric Korpela has stated in a forum post that the official stance on this is to use CUDA only when you're not using the computer. Since you have that option and a fix for VLAR is difficult or far off, best start using it. The default preference is now also set to use the GPU only when the computer is idle. |
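For readers who prefer to set the "GPU only when idle" preference locally rather than through the website, here is a minimal Python sketch that builds the XML for a `global_prefs_override.xml` file. The tag names `run_gpu_if_user_active` and `idle_time_to_run` are the ones BOINC uses in its preferences files as far as I know, but whether the 6.6.x clients discussed in this thread honor them is an assumption, and the helper name `build_idle_only_gpu_prefs` is purely illustrative.

```python
import xml.etree.ElementTree as ET


def build_idle_only_gpu_prefs(idle_minutes: float = 3.0) -> str:
    """Return override XML that keeps GPU work off while the user is active.

    Hypothetical helper: the tag names are assumed to match what the
    installed BOINC client reads from global_prefs_override.xml.
    """
    root = ET.Element("global_preferences")
    # 0 = do not run GPU tasks while the computer is in use
    ET.SubElement(root, "run_gpu_if_user_active").text = "0"
    # Minutes without keyboard/mouse input before the machine counts as idle
    ET.SubElement(root, "idle_time_to_run").text = f"{idle_minutes:.2f}"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Print the XML; saving it into the BOINC data directory is left to the user.
    print(build_idle_only_gpu_prefs())
```

The output would be saved as `global_prefs_override.xml` in the BOINC data directory, after which the client needs to re-read its preferences. Note that this only expresses the preference; it does not address the case reported above where a CUDA task ignores the suspend request.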
zoom3+1=4 Joined: 30 Nov 03 Posts: 65745 Credit: 55,293,173 RAC: 49 |
"Well I wish they'd hurry up and fix this vlar problem" Thankfully it seems like I only have 3 at the moment (I hope, at least). It would be nice if the tasks could be flagged by angle range as to whether they're VLARs or not, but that may be wishful thinking on my part. I only wish I knew beforehand how to tell a VLAR from a non-VLAR. Oh well. The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's |
Jord Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
"I only wish I knew before hand how to tell a VLAR from a non VLAR, Oh well." Use the Lunatics modified package with the VLAR killer. |
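As a rough illustration of what "telling a VLAR from a non-VLAR" could look like, the Python sketch below reads the angle range from a workunit header and compares it with a cutoff. It assumes the header carries a `<true_angle_range>` element, and the 0.05 cutoff is a placeholder, since the thread never gives a numeric definition of VLAR. This is not how the Lunatics VLAR killer works internally; it is only a sketch of the idea.

```python
import re
from typing import Optional

# Assumed cutoff: the thread does not state a numeric definition of VLAR.
VLAR_THRESHOLD = 0.05

# Assumed tag name for the angle range in a SETI@home workunit header.
_AR_TAG = re.compile(r"<true_angle_range>\s*([-+0-9.eE]+)\s*</true_angle_range>")


def angle_range(header_text: str) -> Optional[float]:
    """Return the angle range found in the header text, or None if absent."""
    match = _AR_TAG.search(header_text)
    return float(match.group(1)) if match else None


def is_vlar(header_text: str, threshold: float = VLAR_THRESHOLD) -> bool:
    """True when an angle range is present and falls below the cutoff."""
    ar = angle_range(header_text)
    return ar is not None and ar < threshold


if __name__ == "__main__":
    sample = "<true_angle_range>0.0081</true_angle_range>"  # made-up value
    print(is_vlar(sample))
```

A script like this could be pointed at the workunit files in the project directory to list the tasks worth suspending until the PC is idle, which is the approach suggested earlier in the thread.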
zoom3+1=4 Joined: 30 Nov 03 Posts: 65745 Credit: 55,293,173 RAC: 49 |
"I only wish I knew before hand how to tell a VLAR from a non VLAR, Oh well." I just get computation errors with those. I've tried V9-V11; they all produce error -5 or -6, and I've been told they're not interested in anything other than -12 right now. My video driver is 182.08 and I get no errors from stock CUDA, and I'm running a 64-bit AK_v8 for the CPUs as well under XP x64 SP2. So I'll just abort them instead. The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's |
Jord Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
"I just get computation errors with those, I've tried V9-V11 they all produce error -5 or -6 and I've been told their not interested in anything other than -12 right now." As far as I understand the workings of the VLAR killer, the -5 and -6 are normal errors for what you're doing: you are scanning the tasks as they come in and immediately killing them. So they will be returned as errors, to be sent out to someone else. I'm not saying this is the right thing to do, though; others call it 'cherry picking', favoring only those tasks you like. |
zoom3+1=4 Joined: 30 Nov 03 Posts: 65745 Credit: 55,293,173 RAC: 49 |
"I just get computation errors with those, I've tried V9-V11 they all produce error -5 or -6 and I've been told their not interested in anything other than -12 right now." No, the -5 and -6 errors were not because I killed them; they happened by themselves. Killing VLARs is different, as the tasks with -5 and -6 errors were not VLARs at all. And they've happened by the hundreds, so I don't consider them normal like a -9 error, as they ain't normal to me. The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's |
Jord Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
LOL, OK. :-) |
Duncan Blues Joined: 20 Jan 02 Posts: 5 Credit: 236,021 RAC: 0 |
The problem I am having is that the PC starts working on a CUDA work unit while it is idle. Then I come back to the PC and the BOINC manager claims that all work is suspended, but often the CUDA WU is in fact not suspended. The progress percentage keeps going up and the PC is awfully sluggish. The only possible solutions for me right now are a) manually shut down BOINC when I come back to the PC and notice it's being sluggish (and hopefully not forget to restart it when I leave the computer again), or b) disable CUDA altogether. (I'm currently using BOINC 6.6.23, with no improvement compared to 6.6.20.) Right now, I'm hanging in there and using option a), hoping for a solution to pop up. |
zoom3+1=4 Joined: 30 Nov 03 Posts: 65745 Credit: 55,293,173 RAC: 49 |
Good luck with that. From what I've read, Nvidia has left the SETI closet and departed for other tasks, leaving SETI hanging in the wind. VLAR will remain a problem, and there are 3 potential solutions: 1. One that some HATE: the VLAR killer, which generates an error -6, along with the communication delays that come with -6 errors. 2. One that has to be done while BOINC is shut down and while one has fewer than 500 WUs in one's cache, or else it won't work; it has to be done every day and is a Perl script (no automation or integration yet), lovely. 3. The fpops idea somebody put up to keep VLARs out of 6.08. So far I think the server guys don't want to do that either, even though they could, and it would be ideal, as then the VLAR killer wouldn't be needed anymore. The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's |
popandbob Joined: 19 Mar 05 Posts: 551 Credit: 4,673,015 RAC: 0 |
"No the -5 and -6 errors were not cause I killed them, they happened by themselves, Killing VLARs is different as the -5 and -6 errors were not VLARs at all. And they've happened by the hundreds, So I don't consider them normal like a -9 error as they ain't normal to Me." How do you know they were not VLARs? The VLAR kill mod kills the VLAR WUs automatically by outputting an error of -5 or -6. It does the killing for you. If it did not error out like that, it would be possible for 2 CUDA machines to validate killed VLAR WUs and put invalid data into the science database. Bob Do you Good Search for Seti@Home? http://www.goodsearch.com/?charityid=888957 Or Good Shop? http://www.goodshop.com/?charityid=888957 |