F cuda.............
Author | Message |
---|---|
Byron S Goodgame Joined: 16 Jan 06 Posts: 1145 Credit: 3,936,993 RAC: 0 |
It really doesn't seem possible that it's a CUDA problem, since you're still using BOINC 5.10.45. You are getting "too many normally harmless exit", which, if I'm not mistaken, might have to do with CPU throttling. In your BOINC Manager, under processor usage, do you have CPU time set to anything but 100%? |
dragon1 Joined: 17 Sep 05 Posts: 33 Credit: 4,438,013 RAC: 0 |
None of my preferences have been changed since this machine first went online last spring, and current setting is still "100% of the processors" and 98% of CPU time...nothing has been altered at all in the past 8-9 months. This is a Core 2 Duo...guess you already have access to that info..lol. |
Alinator Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
Hmmm... After looking over a few of the errored-out tasks for your host, it looks like you are having CC heartbeat problems on this host. In general terms, this means that something is causing the CC (core client) to 'stall' and not send heartbeat signals to the science app in a timely fashion. This causes the science app to think the CC died, and the apps are designed to exit on their own when this happens so they don't keep running as orphaned processes. There are several things which can cause this. The first thing I'd try would be to set the CPU throttle to 100%. There have been reports of the throttle causing problems for some hosts with the later 5.10 CCs. The next thing to look at is excluding the BOINC directory and all its subfolders from virus scanning. Another possibility is if you have the host set up for 'remote' monitoring and control. If so, you can try increasing the 'polling' interval with which the tool requests status/state updates. If you have the interval set too short for the specific circumstances on the host, this can lead to the aforementioned CC stalls, and too many of those will result in the task aborting like you have been observing. One last, sort of desperation, thing to do is to defrag the drive where BOINC lives. I know this sounds like groping, but it has solved weird BOINC malfunctions a couple of times in the past, when everything else I had tried failed and I was fresh out of ideas. ;-) HTH, Alinator |
cowboy Joined: 2 Aug 08 Posts: 51 Credit: 18,580 RAC: 0 |
If you don't want to run a CUDA app, go into your account, click on SETI@home Preferences, edit the preferences for wherever your computer is (Home/Work/School, etc.), change the setting "Use Graphics Processing Unit (GPU) if available" to no, save your changes, open BOINC Manager, click on your SETI project, click Update, and you should no longer get CUDA work. That's an easy way to opt out of running it until it's fixed. I fully understand that, and my point still stands. If you don't want to run it while there are still issues, opt out of it. Matt posted in the Tech News thread that they found at least one bug in the validator to fix for the issue of CUDA and non-CUDA results not matching, so work is being done to fix it. No one is being alienated at all. |
dragon1 Joined: 17 Sep 05 Posts: 33 Credit: 4,438,013 RAC: 0 |
Here's something I was just told...about 14 hrs ago the system was actually told to shut off by the operator and it failed to do so, and when I came by an hr or so ago the screen was displaying the Windows XP error box saying that "svchost.exe" was not responding, and the system had obviously been that way all day with no attention...could that have eliminated all the WUs in the queue as it continued to try each one in succession and failed at something? Hence an empty queue now? I have no idea how to change or check the CPU throttle to 100%, but will run a defrag now. Guess there's no way to get any more WUs for now. Thanks so far. |
Alinator Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
dragon1 wrote: "Here's something I was just told...about 14 hrs ago the system was actually told to shut off by the operator and it failed to do so, and when I came by an hr or so ago the screen was displaying the Windows XP error box saying that 'svchost.exe' was not responding, and the system had obviously been that way all day with no attention...could that have eliminated all the WUs in the queue as it continued to try each one in succession and failed at something? Hence an empty queue now?" Yep, that could definitely do it. Especially if it was a network-related service running through the Services and Controller Application (the user-friendly name for svchost) which had gagged. Alinator |
dragon1 Joined: 17 Sep 05 Posts: 33 Credit: 4,438,013 RAC: 0 |
OK...thanks mate. Then I guess I'll leave my system up and running BOINC and see if and when it will start sending me some WUs. I guess it's not been happy with me sending too many Client Errors to SETI in one day, so they shut me down for a while? |
Alinator Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
dragon1 wrote: "OK...thanks mate. Then I guess I'll leave my system up and running BOINC and see if and when it will start sending me some WUs. I guess it's not been happy with me sending too many Client Errors to SETI in one day, so they shut me down for a while?" Yep, the project side has two cutoffs to slow down 'renegade' hosts. The basic quota limit is 100 tasks per quota day per core up to 4 cores (last I knew), or 400 tasks total per quota day. A quota day is defined as midnight to midnight Pacific time for SAH. The first cutoff is the quota is reduced by one for every errored out task the host returns. The project will double the current value for every successful task returned (this does not mean validated for credit, though, which is an important distinction). This one is intended to shut down a host which goes 'insane' and isn't noticed right off, like the situation you had on your host, for example. The second cutoff is intended to help keep the overall work-in-progress load on the backend within reason by limiting how much work any host can download in a given day (in theory). This one hardly ever came into play for most participants until recently, since there weren't many folks crunching who could afford hosts which could sustain a throughput of 400 tasks per day. ;-) However, it is possible for quads and higher to reach the 400 tasks per day limit when they are filling their cache initially or after prolonged outages, if the cache settings are set high enough. Of course, when the 'true' 8 core (16 virtual) i7s are launched, we could be seeing mainstream hosts hit the absolute daily limit a lot more. HTH, Alinator |
dragon1 Joined: 17 Sep 05 Posts: 33 Credit: 4,438,013 RAC: 0 |
Thanks for the quick response. Not sure how to interpret "The first cutoff is the quota is reduced by one for every errored out task the host returns"...so I'll just leave everything running for a few days and see what happens. There is currently nothing in the "task" listing waiting to run or running. I'll just wait. Thanks again. |
Cosmic_Ocean Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
dragon1 wrote: "Thanks for the quick response. Not sure how to interpret 'The first cutoff is the quota is reduced by one for every errored out task the host returns'...so I'll just leave everything running for a few days and see what happens. There is currently nothing in the 'task' listing waiting to run or running. I'll just wait." You start off at 100. If you have one 'compute error', or any other error (including an abort or a missed deadline), your quota is now 99/cpu/day. If this happens a few more times, it continues to decrease by one for every problem. Now say it's down to 50/cpu/day. If you turn in one good result, it goes up by two, so now it is 52. If you turn in 10 good tasks all at the same time, it goes up by 20, and so on. So, one bad task decreases the quota by one. One good task increases the quota by two. |
Alinator Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
dragon1 wrote: "Thanks for the quick response. Not sure how to interpret 'The first cutoff is the quota is reduced by one for every errored out task the host returns'...so I'll just leave everything running for a few days and see what happens. There is currently nothing in the 'task' listing waiting to run or running. I'll just wait." Hmmm... Unless there's been a change I'm not aware of, good returns double the current quota value (up to the maximum). So if you had enough errors in a row to drop the quota to 50, the next successful task would return the quota to 100. Or, put another way, if you were at 1 for the quota, each good return would double it, so it would go: 1-2-4-8-16-32-64-100. IOW, seven straight successes get you back to the max. Alinator |
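
The last two replies describe different recovery rules for the per-core quota: Cosmic_Ocean says each good result adds two, while Alinator says each good result doubles the current value. The thread does not settle which one the scheduler actually uses, so the following is only a rough sketch that plays both rules out side by side. All function names are made up for illustration, the numbers (100 per core, 4 cores counted) are the ones quoted in the posts, and the floor of 1 on the quota is an assumption.

```python
# Illustrative only: models the quota rules as described in the thread,
# not the actual BOINC scheduler code.

MAX_PER_CORE = 100  # per-core daily quota quoted in the thread
MAX_CORES = 4       # cores counted toward the cap ("last I knew")


def daily_cap(cores):
    """Total tasks per quota day for a host with the given core count."""
    return MAX_PER_CORE * min(cores, MAX_CORES)


def after_error(quota):
    """One errored-out task lowers the quota by one (floor of 1 is assumed)."""
    return max(quota - 1, 1)


def after_success_doubling(quota):
    """Alinator's description: a good result doubles the quota, up to the max."""
    return min(quota * 2, MAX_PER_CORE)


def after_success_plus_two(quota):
    """Cosmic_Ocean's description: a good result adds two, up to the max."""
    return min(quota + 2, MAX_PER_CORE)


def recovery_path(start, step):
    """Quota values seen while returning good results until the max is reached."""
    path = [start]
    while path[-1] < MAX_PER_CORE:
        path.append(step(path[-1]))
    return path


if __name__ == "__main__":
    print(daily_cap(2))                              # Core 2 Duo host
    print(recovery_path(1, after_success_doubling))  # [1, 2, 4, 8, 16, 32, 64, 100]
    print(len(recovery_path(50, after_success_plus_two)) - 1)  # good results needed
```

Under the doubling rule a host that has dropped to 1 is back at the maximum after seven good results, which matches the 1-2-4-8-16-32-64-100 sequence above. Under the plus-two rule the same recovery from 50 would take 25 good results, so the two descriptions lead to very different waiting times.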
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.