Panic Mode On (78) Server Problems?

Fred E. Volunteer tester Send message Joined: 22 Jul 99 Posts: 768 Credit: 24,140,697 RAC: 0	Message 1303147 - Posted: 7 Nov 2012, 15:29:37 UTC Okay, I didn't pick up on the point you made that 100 worked now. I think I'll stay with a lower number to increase the probability of success in this environment. I also miss Matt's expertise, and funding is the key. I was crunching Orbit@Home when it ran out of money, and it's not pretty. Another Fred Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop. ID: 1303147 ·

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874	Message 1303160 - Posted: 7 Nov 2012, 16:18:14 UTC - in response to Message 1303061. Sorry for the late response: Scheduler was modified a few weeks ago to accept a max of 64... It was actually months ago (May), and the figure was increased relatively quickly to accept 256 tasks reported at once. ID: 1303160 ·

BarryAZ Send message Joined: 1 Apr 01 Posts: 2580 Credit: 16,982,517 RAC: 0	Message 1303215 - Posted: 7 Nov 2012, 18:37:44 UTC - in response to Message 1303160. I believe others have been reporting the scheduler problem -- here's what I see. No problems with uploads. No problems with reporting IF I have set no new work. Big problem with reporting and getting work if I have not set no new work. The scheduler goes into suspended animation -- it takes 5 to 10 minutes for the scheduler to time out and release that action. The report back is 'Timeout was reached, servers may be down'. They are not, but the scheduler is in trouble as it has been for the past week. My approach at this point, pending some confirmation from the shop that this problem -- which others have reported, from what I've read -- is recognized and that a fix is in view, is simply to have all my SETI systems configured for no new work and let them complete and clear out and have other projects pick up the slack. I am certain that once folks acknowledge the problem and work on it, we'll get some information regarding anticipated resolution. I would note I have seen this problem during the past week or so on multiple systems running multiple different versions of the BOINC software. ID: 1303215 ·

S@NL Etienne Dokkum Volunteer tester Send message Joined: 11 Jun 99 Posts: 212 Credit: 43,822,095 RAC: 0	Message 1303224 - Posted: 7 Nov 2012, 18:57:30 UTC Besides the max. number of tasks setting, has anyone else noticed the following, or is it just me: I just got 1 task, an AP. Nothing strange there, but it got an ETA of 4996:28:06 hours. Weird, as even the laptop it runs on normally crunches it away in under 15 hours. ID: 1303224 ·

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874	Message 1303227 - Posted: 7 Nov 2012, 19:00:53 UTC - in response to Message 1303215.
Big problem with reporting and getting work if I have not set no new work. The scheduler goes into suspended animation -- it takes 5 to 10 minutes for the scheduler to time out and release that action. The report back is 'Timeout was reached, servers may be down'. They are not, but the scheduler is in trouble as it has been for the past week.
There is some general confusion over the timeout message, and I'm not surprised: even the project staff were caught out by this one. In fact, you see the timeout message when the BOINC client - your own computer - decides that nothing is coming back from the server, and gives up - it stops listening. The full message includes lines like
04/11/2012 22:13:51 | SETI@home | [sched_op] NVIDIA GPU work request: 43326.80 seconds; 0.00 GPUs
04/11/2012 22:19:00 | | [http] [ID#1] Info: Operation too slow. Less than 10 bytes/sec transferred the last 300 seconds
04/11/2012 22:19:00 | | [http] [ID#1] Info: Closing connection #0
04/11/2012 22:19:00 | | [http] HTTP error: Timeout was reached
04/11/2012 22:19:01 | SETI@home | Scheduler request failed: Timeout was reached
The actual timeout value - as in the 'too slow' line above - is 300 seconds or five minutes, but from what I've seen, some other activity (maybe a task finishing and uploading) can fool the communications subsystem into thinking that something is happening after all, and it allows some extra time. The root cause of the problem is, obviously, the server taking too long to work out which tasks to send and assemble them into a suitable reply to your request. But there's no simple switch on the server to say 'extend the time limit': we'll just have to wait for them to uncover the root cause, and fix that instead. ID: 1303227 ·
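The five-minute rule in the 'too slow' log line can be sketched roughly as below. This is a simplified model, not the BOINC client's actual code: the function name `low_speed_timeout` and the one-sample-per-second input are assumptions made for illustration, and only the 10 bytes/sec and 300 seconds figures come from the log above. It shows how a short burst of traffic would restart the clock, which is one way a wait could stretch beyond five minutes.

```python
from typing import Optional, Sequence


def low_speed_timeout(bytes_per_second: Sequence[int],
                      limit: int = 10,
                      window: int = 300) -> Optional[int]:
    """Return the elapsed seconds at which the connection is abandoned.

    The connection is given up once the transfer rate has stayed below
    `limit` bytes/sec for `window` consecutive seconds. Returns None if
    that never happens within the samples provided.
    """
    slow_since = None  # index of the first second of the current slow run
    for t, transferred in enumerate(bytes_per_second):
        if transferred < limit:
            if slow_since is None:
                slow_since = t
            if t - slow_since + 1 >= window:
                return t + 1  # seconds elapsed since the request started
        else:
            slow_since = None  # any real traffic resets the clock
    return None


# A server that never answers: the client gives up after 300 seconds.
silent = [0] * 600
print(low_speed_timeout(silent))      # 300

# One second of unrelated traffic at t=200 restarts the 300-second count.
with_burst = [0] * 200 + [1000] + [0] * 399
print(low_speed_timeout(with_burst))  # 501
```

In this model the second case takes a little over eight minutes to fail, in line with the extra time some posters report.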

Keith White Send message Joined: 29 May 99 Posts: 392 Credit: 13,035,233 RAC: 22	Message 1303228 - Posted: 7 Nov 2012, 19:02:18 UTC - in response to Message 1303061. Last modified: 7 Nov 2012, 19:17:36 UTC It appears I don't have a cc_config.xml on my system already. I believe it goes in programdata\boinc (I'm running Win 7 64-bit) but it didn't seem to help (yes I shut the BOINC Manager down and restarted it). I'll paste the cc_config.xml below to get an opinion if I built it right. <cc_config> <options> <max_tasks_reported>100</max_tasks_reported> </options> </cc_config> That looks okay - and yes, it belongs in the top-level data directory, not one of the project directories or the program directory. Scheduler was modified a few weeks ago to accept a max of 64, so there's little point in using higher values unless you run other projects and need it there. Project is running inconsistently. Looking at my log for the last 8 hours, I see failure to connect, timeouts, no tasks available, over the limit messages, and I got 18 cpu tasks overnight. I couldn't get any for 24 hours before that. I'm still below limit for cpu work and down to 99 cpu tasks for 6 cores. Could be a problem here, so I'm looking for corroboration that Scheduler is refusing you when you're certain that you are below the limit (50/cpu core and 400/gpu). Hard to tell with the variety of responses and connection issues. Thanks. I didn't think the cc_config file would do anything in my case considering I'm seeing the same symptoms as last week. It appears that the uploaded units are being reported and processed OK, it's just that the client isn't getting the reply, or the new units being assigned. Unless of course I do a scheduler request with NNT, and then those "ready to report" marked units are wiped from my tasks tab. Also Ghost Detector was failing with the message "Hmm, Server indicates less Work Units 'In Progress' than client_state.xml thinks you have on board ... Aborted".
I assume that's due to the syncing error. Once the done units were cleared out with NNT/report it ran fine and, surprise, ghosts once again. Edit: Oh, and I'm running around 300-340 pending with 3 CPU and 1 GPU task running when the schedule Nazi said "no more units for you" (Seinfeld joke variant, US TV comedy series for those who aren't familiar; I'm talking about the "This computer has reached a limit on tasks in progress" message). Split was around 90 for the CPU and the rest GPU, but something like 95% of the GPU ones were shorties. "Life is just nature's way of keeping meat fresh." - The Doctor ID: 1303228 ·
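For anyone wanting to check a cc_config.xml like the one pasted above before restarting the client, here is a minimal Python sketch. The element names are taken from the post; the helper names `build_cc_config` and `read_max_tasks_reported` are made up for illustration, and the sketch only builds and parses the XML text. It does not write to the BOINC data directory, and whether a given client version honors the option is not verified here.

```python
import xml.etree.ElementTree as ET


def build_cc_config(max_tasks_reported: int) -> str:
    """Build the XML text of a cc_config.xml with one option set."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    ET.SubElement(options, "max_tasks_reported").text = str(max_tasks_reported)
    return ET.tostring(root, encoding="unicode")


def read_max_tasks_reported(xml_text: str) -> int:
    """Parse cc_config XML text and return the max_tasks_reported value."""
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    if root.tag != "cc_config":
        raise ValueError("top-level element should be <cc_config>")
    node = root.find("options/max_tasks_reported")
    if node is None or node.text is None:
        raise ValueError("<max_tasks_reported> not found under <options>")
    return int(node.text)


# The file as pasted in the post parses cleanly and reads back as 100.
pasted = ("<cc_config> <options> <max_tasks_reported>100"
          "</max_tasks_reported> </options> </cc_config>")
print(read_max_tasks_reported(pasted))

# A variant using 64, the figure quoted in the post as the scheduler's cap.
print(build_cc_config(64))
```

A parse error from `read_max_tasks_reported` would point to a malformed file rather than a scheduler problem.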

BarryAZ Send message Joined: 1 Apr 01 Posts: 2580 Credit: 16,982,517 RAC: 0	Message 1303251 - Posted: 7 Nov 2012, 19:41:52 UTC - in response to Message 1303227. I don't want to have it extend the time limit -- I have other projects going and when the scheduler does this, it preempts other projects reporting for the duration. I'm hoping that the root cause is acknowledged, identified and dealt with -- until then, I'm completing SETI work on hand and getting and processing work from other projects. The root cause of the problem is, obviously, the server taking too long to work out which tasks to send and assemble them into a suitable reply to your request. But there's no simple switch on the server to say 'extend the time limit': we'll just have to wait for them to uncover the root cause, and fix that instead. ID: 1303251 ·

Michael W.F. Miles Send message Joined: 24 Mar 07 Posts: 268 Credit: 34,410,870 RAC: 0	Message 1303256 - Posted: 7 Nov 2012, 20:03:07 UTC It would seem that these server troubles started right when the New York disaster took place. I am not sure if that is directly related to what is going on with the scheduler, but I hope it gets fixed very soon. The limits imposed this time around make it very hard, as just reporting finished tasks is a problem, as is getting enough work to feed 6 CPU cores and one GTX 460 for one day, when I am supposed to have enough work to keep all happy for 3 days according to my prefs settings. ID: 1303256 ·

David S Volunteer tester Send message Joined: 4 Oct 99 Posts: 18352 Credit: 27,761,924 RAC: 12	Message 1303270 - Posted: 7 Nov 2012, 20:43:13 UTC Last modified: 7 Nov 2012, 20:46:13 UTC I just had a thought, but I don't know how feasible it is or whether it might help anything. Would it help if they turn off downloads for five minutes every hour or half hour to allow uploads, reports, and miscellaneous traffic to get through unimpeded? They'd probably have to even stop ghost resends for this to work, but if it works it should dramatically reduce the number of ghosts after a few days. edit: Even better would be to turn off uploads except during that five minute period, but to have any real effect on the network it would probably have to be a mod to the client software so it wouldn't try to send when it shouldn't. David Sitting on my butt while others boldly go, Waiting for a message from a small furry creature from Alpha Centauri. ID: 1303270 ·

Philhnnss Volunteer tester Send message Joined: 22 Feb 08 Posts: 63 Credit: 30,694,327 RAC: 162	Message 1303271 - Posted: 7 Nov 2012, 20:44:31 UTC 11/7/2012 2:32:31 PM SETI@home Not requesting tasks: some download is stalled OK, I can kinda understand setting limits. That way more people can get work. But the above I just don't get. Now you need to babysit your systems to make sure all your downloads run as they should, or you can not have any work? I am really starting to understand why people are becoming upset. This was supposed to be a set and forget system wasn't it? ID: 1303271 ·

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874	Message 1303274 - Posted: 7 Nov 2012, 20:50:51 UTC - in response to Message 1303271. 11/7/2012 2:32:31 PM SETI@home Not requesting tasks: some download is stalled OK, I can kinda understand setting limits. That way more people can get work. But the above I just don't get. Now you need to babysit your systems to make sure all your downloads run as they should, or you can not have any work? I am really starting to understand why people are becoming upset. This was supposed to be a set and forget system wasn't it? The behaviour is unchanged - the BOINC client has never requested work when some download is stalled. The trouble is, the old (silent) way generated thread after thread saying 'SETI isn't sending me any work'. Wrong. The computer wasn't requesting work, but the user didn't know it. Now, with the additional messages (there are several of them), you can see at a glance what the reason for the drought is, and decide whether it's one you can (or want to) do something about yourself. ID: 1303274 ·

Philhnnss Volunteer tester Send message Joined: 22 Feb 08 Posts: 63 Credit: 30,694,327 RAC: 162	Message 1303277 - Posted: 7 Nov 2012, 20:59:10 UTC - in response to Message 1303274. I guess I never got the old messages. It just upset me. Last night before I went to work I had about 4 hours' worth of cache built up. Thought that was good. It would get more work as it ran. I guess as soon as I left, one stalled. So when I checked after I got home, nothing. Aborted that download and started over. Now I've got it again. Frustrating!! It just seems like such a waste to leave my systems on and not have any work because I cannot babysit them 24/7. ID: 1303277 ·

Gundolf Jahn Send message Joined: 19 Sep 00 Posts: 3184 Credit: 446,358 RAC: 0	Message 1303281 - Posted: 7 Nov 2012, 21:06:22 UTC - in response to Message 1303277. I guess I never got the old messages... You couldn't get the old messages, because there haven't been any (as Richard just stated ;-). Gruß, Gundolf ID: 1303281 ·

Philhnnss Volunteer tester Send message Joined: 22 Feb 08 Posts: 63 Credit: 30,694,327 RAC: 162	Message 1303283 - Posted: 7 Nov 2012, 21:10:36 UTC - in response to Message 1303281. I guess I never got the old messages... You couldn't get the old messages, because there haven't been any (as Richard just stated ;-). Gruß, Gundolf Oops, didn't catch that. Sorry, just frustrated and venting. I'm good now, LOL!! ID: 1303283 ·

Mad Fritz Send message Joined: 20 Jul 01 Posts: 87 Credit: 11,334,904 RAC: 0	Message 1303329 - Posted: 7 Nov 2012, 23:21:49 UTC Just let me know if that project will ever be working again... ATM I'm sick of it. ID: 1303329 ·

fscheel Send message Joined: 13 Apr 12 Posts: 73 Credit: 11,135,641 RAC: 0	Message 1303336 - Posted: 7 Nov 2012, 23:51:12 UTC :)...This reminds me of a song from HeeHaw. Gloom despair and agony on me Deep dark depression excessive misery. ID: 1303336 ·

bluestar Send message Joined: 5 Sep 12 Posts: 7011 Credit: 2,084,789 RAC: 3	Message 1303498 - Posted: 8 Nov 2012, 13:25:00 UTC Last modified: 8 Nov 2012, 13:25:19 UTC Got a computational error on one of my CUDA tasks. This because I was out shopping for the weekend. Maybe it should be wise to not let CUDA tasks run if you are away from your computer? Perhaps not so easy. You may happen to get new tasks. Unless you have suspended some tasks, those CUDA tasks which are either "Ready to Start" or "Waiting to Run" will start running automatically after 3 minutes of keyboard or mouse inactivity with the default settings in place. I guess in this project things are never going to become perfect in its workings. ID: 1303498 ·

Fred E. Volunteer tester Send message Joined: 22 Jul 99 Posts: 768 Credit: 24,140,697 RAC: 0	Message 1303507 - Posted: 8 Nov 2012, 14:06:57 UTC Got a computational error on one of my CUDA tasks. This because I was out shopping for the weekend. Maybe it should be wise to not let CUDA tasks run if you are away from your computer? Perhaps not so easy. You may happen to get new tasks. Unless you have suspended some tasks, those CUDA tasks which are either "Ready to Start" or "Waiting to Run" will start running automatically after 3 minutes of keyboard or mouse inactivity with the default settings in place. I guess in this project things are never going to become perfect in its workings. Easy fix - don't shop on weekends! :=} Don't worry about that one. It's a -12 error due to the application and data. These are usually regarded as no-fault. I often have one or two in my errors list. On your website task page, you can drill down on the task to see the Std_Err report, which includes this: SETI@home error -12 Unknown error cudaAcc_find_triplets doesn't support more than MAX_TRIPLETS_ABOVE_THRESHOLD numBinsAboveThreshold in find_triplets_kernel Don't do anything special. Most of us run unattended and check it periodically. If you encounter more errors or just have some questions, feel free to open your own thread and ask for help. We usually use this one for comments and observations on how the project is running. Welcome to the project! Another Fred Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop. ID: 1303507 ·

HAL9000 Volunteer tester Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57	Message 1303519 - Posted: 8 Nov 2012, 14:40:07 UTC - in response to Message 1303498. Got a computational error on one of my CUDA tasks. This because I was out shopping for the weekend. Maybe it should be wise to not let CUDA tasks run if you are away from your computer? Perhaps not so easy. You may happen to get new tasks. Unless you have suspended some tasks, those CUDA tasks which are either "Ready to Start" or "Waiting to Run" will start running automatically after 3 minutes of keyboard or mouse inactivity with the default settings in place. I guess in this project things are never going to become perfect in its workings. If you desire to suspend GPU computing while you are away from your computer you can tell BOINC to do that. There are options in the manager for "snooze" and "snooze GPU". However, they are only for 1 hour IIRC. If you wanted to suspend GPU processing while you are out for a few hours, there is a command line option. If you wanted to suspend GPU processing for 4 hours you could use:
boinccmd --set_gpu_mode never 14400
Then after 4 hours BOINC would resume the previous state you had the GPU processing set to. Normally it would be "auto" or "based on preferences". I just leave the GPU to do what it will. Sometimes there will be an error which I have no control over & the GPU trashes several hundred tasks. I just restart the machine when I find there is an issue & make sure everything is OK afterwards. SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu ID: 1303519 ·
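The duration argument in that command is in seconds, so 4 hours is 4 × 3600 = 14400. A small Python sketch of the conversion follows; the helper name `gpu_suspend_command` is hypothetical, and the sketch only builds the argument list for the `boinccmd --set_gpu_mode never <seconds>` form quoted in the post. It does not run anything.

```python
from typing import List


def gpu_suspend_command(hours: float) -> List[str]:
    """Build the boinccmd argument list to suspend GPU work for `hours`."""
    seconds = int(round(hours * 3600))
    if seconds <= 0:
        raise ValueError("duration must be positive")
    return ["boinccmd", "--set_gpu_mode", "never", str(seconds)]


print(" ".join(gpu_suspend_command(4)))    # boinccmd --set_gpu_mode never 14400
print(" ".join(gpu_suspend_command(1.5)))  # boinccmd --set_gpu_mode never 5400
```

To actually issue the command, the list could be passed to `subprocess.run`, assuming `boinccmd` is on the PATH and is allowed to talk to the running client.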

bluestar Send message Joined: 5 Sep 12 Posts: 7011 Credit: 2,084,789 RAC: 3	Message 1303538 - Posted: 8 Nov 2012, 15:19:27 UTC Last modified: 8 Nov 2012, 15:23:00 UTC Thanks for those comments, both of you! Good catch there, but not that task anyway. I just had a dream about watching some tasks dated 1999 -> by means of some tapes having been split and run. I have been able to locate some tasks dated 2005 on the three discs that are in my computer right now. The rest of it will have to wait for a couple more weeks or months to get back to the surface. Perhaps I missed something there which I should have been able to take a note of. ID: 1303538 ·

©2024 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.