The Server Issues / Outages Thread - Panic Mode On! (117)
Grant (SSSF) · Joined: 19 Aug 99 · Posts: 13854 · Credit: 208,696,464 · RAC: 304
I don't think 77/sec is too bad, under the circumstances. I'm getting work, hot from the oven. Much better than the 0/s, then 5-11/s it was prior to that. Hopefully it will now sustain that output, and not just fall over yet again.
Grant
Darwin NT
Grant (SSSF) · Joined: 19 Aug 99 · Posts: 13854 · Credit: 208,696,464 · RAC: 304
Well, they haven't fallen over, but their output has fallen away significantly. The Ready-to-send buffer hit 100k for a while there, but now it's on its way back down towards zero again.
Grant
Darwin NT
Richard Haselgrove · Joined: 4 Jul 99 · Posts: 14679 · Credit: 200,643,578 · RAC: 874
My problem was that several machines, untouched while I slept, had run completely dry and stopped asking. Once I triggered their first requests manually and they got a little work, the automatic processes kicked in and they kept asking until 'full' (which isn't very much on my settings). Now I've backed away from the trough so others can take their turn. That's probably an experience shared across much of the European time zone.
Grant (SSSF) · Joined: 19 Aug 99 · Posts: 13854 · Credit: 208,696,464 · RAC: 304
I spoke too soon. Splitter output is back to 0, and Ready-to-send is down to 181. A very rocky recovery.
Grant
Darwin NT
Retvari Zoltan · Joined: 28 Apr 00 · Posts: 35 · Credit: 128,746,856 · RAC: 230
The doubled maximum number of CPU workunits per host and the tripled maximum number of GPU workunits per GPU result in greater swings in these numbers, especially when users wake up at nearly the same time, realize that their hosts have run dry, and all press the "update" button at once to resolve the situation. I expect that future recoveries will go the same way. Perhaps increasing the maximum allowed GPU tasks further could make the recovery easier on the servers, provided that users don't press the update button while they still have some work queued during or right after the outage. However, lots of pending uploads and unreported tasks can also prompt users to press the update button, so such an increase could make the recovery worse.
Unixchick · Joined: 5 Mar 12 · Posts: 815 · Credit: 2,361,516 · RAC: 22
For some reason, slower machines have an easier time getting new WUs when the server has issues like this, so I worry that upping the limits will just fill the (probably too large) caches of the slow machines and do nothing to help the faster machines that have run dry. It would be nice if the server could set a "recovery" switch: until the RTS queue is over some "amount", each machine could only ask for new WUs if it had fewer than "number" in its cache. Once the server had over the "amount" in RTS, the "recovery" switch would be turned off and personal cache-size settings would kick in again. "number" could be based on CPU and GPU counts, but set smaller than normal, so that everyone gets some work rather than some having full caches and others having none. I love the new larger cache sizes, and I'm trying to just go NNT on Tuesdays.
Joseph Stateson · Joined: 27 May 99 · Posts: 309 · Credit: 70,759,933 · RAC: 3
If a problem like this arises during the next WOW event, then anyone bunkering up tasks ahead of time will get far ahead of the usual crowd. I am planning for this to happen ;<)

C:\src\BoincMasterSlave\win_build\Build\x64\Release>boinc --help
The command-line options for boinc are intended for debugging.
The recommended command-line interface is a separate program,'boinccmd'.
Run boinccmd in the same directory as boinc.

Usage: boinc [options]
    --abort_jobs_on_exit             when client exits, abort and report jobs
    --allow_remote_gui_rpc           allow remote GUI RPC connections
    --allow_multiple_clients         allow >1 instances per host
    --attach_project <URL> <key>     attach to a project
    --set_hostname <name>            use this as hostname
    --set_password <password>        rpc gui password
    --set_backoff N                  set backoff to this value
    --spoof_gpus N                   fake number of gpus
    --set_bunker_cnt <project> N     bunker this many workunits for given project then quit
    --bunker_time_string <text>      unix time cutoff for reporting - used with bunker
                                     in this format exactly: "11/24/2019T10:41:29"
    --mw_bug_fix                     delay attaching output to allow new work to download
    --check_all_logins               for idle detection, check remote logins too
    --daemon                         run as daemon (Unix)
Stephen "Heretic" · Joined: 20 Sep 12 · Posts: 5557 · Credit: 192,787,363 · RAC: 628
. . Not a fan of bunkering ... Stephen :)
TBar · Joined: 22 May 99 · Posts: 5204 · Credit: 840,779,836 · RAC: 2,768
The machines haven't been able to contact the Server for a while now. Completed tasks are backing up quickly. Has it died again?
Wiggo · Joined: 24 Jan 00 · Posts: 36782 · Credit: 261,360,520 · RAC: 489
I can report, but nothing is coming back and the forums are like molasses. Cheers.
Dr Who Fan · Joined: 8 Jan 01 · Posts: 3343 · Credit: 715,342 · RAC: 4
Servers must be struggling. I tried to load the forum on my cell over a WiFi connection and it timed out; on my desktop PC it took about 1 minute to load and get to the screen to post this comment. Will see how long it takes to post. Edit: over 1 minute to post and show the comment.
Jimbocous · Joined: 1 Apr 13 · Posts: 1856 · Credit: 268,616,081 · RAC: 1,349
No longer able to report work nor get any. "Scheduler request failed: HTTP internal server error" or "Scheduler request failed: Couldn't connect to server" errors.
Wiggo · Joined: 24 Jan 00 · Posts: 36782 · Credit: 261,360,520 · RAC: 489
Same here now. :-( Cheers.
Unixchick · Joined: 5 Mar 12 · Posts: 815 · Credit: 2,361,516 · RAC: 22
Can we go back to the smaller personal caches, but with a stable server and a 3-hour maintenance window? Hope things are fixed tomorrow morning (California time is now 9:35pm).
Jimbocous · Joined: 1 Apr 13 · Posts: 1856 · Credit: 268,616,081 · RAC: 1,349
Looks like it's struggling back to life, at least to the extent that I've been able to report some work. No downloads as yet.
TBar · Joined: 22 May 99 · Posts: 5204 · Credit: 840,779,836 · RAC: 2,768
Same here. Finally able to report, but all I get back is Project has No tasks...
Grant (SSSF) · Joined: 19 Aug 99 · Posts: 13854 · Credit: 208,696,464 · RAC: 304
Still HTTP server errors here. Looking forward to "Project has no tasks available" messages, as at least I'll have made contact with the Scheduler and cleared all the work that's waiting to be reported. Edit: now starting to make contact with the Scheduler, and yes, "Project has no tasks available" is the response, with the extremely occasional allocation of some work. At least there's a nice huge Ready-to-send buffer for when the Scheduler is working again and is prepared to send out work.
Grant
Darwin NT
Grant (SSSF) · Joined: 19 Aug 99 · Posts: 13854 · Credit: 208,696,464 · RAC: 304
That's assuming that this is a result of the increased server load. Even before they increased the server-side limits, the servers' performance had been quite variable, just not bad enough for users to notice.
Grant
Darwin NT
Grant (SSSF) · Joined: 19 Aug 99 · Posts: 13854 · Credit: 208,696,464 · RAC: 304
Looking at my log, the problems started just over 4.5 hours ago (12:17hrs my time, currently 16:54hrs). Initially it was "Project has no tasks available" responses; then, after 30min of that, the Scheduler went MIA.
19/12/2019 12:46:55 | SETI@home | Scheduler request failed: Failure when receiving data from the peer
19/12/2019 12:56:14 | SETI@home | Scheduler request failed: Couldn't connect to server
19/12/2019 12:57:52 | SETI@home | Scheduler request failed: Couldn't connect to server
19/12/2019 13:09:46 | SETI@home | Scheduler request failed: Couldn't connect to server
19/12/2019 13:20:33 | SETI@home | Scheduler request failed: Failure when receiving data from the peer
19/12/2019 14:53:18 | SETI@home | Scheduler request failed: HTTP internal server error
etc, etc
Grant
Darwin NT
Grant (SSSF) · Joined: 19 Aug 99 · Posts: 13854 · Credit: 208,696,464 · RAC: 304
Over 1 million WUs ready-to-send, and I can't get any. My Linux system should be out of GPU work in the next 30min or so, yet my Windows system somehow just managed to snag 26 (it will need a few more than that to refill its cache). And while I was typing this, the Linux system picked up 53 (so I might last an hour now). It's amazing how often just posting about something gets a result...
Grant
Darwin NT
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.