Panic Mode On (114) Server Problems?

Message boards : Number crunching : Panic Mode On (114) Server Problems?

Previous · 1 . . . 6 · 7 · 8 · 9 · 10 · 11 · 12 . . . 45 · Next

Profile betreger Project Donor
Joined: 29 Jun 99
Posts: 11361
Credit: 29,581,041
RAC: 66
United States
Message 1969424 - Posted: 9 Dec 2018, 1:47:02 UTC - in response to Message 1969415.  

Stephen, you should remember that Sten is from Sweden and could very well be suffering from lutefisk poisoning. That has been known to cause all sorts of irrational behavior.
Ten Stinkiest Foods In the World
Iru. ...
Doenjang. ...
Lutefisk. ...
Stinky Tofu (chòu dòufu) ...
Vieux Boulogne. ...
Surströmming. ...
Durian. This southeast Asian fruit, often used in smoothies or as the stuffing in sweet buns, is revered by some for its ripe, nutty, pungent flavor.

ID: 1969424
Profile Unixchick Project Donor
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 1969432 - Posted: 9 Dec 2018, 2:34:20 UTC

I'm seeing some good signs... assimilation and validation are happening. It also looks like some splitting is happening, but this is a big hole to dig out of.
ID: 1969432
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1969433 - Posted: 9 Dec 2018, 2:35:06 UTC

Well, the splitters are running again, and at a good pace- we'll see how long they last this time.
Grant
Darwin NT
ID: 1969433
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1969434 - Posted: 9 Dec 2018, 2:35:41 UTC - in response to Message 1969424.  

Stephen, you should remember that Sten is from Sweden and could very well be suffering from lutefisk poisoning. That has been known to cause all sorts of irrational behavior.

He likes to go fishing- chuck out a lure & see what bites.
Grant
Darwin NT
ID: 1969434
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1969435 - Posted: 9 Dec 2018, 2:36:40 UTC - in response to Message 1969432.  

I'm seeing some good signs... assimilation and validation are happening. It also looks like some splitting is happening, but this is a big hole to dig out of.

+1 Crossing my fingers. If the servers stay functional it is going to be a very bumpy recovery.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1969435
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1969437 - Posted: 9 Dec 2018, 3:02:57 UTC

Well, my system that ran out of work is still out of work.
The one that still had some work has been able to pick up 155 WUs since the project came back to life.
Grant
Darwin NT
ID: 1969437
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1969440 - Posted: 9 Dec 2018, 3:58:37 UTC

Full power to the kibble core.
Onward, ye splitters, onward.

Meeeeeeeeeeow!!
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1969440
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1969442 - Posted: 9 Dec 2018, 4:17:30 UTC - in response to Message 1969437.  

The new TR seems to have picked up about 150 tasks. Nothing else on any of the other hosts.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1969442
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1969444 - Posted: 9 Dec 2018, 4:27:18 UTC - in response to Message 1969437.  
Last modified: 9 Dec 2018, 4:31:07 UTC

Well, my system that ran out of work is still out of work.
The one that still had some work has been able to pick up 155 WUs since the project came back to life.

The system that has work continues to pick up work every few requests.
The one without any work still hasn't received a thing.


With any luck they've sorted out whatever was wrong since the last outage: the Splitters are splitting at a sustained good rate, work is going out, the Validators are clearing their backlog, the Deleters are cleaning up after them, and the Replica is catching up at a much faster rate than it had been.
*fingers crossed*
Grant
Darwin NT
ID: 1969444
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1969446 - Posted: 9 Dec 2018, 4:43:43 UTC
Last modified: 9 Dec 2018, 4:50:03 UTC

Finally picked up some work on the empty system. Hopefully it'll be enough to keep more work coming.



Edit - looks like it might be the case for most users that systems with work are getting more before those with none get any.
In progress has increased by over 400k; however, Received-last-hour has barely moved from its initial value of only 53k when the servers came back, up to its present 60k. It was over 140k before things fell over.
Grant
Darwin NT
ID: 1969446
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1969456 - Posted: 9 Dec 2018, 5:47:07 UTC - in response to Message 1969444.  

Well, my system that ran out of work is still out of work.
The one that still had some work has been able to pick up 155 WUs since the project came back to life.

The system that has work continues to pick up work every few requests.
The one without any work still hasn't received a thing.


With any luck they've sorted out whatever was wrong since the last outage: the Splitters are splitting at a sustained good rate, work is going out, the Validators are clearing their backlog, the Deleters are cleaning up after them, and the Replica is catching up at a much faster rate than it had been.
*fingers crossed*


. . And toes ...

. . Finally got some work on the one machine still crunching. Time to fire the others back up.

Stephen

:)
ID: 1969456
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1969467 - Posted: 9 Dec 2018, 9:24:50 UTC

Still hit or miss getting work, with "Project has no tasks available" being the usual response.
But then I had a look at work In progress, and we've managed to get back to the level we're usually at after the weekly outage...
It was a big hole to climb out of.
Grant
Darwin NT
ID: 1969467
Profile Stargate (SA)
Volunteer tester
Joined: 4 Mar 10
Posts: 1854
Credit: 2,258,721
RAC: 0
Australia
Message 1969473 - Posted: 9 Dec 2018, 12:22:39 UTC - in response to Message 1969415.  

Thank you Stephen

Stargate
ID: 1969473
Profile Cliff Harding
Volunteer tester
Joined: 18 Aug 99
Posts: 1432
Credit: 110,967,840
RAC: 67
United States
Message 1969476 - Posted: 9 Dec 2018, 12:51:34 UTC - in response to Message 1969467.  

Still hit or miss getting work, with "Project has no tasks available" being the usual response.
But then I had a look at work In progress, and we've managed to get back to the level we're usually at after the weekly outage...
It was a big hole to climb out of.


Well, I have a full belly at the moment, and an estimated 19 hrs. of M/W GPU work (72 tasks) to get rid of first. I've suspended Seti until that's finished, so you guys and gals have fun picking up what's out there. When I crank Seti back up, things should be almost back to normal.


I don't buy computers, I build them!!
ID: 1969476
Profile Unixchick Project Donor
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 1969628 - Posted: 9 Dec 2018, 20:35:31 UTC

Panic is officially over. Replica is caught up. Assimilation and validation are happening in a timely manner and all caught up. Splitting is going well, and the RTS queue is building.

I've been thinking (like an armchair quarterback) that it would be nice if they upped the RTS queue to be closer to a 6 hour reserve like it used to be. Now it is only good for 4 hours. I'm guessing that a bigger RTS queue could cause other problems, either from the extended time under high load needed to refill it or from database space issues.
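Back-of-the-envelope (my own assumed figures, not anything from the status page): the reserve is just the RTS buffer divided by the rate work goes out, so a 6-hour reserve needs roughly half again as big a buffer as a 4-hour one.

# Rough reserve-time arithmetic for the ready-to-send (RTS) buffer.
# Both figures are assumptions for illustration only.
rts_buffer_results = 560_000      # assumed RTS buffer size
send_rate_per_hour = 140_000      # assumed rate results leave the buffer

reserve_hours = rts_buffer_results / send_rate_per_hour
print(f"reserve: ~{reserve_hours:.1f} hours")                  # ~4.0 hours
print(f"6-hour buffer: ~{6 * send_rate_per_hour:,} results")   # ~840,000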

I've also been thinking about the preference to hand out work to those machines that already have work, as someone noted. Could it be that the machines that had work after such a long outage were asking for a smaller amount of work at a time and thus got service earlier? Maybe if the empty machines asked for a smaller amount of work to start off, they would get some sooner?
ID: 1969628
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1969648 - Posted: 9 Dec 2018, 21:28:59 UTC - in response to Message 1969628.  

I've also been thinking about the preference to hand out work to those machines that already have work, as someone noted. Could it be that the machines that had work after such a long outage were asking for a smaller amount of work at a time and thus got service earlier? Maybe if the empty machines asked for a smaller amount of work to start off, they would get some sooner?


. . I think that is Richard's theory too...

Stephen

? ?
ID: 1969648
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1969651 - Posted: 9 Dec 2018, 21:31:48 UTC - in response to Message 1969628.  

Yes, the servers have been doing a kick arse job at recovery. It could be because we cleaned out all the tasks so the server could start fresh.

Actually, the slow machines that still have some tasks left are more likely to get work sooner than the fast computers. Slow computers will ask for more work after every completed WU, then again 5 minutes later, then after a 1h backoff, then longer and longer.

Fast computers get stuck in that long backoff period since they have nothing left to report that would reset the timer.
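A toy sketch of that pattern (made-up constants, not BOINC's actual code): each failed request stretches the wait, and reporting a finished task resets it, which is why hosts that still have results to return keep getting served.

# Toy model of the request backoff described above (illustrative only).
MIN_BACKOFF_S = 5 * 60        # assumed floor: retry ~5 minutes after a failure
MAX_BACKOFF_S = 4 * 60 * 60   # assumed ceiling: a few hours

class HostBackoff:
    def __init__(self):
        self.wait_s = 0       # a host with work to report asks right away

    def no_tasks_available(self):
        """Each 'Project has no tasks available' roughly doubles the wait."""
        self.wait_s = min(max(self.wait_s * 2, MIN_BACKOFF_S), MAX_BACKOFF_S)
        return self.wait_s

    def reported_completed_wu(self):
        """Reporting a finished WU clears the backoff, so slow hosts that
        still have results to return keep asking frequently."""
        self.wait_s = 0

host = HostBackoff()
for _ in range(5):
    print(f"next request in ~{host.no_tasks_available() / 60:.0f} min")   # 5, 10, 20, 40, 80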
ID: 1969651
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1969652 - Posted: 9 Dec 2018, 21:37:19 UTC - in response to Message 1969628.  

I've also been thinking about the preference to hand out work to those machines that already have work, as someone noted. Could it be that the machines that had work after such a long outage were asking for a smaller amount of work at a time and thus got service earlier? Maybe if the empty machines asked for a smaller amount of work to start off, they would get some sooner?
Remember that (under BOINC), the servers don't 'hand out' work: they respond to requests for work.

I suspect that most times people report that the project isn't handing out work, what's really happening is that their client isn't asking for work. When BOINC asks for work but isn't given any, it goes into a sulk ("why bother?"). If you still have old work, every completed task clears that sulk; but once you run dry, the sulk continues. A single project update after the outage (if it gets work) clears the logjam; but without that, BOINC can wait hours before it tries again of its own accord.
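Something like this, as I read it (illustrative pseudo-Python, not the actual client source): a dry host sits in its backoff until the user forces an update, while a host that still reports finished tasks keeps clearing it.

# Simplified view of when the client will contact the scheduler for work.
def will_ask_for_work(now, backoff_until, user_pressed_update, just_reported_task):
    if user_pressed_update:        # a manual project update always asks immediately
        return True
    if just_reported_task:         # reporting a completed task clears the 'sulk'
        return True
    return now >= backoff_until    # otherwise wait out the backoff on its own

# A host that ran dry during the outage and hasn't been touched since:
print(will_ask_for_work(now=100, backoff_until=5000,
                        user_pressed_update=False, just_reported_task=False))  # False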
ID: 1969652
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1969663 - Posted: 9 Dec 2018, 22:59:38 UTC - in response to Message 1969652.  

I believe I played with the cache settings once back in the day, asking for only a 0.25-day cache after a normal Tuesday outage instead of my normal 1-day cache. It didn't seem to make much difference in getting a successful work request. I still got the usual "no work is available" messages I normally do after an outage as everyone bellies up to the scheduler feeding trough.
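For what it's worth, the cache setting mainly changes how much a host asks for, not whether the scheduler says yes. A simplified sketch of the relationship (not BOINC's exact formula, which also weighs per-resource shortfall and backoffs):

# Simplified: the request scales with the cache setting and device count,
# less whatever is already queued (illustrative, not BOINC's real calculation).
def work_request_seconds(cache_days, device_count, queued_seconds=0.0):
    wanted = cache_days * 86_400 * device_count
    return max(0.0, wanted - queued_seconds)

print(work_request_seconds(0.25, 2))   # quarter-day cache, 2 devices ->  43200.0 s
print(work_request_seconds(1.0, 2))    # one-day cache, 2 devices     -> 172800.0 s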
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1969663
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 1969695 - Posted: 10 Dec 2018, 1:59:28 UTC
Last modified: 10 Dec 2018, 1:59:59 UTC

Has anyone else noticed that it doesn't seem like credit is being awarded, or it's being awarded VERY slowly? We've been back up and running all day, but RAC numbers are still in a nosedive.

I mean, I see validation numbers going up, but credit totals aren't.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 1969695