Message boards : Number crunching : Panic Mode On (114) Server Problems?
**betreger** · Joined: 29 Jun 99 · Posts: 11416 · Credit: 29,581,041 · RAC: 66

Stephen, you should remember that Sten is from Sweden and could very well be suffering from lutefisk poisoning. That has been known to cause all sorts of irrational behavior. Ten Stinkiest Foods In the World
**Unixchick** · Joined: 5 Mar 12 · Posts: 815 · Credit: 2,361,516 · RAC: 22

I'm seeing some good signs... assimilation and validation are happening. It also looks like some splitting is happening, but this is a big hole to dig out of.
**Grant (SSSF)** · Joined: 19 Aug 99 · Posts: 13854 · Credit: 208,696,464 · RAC: 304

Well, the splitters are running again, and at a good pace; we'll see how long they last this time.

Grant
Darwin NT
**Grant (SSSF)** · Joined: 19 Aug 99 · Posts: 13854 · Credit: 208,696,464 · RAC: 304

> Stephen, you should remember that Sten is from Sweden and could very well be suffering from lutefisk poisoning. That has been known to cause all sorts of irrational behavior.

He likes to go fishing: chuck out a lure & see what bites.

Grant
Darwin NT
**Keith Myers** · Joined: 29 Apr 01 · Posts: 13164 · Credit: 1,160,866,277 · RAC: 1,873

> I'm seeing some good signs... assimilation and validation are happening. It also looks like some splitting is happening, but this is a big hole to dig out of.

+1 Crossing my fingers. If the servers stay functional, it is going to be a very bumpy recovery.

Seti@Home classic workunits: 20,676 · CPU time: 74,226 hours
A proud member of the OFA (Old Farts Association)
**Grant (SSSF)** · Joined: 19 Aug 99 · Posts: 13854 · Credit: 208,696,464 · RAC: 304

Well, my system that ran out of work is still out of work. The one that still had some work has been able to pick up 155 WUs since the project came back to life.

Grant
Darwin NT
**kittyman** · Joined: 9 Jul 00 · Posts: 51478 · Credit: 1,018,363,574 · RAC: 1,004

Full power to the kibble core. Onward, ye splitters, onward. Meeeeeeeeeeow!!

"Time is simply the mechanism that keeps everything from happening all at once."
**Keith Myers** · Joined: 29 Apr 01 · Posts: 13164 · Credit: 1,160,866,277 · RAC: 1,873

The new TR seems to have picked up about 150 tasks or so. Nothing else on any of the other hosts.

Seti@Home classic workunits: 20,676 · CPU time: 74,226 hours
A proud member of the OFA (Old Farts Association)
**Grant (SSSF)** · Joined: 19 Aug 99 · Posts: 13854 · Credit: 208,696,464 · RAC: 304

> Well, my system that ran out of work is still out of work.

The system that has work continues to pick up work every few requests. The one without any work: still not a thing.

With any luck, they've sorted out whatever went wrong since the last outage: the splitters are splitting at a sustained good rate, work is going out, the validators are clearing their backlog, the deleters are cleaning up after them, and the replica is catching up at a much faster rate than it had been.
*fingers crossed*

Grant
Darwin NT
**Grant (SSSF)** · Joined: 19 Aug 99 · Posts: 13854 · Credit: 208,696,464 · RAC: 304

Finally picked up some work on the empty system. Hopefully it'll be enough to keep more work coming.

Edit: it looks like it might be the case for most users that systems with work are getting more before those with none get any. In progress has increased by over 400k; however, Received-last-hour has barely moved from its initial value of only 53k when the servers came back, up to its present 60k. It was over 140k before things fell over.

Grant
Darwin NT
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Well, my system that ran out of work is still out of work. . . And toes ... . . Finally got some work on the one machine still crunching. Time to fire the others back up. Stephen :) |
**Grant (SSSF)** · Joined: 19 Aug 99 · Posts: 13854 · Credit: 208,696,464 · RAC: 304

Still hit or miss getting work, with "Project has no tasks available" the usual response. But then I had a look at work in progress, and we've managed to get back to the level we're usually at after the weekly outage... It was a big hole to climb out of.

Grant
Darwin NT
**Stargate (SA)** · Joined: 4 Mar 10 · Posts: 1854 · Credit: 2,258,721 · RAC: 0

Thank you Stephen

Stargate
**Cliff Harding** · Joined: 18 Aug 99 · Posts: 1432 · Credit: 110,967,840 · RAC: 67

> Still hit or miss getting work, with "Project has no tasks available" the usual response.

Well, I have a full belly at the moment, and I have an estimated 19 hrs. of M/W GPU work (72 tasks) to get rid of first. I've suspended Seti until that's finished, so you guys and gals have fun picking up what's out there. When I crank Seti back up, things should be almost back to normal.

I don't buy computers, I build them!!
**Unixchick** · Joined: 5 Mar 12 · Posts: 815 · Credit: 2,361,516 · RAC: 22

Panic is officially over. The replica is caught up. Assimilation and validation are happening in a timely manner and are all caught up. Splitting is going well, and the RTS queue is building.

I've been thinking (like an armchair quarterback) that it would be nice if they upped the RTS queue to be closer to a 6-hour reserve like it used to be; now it is only good for about 4 hours. I'm guessing that a bigger RTS queue could cause other problems, either with the extended time at high load needed to refill it, or with database space issues.

I've also been thinking about the preference to hand out work to those machines that already have work, as someone noted. Could it be that the machines that had work after such a long outage were asking for a smaller amount of work at a time and thus got served earlier? Maybe if the empty machines asked for a smaller amount of work to start off, they would get some sooner?
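A back-of-the-envelope way to frame the RTS sizing question above: reserve time is just queue size divided by the rate at which tasks go out. The figures below are illustrative assumptions, not measured SETI@home values.

```python
# Rough reserve-time arithmetic for a ready-to-send (RTS) queue.
# Both figures are assumptions for illustration, not actual
# SETI@home configuration values or measured rates.

RTS_QUEUE_SIZE = 600_000       # assumed results held ready to send
SEND_RATE_PER_HOUR = 150_000   # assumed scheduler send rate at peak demand

reserve_hours = RTS_QUEUE_SIZE / SEND_RATE_PER_HOUR
print(f"reserve: {reserve_hours:.1f} h")          # -> reserve: 4.0 h

# A 6-hour reserve at the same send rate means holding half again as
# many results in the master database at once, which is where the
# db-space worry comes from:
target_hours = 6
print(f"queue needed: {target_hours * SEND_RATE_PER_HOUR:,}")  # -> 900,000
```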
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
I've also been thinking about the preference to hand out work to those machines that already have work, as someone noted. Could it be that the machines that had work after such a long outage, were asking for a smaller amount of work at a time and thus got service earlier? Maybe if the empty machines asked for a smaller amount of work to start off would they then get some sooner? . . I think that is Richard's theory too... Stephen ? ? |
**Brent Norman** · Joined: 1 Dec 99 · Posts: 2786 · Credit: 685,657,289 · RAC: 835

Yes, the servers have been doing a kick-arse job at recovery. It could be because we cleaned out all the tasks, so the server could start fresh.

Actually, the slow machines that still have some tasks left are more likely to get work sooner than the fast computers. Slow computers will ask for more work after every completed WU, then again 5 minutes later, then after a 1-hour backoff, then longer and longer. Fast computers get stuck in that long backoff period since they have nothing left to report that would reset the timer.
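Brent's description is essentially an exponential-backoff loop that only a successful report resets. A toy sketch of that pattern (the starting interval, doubling factor, and cap are assumptions, not the BOINC client's actual constants):

```python
# Toy model of the work-fetch backoff described above: every empty
# reply lengthens the wait, and only reporting a completed task
# resets it. Constants are assumptions, not BOINC's real schedule.

def next_backoff(current_s: float) -> float:
    """Double the wait after another empty reply, capped at 4 hours."""
    return min(current_s * 2, 4 * 3600)

backoff_s = 300.0  # assume a 5-minute wait after the first empty reply
waits = []
for _ in range(6):
    waits.append(backoff_s)
    backoff_s = next_backoff(backoff_s)

print([f"{w / 60:.0f} min" for w in waits])
# -> ['5 min', '10 min', '20 min', '40 min', '80 min', '160 min']
# A slow host finishing a task every ~20 minutes keeps resetting to the
# front of this list; a fast host that ran dry rides it all the way out.
```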
**Richard Haselgrove** · Joined: 4 Jul 99 · Posts: 14679 · Credit: 200,643,578 · RAC: 874

> I've also been thinking about the preference to hand out work to those machines that already have work, as someone noted. Could it be that the machines that had work after such a long outage were asking for a smaller amount of work at a time and thus got served earlier? Maybe if the empty machines asked for a smaller amount of work to start off, they would get some sooner?

Remember that (under BOINC) the servers don't 'hand out' work: they respond to requests for work. I suspect that most times people report that the project isn't handing out work, what's really happening is that their client isn't asking for work. When BOINC asks for work but isn't given any, it goes into a sulk ("why bother?"). If you still have old work, every completed task clears that sulk: but once you run dry, the sulk continues. A single project update after the outage (if it gets work) clears the logjam: but without that, BOINC can wait hours before it tries again of its own accord.
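Richard's point can be modeled as client-side state: the scheduler never pushes work, and only certain events end the sulk. A simplified sketch (the class and event names are mine for illustration, not BOINC source code):

```python
import time

class WorkFetchState:
    """Sketch of when a client is willing to ask the scheduler for
    work. Illustrative only, not actual BOINC client code."""

    def __init__(self) -> None:
        self.backoff_until = 0.0  # no sulk to start with

    def on_empty_reply(self, backoff_s: float) -> None:
        # "No tasks available": sulk until the timer expires.
        self.backoff_until = time.time() + backoff_s

    def on_task_reported(self) -> None:
        # Reporting a completed task clears the sulk...
        self.backoff_until = 0.0

    def on_manual_update(self) -> None:
        # ...as does the user clicking Update on the project.
        self.backoff_until = 0.0

    def may_request_work(self) -> bool:
        return time.time() >= self.backoff_until
```

A host that ran dry has no completed tasks left to trigger the reset, so only the expiring timer or a manual update gets it asking again; that matches the pattern above where systems that still had work picked up more first.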
**Keith Myers** · Joined: 29 Apr 01 · Posts: 13164 · Credit: 1,160,866,277 · RAC: 1,873

I believe I played with the cache settings once, back in the day, to only ask for a 0.25-day cache after a normal Tuesday outage instead of my normal 1-day cache. It didn't seem to make much difference in getting a successful work request or not. Still got the usual "no work is available" messages I normally do after an outage, as everyone bellies up to the scheduler feeding trough.

Seti@Home classic workunits: 20,676 · CPU time: 74,226 hours
A proud member of the OFA (Old Farts Association)
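For what it's worth, the cache setting mainly changes how much a single request asks for, not whether the scheduler has anything to give. A rough sketch of that relationship (deliberately simplified; real BOINC work fetch also weighs resource shares and an extra-days buffer):

```python
# Simplified view of how the "store at least X days of work" setting
# turns into a work request. A sketch, not the real BOINC
# work-fetch calculation.

def work_request_seconds(cache_days: float, queued_seconds: float) -> float:
    """Ask for enough work to top the queue back up to cache_days."""
    return max(cache_days * 86400 - queued_seconds, 0.0)

# An empty host with a 0.25-day cache vs a 1-day cache:
print(work_request_seconds(0.25, 0.0))  # 21600.0 -> asks for 6 hours
print(work_request_seconds(1.0, 0.0))   # 86400.0 -> asks for 24 hours
# Either way, an empty RTS queue answers "no tasks available",
# which matches the experiment described above.
```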
Ian&Steve C. Send message Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
has anyone else noticed that it doesnt seem like credit is being awarded, or it's being awarded VERY slowly. we've been back up and running all day, but RAC numbers are still in nosedive. i mean, i see validation numbers going up, but credit totals arent. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |