Perfect Storm, Inc. (Jun 16 2010)
Author | Message |
---|---|
Matt Lebofsky Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
Another day, another perfect storm. We had our usual weekly outage yesterday (for database backups/maintenance/etc.), during which we take care of other hardware/project issues. Such as yesterday: we finally got our remote-controlled power strip configured and hoped to put one of our crashy servers (ptolemy) on it. This meant bringing ptolemy down, which pretty much kills *everything*, including all the web sites/BOINC servers. We did so, only to find during the installation that the config on the power strip got reset somehow, so we had to fall back. All told, this meant an hour of delay/downtime, and we were once again at square one. After that Dave and Jeff were coordinating getting some new scheduler fixes online, which required some database updates. So we didn't start the backup until after noon, which in turn meant the projects wouldn't be ready to come back online until well after 5pm. Jeff manned that from home, but it turns out some poorly behaved yum upgrade of httpd on anakin had in the meantime silently broken the httpd config, which was impossible to diagnose/fix at the time. So we were down for the night until we could figure it out in the morning. I guess one silver lining: being down all night meant Jeff and I had an opportunity to retry installing the power strip on ptolemy with minimal interruption (as we were already in the middle of a major interruption!). This time: success. As far as we can tell after one test, if ptolemy now crashes, the power strip will detect this within 30 minutes and power cycle it. Hopefully this will vastly reduce our downtime when this happens again (usually on the weekends). As I type this, Jeff is still getting most of the BOINC back-end pieces working one by one, but at least we're doling out work for the moment as fast as we can. 
I know most of you who read these updates know this already, but it bears repeating: nobody working directly on SETI@home (all 5 of us) works full time, and we all have enough other things going on that it's impossible for us to be "on call" in case of outages/emergencies. In my case, I currently have four separate regular sources of income, with jobs/gigs in four completely different industries (covering all the bases in case one or more dry up). As for last night, when the httpd problems arose I was working elsewhere, and when I checked in again around 10:30pm everyone else was asleep. I didn't want to start up the scheduler processes without the others' input, as they were still effectively on the operating table. We've pretty much given up any hope of 24/7 uptime, but BOINC takes care of that as long as you sign up for other projects. On a more positive note: the "spike merge" is coming along, albeit slowly. It may take one more whole week to complete. And we're still doing R&D on server shuffling to improve our science database throughput (and therefore speed up our candidate searching). - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
Bill Walker Joined: 4 Sep 99 Posts: 3868 Credit: 2,697,267 RAC: 0 |
Thanks very much Matt. Believe it or not, a lot of us sympathize with you, and don't expect 24/7 uptime. Of course, we don't usually post about it. |
Cliff Harding Joined: 18 Aug 99 Posts: 1432 Credit: 110,967,840 RAC: 67 |
Matt, Thanks for the update, it is greatly appreciated. For all of us, I wish to thank you and the rest of the team for the amount of effort you take to keep the project up and running. Regardless of the bellyaching that comes from some of us, you are doing ONE HELL OF A JOB.... KEEP UP THE GOOD WORK!!! I don't buy computers, I build them!! |
Zeus Fab3r Joined: 17 Jan 01 Posts: 649 Credit: 275,335,635 RAC: 597 |
Thanks Matt, but would you please do something about the quota for anonymous platforms (i.e. reset it)? I don't have a Fermi card, I don't trash work, I received my last MB unit 5 days ago, and I'm still getting the same... Message from server: (reached daily quota of 100 tasks) Boki. Who the hell is General Failure and why is he reading my harddisk?¿ |
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
As always Matt, thanks for the update and for all the efforts from all the Staff, Claggy Edit: Seti Beta's project name is also getting reset from 'SETI@home Beta Test' to 'SETI@home' when Boinc attempts to update at Seti Beta, and then it tells you you're attached to Seti twice. |
DJStarfox Joined: 23 May 01 Posts: 1066 Credit: 1,226,053 RAC: 2 |
Thanks for the hard work Matt, and others. Found bug. Check out your SETI Project Preferences page. I'm getting: Notice: Constant MAX_CPU_DESC already defined in /disks/ptolemy/c/home/boincadm/projects/sah/html/seti_boinc_html/project_specific_prefs.inc on line 104 Notice: Undefined property: stdClass::$background in /disks/ptolemy/c/home/boincadm/projects/sah/html/seti_boinc_html/project_specific_prefs.inc on line 416 Notice: Undefined property: stdClass::$user_logo in /disks/ptolemy/c/home/boincadm/projects/sah/html/seti_boinc_html/project_specific_prefs.inc on line 421 Edit: I assume Jeff is working on this minor thing, so I won't worry about it. Just letting you know. |
Allie in Vancouver Joined: 16 Mar 07 Posts: 3949 Credit: 1,604,668 RAC: 0 |
Don't stress about the people who expect 99.9% up-time. Most of us realize (and appreciate) that, considering your meager resources, you guys conjure miracles on a near-daily basis. Pure mathematics is, in its way, the poetry of logical ideas. Albert Einstein |
woodenboatguy Joined: 10 Nov 00 Posts: 368 Credit: 3,969,364 RAC: 0 |
To paraphrase a great philosopher I greatly wish I could emulate: "SETI abides." Well done to everyone working the recovery. Regards, |
ront Joined: 25 Aug 01 Posts: 77 Credit: 386,336 RAC: 0 |
I add my thanks to the rest. Do appreciate all that you and your staff do. ront |
perryjay Joined: 20 Aug 02 Posts: 3377 Credit: 20,676,751 RAC: 0 |
I'm glad you guys don't work full time, you deserve a break as much as possible. So, I wouldn't expect you to work a minute over 80 hours a week! :-) Of course you know I'm joking though I'm sure it feels like you work that much some days. PROUD MEMBER OF Team Starfire World BOINC |
Sharpshooter Joined: 26 Mar 00 Posts: 47 Credit: 5,127,861 RAC: 2 |
You folks that run SETI do the very difficult with precious little and occasionally the impossible with nothing. I'm sure I speak for the majority of us when I say we are thankful for all that you do. Thanks too for keeping us abreast of the technical difficulties despite the fact that some of it's lost on a lot of us (me included), but still it's nice to be included. |
Hellsheep Joined: 12 Sep 08 Posts: 428 Credit: 784,780 RAC: 0 |
Hi Matt, Couldn't have put it better myself. Although I don't attach to other projects right now, I don't exactly complain when there is no work either. :P Back on topic: Thanks Matt, you're all doing a great job. I was curious, though: is it possible to get some confirmation on whether the issues we're experiencing with quotas and invalid app_info error messages are related to some sort of updates that need a bit of tweaking? I think it would be good information, as people might feel at peace after knowing why it's happening. :P - Jarryd |
KWSN THE Holy Hand Grenade! Joined: 20 Dec 05 Posts: 3187 Credit: 57,163,290 RAC: 0 |
Umm, somewhere in the process, SETI beta lost its individuality! Any time the BOINC client contacts S@H Beta, it immediately switches to calling itself SETI (not Beta), and then you have the problem of two projects with one name. I lost at least a beta AP WU and probably a couple of MB Beta WUs to this... (and this from the only computer that has reported to Beta since yesterday...) I have tried detaching/reattaching to Beta, but it keeps doing this, so I'm staying off Beta for now. . Hello, from Albany, CA!... |
zoom3+1=4 Joined: 30 Nov 03 Posts: 66338 Credit: 55,293,173 RAC: 49 |
So far it looks like Seti@Home is just making ghosts for the Anonymous Platform. I've supposedly got 300 WUs; the problem is I've never seen them downloaded, and no one official has even acknowledged what is going on. The Quota system is a failure, bring back the old system, because this one sucks, as it seems to only work with the stock app and I refuse to use that "thing". Savoir-Faire is everywhere! The T1 Trust, T1 Class 4-4-4-4 #5550, America's First HST |
KWSN THE Holy Hand Grenade! Joined: 20 Dec 05 Posts: 3187 Credit: 57,163,290 RAC: 0 |
... has anyone noticed that the validators are off-line? - NTM that the assimilators seem to be stuck? The assimilator queue is at 887,533 - the same as it was roughly 24 hours ago! . Hello, from Albany, CA!... |
Donald L. Johnson Joined: 5 Aug 02 Posts: 8240 Credit: 14,654,533 RAC: 20 |
... has anyone noticed that the validators are off-line? - NTM that the assimilators seem to be stuck? The assimilator queue is at 887,533 - the same as it was roughly 24 hours ago! Yes, there are several threads running over in Number Crunching that address this issue. Given the number of regular posters who have commented, and who also have Matt's and Eric's private email addresses, I would be really surprised if they don't know about it. With no notes here or on the Home Page, we can only speculate about what the problem is, what has already been tried, or what is in the works for Monday to solve it. And as we all know, speculation based on no real information leads only to panic, anger, and frustration. So I'm not gonna speculate. They will be in on Monday, and I have enough work to get through Tuesday. |
Cruncher-American Joined: 25 Mar 02 Posts: 1513 Credit: 370,893,186 RAC: 340 |
And the reason we have to speculate, rather than know, is because of the utter lack of concern for the users and professionalism by the staff. |
Jim Welch Joined: 17 May 99 Posts: 21 Credit: 17,033,977 RAC: 47 |
The "staff" do not exist to serve and please you, jravin. We all made a conscious choice to participate in this experiment, and we are certainly free to come and go as we like. |
Todd Hebert Joined: 16 Jun 00 Posts: 648 Credit: 228,292,957 RAC: 0 |
People should consider the scope that this project encompasses and the very limited resources at hand before sending harsh comments. Yes, I am defending the staff of this project! The sheer number of hosts and active users is incredible; I couldn't imagine supporting almost 2 million users with a staff of 5. Thankfully BOINC can adapt and will recover in time, but users with narrow perspectives should consider the bigger picture and what they really offer to the scientific community by throwing barbs of contempt. This is my opinion only, but I do believe that others would support it as well. Todd |
Blackie Joined: 18 May 09 Posts: 1 Credit: 926,903 RAC: 0 |
Totally agree. We volunteered our computers to SETI@Home. Considering what the "staff" have to deal with, and the lack of resources they contend with everyday - I think they deserve our praise in keeping SETI running as well as they do rather than labelling them 'unprofessional' and 'not caring about us'. |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.