Matt Lebofsky (Volunteer moderator, Project administrator, Project developer, Project scientist)
Joined: 1 Mar 99 Posts: 1375 Credit: 74,079 RAC: 0

Another day, another perfect storm.
We had our usual weekly outage yesterday (for database backups/maintenance/etc.), during which we take care of other hardware/project issues. Yesterday, for example, we finally got our remote-controlled power strip configured and hoped to put one of our crashy servers (ptolemy) on it.
This meant bringing ptolemy down, which pretty much kills *everything*, including all the web sites/BOINC servers. We did so, only to find that during the course of installation the config on the power strip got reset somehow, so we had to fall back. All told, this meant an hour of delay/downtime, and we were once again at square one.
After that, Dave and Jeff were coordinating getting some new scheduler fixes online, which required some database updates. So we didn't start the backup until after noon, which in turn meant the projects wouldn't be ready to come back online until well after 5pm. Jeff manned that from home, but it turns out a poorly behaved yum upgrade of httpd on anakin had, in the meantime, secretly broken the httpd config, which was impossible to diagnose/fix at the time. So we were down for the night until we could figure it out in the morning.
I guess one silver lining of being down all night was that Jeff and I had an opportunity to retry installing the power strip on ptolemy with minimal interruption (as we were already in the middle of a major interruption!). This time: success. As far as we can tell after one test, if ptolemy now crashes, the power strip will detect this within 30 minutes and power cycle it. Hopefully this will vastly reduce our downtime when this happens again (usually on the weekends).
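The post doesn't say how the power strip decides that ptolemy has crashed. As a rough illustration only, here is a minimal Python sketch of the general watchdog idea. It assumes that the strip polls the host (e.g. with a ping) once a minute and power-cycles it after 30 minutes with no reply. The `Watchdog` class, the `simulate` helper and the one-minute poll interval are all hypothetical and do not describe the strip's actual firmware.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical watchdog model: NOT the power strip's real logic.
TIMEOUT_S = 30 * 60  # the 30-minute detection window mentioned in the post


@dataclass
class Watchdog:
    timeout_s: float = TIMEOUT_S
    last_seen: float = 0.0  # time (seconds) of the last successful poll

    def heartbeat(self, now: float) -> None:
        """Record that the host answered a poll at time `now`."""
        self.last_seen = now

    def check(self, now: float) -> bool:
        """Return True (power-cycle) if the host was silent for the whole window.

        After a cycle the timer restarts, so a host that is still booting
        gets a full window before it can be cycled again.
        """
        if now - self.last_seen >= self.timeout_s:
            self.last_seen = now
            return True
        return False


def simulate(crash_at: int, poll_s: int = 60, horizon_s: int = 4 * 3600) -> Optional[int]:
    """Return the time of the first power cycle, or None if none happens.

    Assumes the host answers every poll before `crash_at` and none after.
    """
    wd = Watchdog()
    for t in range(0, horizon_s, poll_s):
        if t < crash_at:
            wd.heartbeat(t)  # host still alive
        elif wd.check(t):
            return t  # watchdog fired
    return None


if __name__ == "__main__":
    # Crash one hour in: last reply at 3540 s, power cycle 30 minutes later.
    print(simulate(crash_at=3600))  # 5340
```

Restarting the timer after each cycle is just one way to avoid repeatedly power-cycling a machine that is still booting. A real device might instead use a fixed post-reboot grace period.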
As I type this Jeff is still getting most of the BOINC back-end pieces working one by one, but at least we're doling out work for the moment as fast as we can.
I know most of you who read these updates know this already, but it bears repeating: nobody working directly on SETI@home (all 5 of us) works full time, and we all have enough other things going on that make it impossible for us to be "on call" in case of outages/emergencies. In my case, I currently have four separate regular sources of income, with jobs/gigs in four completely different industries (covering all the bases in case one or more dry up). As for last night, when the httpd problems arose I was working elsewhere, and when I checked in again around 10:30pm everyone else was asleep, and I didn't want to start up the scheduler processes without others' input, as they were still effectively on the operating table. We've pretty much given up any hope of 24/7 uptime, but BOINC takes care of that as long as you sign up for other projects.
On a more positive note: the "spike merge" is coming along, albeit slowly. It may take one more whole week to complete. And we're still doing R&D on server shuffling to improve our science database throughput (and therefore speed up our candidate searching).
- Matt
____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

Thanks very much Matt. Believe it or not, a lot of us sympathize with you, and don't expect 24/7 uptime. Of course, we don't usually post about it.

Matt,
Thanks for the update; it is greatly appreciated. For all of us, I wish to thank you and the rest of the team for the effort you put in to keep the project up and running. Regardless of the bellyaching that comes from some of us, you are doing ONE HELL OF A JOB... KEEP UP THE GOOD WORK!!!
____________
I don't buy computers, I build them!!

Thanks Matt, but would you please do something about the quota for anonymous platforms (i.e. reset it)? I don't have a Fermi card, I don't trash work, and I received my last MB unit 5 days ago, but I'm still getting the same...
Message from server: (reached daily quota of 100 tasks)
Boki.
____________
Who the hell is General Failure and why is he reading my harddisk?¿

Claggy (Volunteer tester)
Joined: 5 Jul 99 Posts: 3368 Credit: 25,956,297 RAC: 1,484

As always, Matt, thanks for the update and for all the effort from all the staff,
Claggy
Edit: Seti Beta's project name is also getting reset from 'SETI@home Beta Test' to 'SETI@home' when BOINC attempts to update at Seti Beta, and BOINC then tells you that you're attached to Seti twice.

Thanks for the hard work, Matt and others.
Found a bug. Check out your SETI Project Preferences page. I'm getting:
Notice: Constant MAX_CPU_DESC already defined in /disks/ptolemy/c/home/boincadm/projects/sah/html/seti_boinc_html/project_specific_prefs.inc on line 104
Notice: Undefined property: stdClass::$background in /disks/ptolemy/c/home/boincadm/projects/sah/html/seti_boinc_html/project_specific_prefs.inc on line 416
Notice: Undefined property: stdClass::$user_logo in /disks/ptolemy/c/home/boincadm/projects/sah/html/seti_boinc_html/project_specific_prefs.inc on line 421
Edit: I assume Jeff is working on this minor thing, so I won't worry about it. Just letting you know.

Don't stress about the people who expect 99.9% up-time. Most of us realize (and appreciate) that, considering your meager resources, you guys conjure miracles on a near-daily basis.

To paraphrase a great philosopher I greatly wish I could emulate:
"SETI abides."
Well done to everyone working the recovery.
Regards,

I add my thanks to the rest. Do appreciate all that you and your staff do.
ront

I'm glad you guys don't work full time; you deserve as much of a break as possible. So, I wouldn't expect you to work a minute over 80 hours a week! :-) Of course, you know I'm joking, though I'm sure it feels like you work that much some days.
____________
PROUD MEMBER OF Team Starfire World BOINC

You folks that run SETI do the very difficult with precious little and occasionally the impossible with nothing. I'm sure I speak for the majority of us when I say we are thankful for all that you do. Thanks too for keeping us abreast of the technical difficulties; some of it is lost on a lot of us (me included), but it's still nice to be included.

Hi Matt,
Thanks for the update; it's appreciated. Seti has never ever said it was a 24/7 project; that has simply been assumed by people who get cross when it isn't. I think you are quite right to point that out. The whole purpose of DC and Boinc is that you are SUPPOSED to sign up for multiple projects, because of computers being what they are.
____________
Damsel Rescuer, Kitty Patron, Raccoon Friend, Uli Fan,
Julie Supporter, ES99 Admirer, PETA Member, 1st Childhood

Hi Matt,
Thanks for the update; it's appreciated. Seti has never ever said it was a 24/7 project; that has simply been assumed by people who get cross when it isn't. I think you are quite right to point that out. The whole purpose of DC and Boinc is that you are SUPPOSED to sign up for multiple projects, because of computers being what they are.
Couldn't have put it better myself.
I don't attach to other projects myself right now, but I don't exactly complain when there is no work either. :P
Back on topic;
Thanks Matt, you're all doing a great job. I was curious, though: is it possible to get some confirmation on whether the issues we're experiencing with quotas and invalid app_info error messages are related to some sort of updates that need a bit of tweaking? I think it would be good information, as people might feel more at peace once they know why it's happening. :P
____________
- Jarryd

Umm, somewhere in the process, SETI beta lost its individuality! Any time the BOINC client contacts S@H Beta, it immediately switches to calling itself SETI (not Beta), and then you have the problem of two projects with one name. I lost at least a beta AP WU and probably a couple of MB Beta WUs to this... (and this from the only computer that has reported to Beta since yesterday...)
I have tried detaching and reattaching to Beta, but it keeps doing this, so I'm staying off Beta for now.

So far it looks like Seti@Home is just making ghosts for the anonymous platform. I've supposedly got 300 WUs; the problem is I've never seen them downloaded, and no one official has even acknowledged what is going on. The quota system is a failure; bring back the old system, because this one sucks: it seems to only work with the stock app, and I refuse to use that "thing".
____________
BSG Anthem
My Facebook page

...has anyone noticed that the validators are offline, not to mention that the assimilators seem to be stuck? The assimilator queue is at 887,533, the same as it was roughly 24 hours ago!

...has anyone noticed that the validators are offline, not to mention that the assimilators seem to be stuck? The assimilator queue is at 887,533, the same as it was roughly 24 hours ago!
Yes, there are several threads running over in Number Crunching that address this issue.
Given the number of regular posters who have commented, and who also have Matt and Eric's private email addresses, I would be really surprised if they don't know about it.
With no notes here or on the Home Page, we can only speculate about what the problem is, what has already been tried, or what is planned for Monday to solve it.
And as we all know, speculation based on no real information leads only to panic, anger, and frustration.
So I'm not gonna speculate. They will be in on Monday, and I have enough work to get through Tuesday.

And as we all know, speculation based on no real information leads only to panic, anger, and frustration.
And the reason we have to speculate, rather than know, is the staff's utter lack of concern for the users and lack of professionalism.

The "staff" do not exist to serve and please you, jravin. We all made a conscious choice to participate in this experiment, and we are certainly free to come and go as we like.

People should consider the scope of this project and the very limited resources at hand before sending harsh comments. Yes, I am defending the staff of this project! The sheer number of hosts and active users is incredible; I couldn't imagine supporting almost 2 million users with a staff of 5.
Thankfully, BOINC can adapt and will recover in time, but users with narrow perspectives should consider the bigger picture, and what they really offer the scientific community by throwing barbs of contempt.
This is my opinion only - but I do believe that others would support this as well.
Todd