Perfect Storm, Inc. (Jun 16 2010)

Message boards : Technical News : Perfect Storm, Inc. (Jun 16 2010)

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 1004783 - Posted: 16 Jun 2010, 20:02:05 UTC

Another day, another perfect storm.

We had our usual weekly outage yesterday (for database backups/maintenance/etc.) during which we take care of other hardware/project issues. Such as yesterday - we finally got our remote-controlled power strip configured and hoped to put one of our crashy servers (ptolemy) on it.

This meant bringing ptolemy down, which pretty much kills *everything* including all the web sites/BOINC servers. We did so, only to find during the course of installation that the config on the power strip got reset somehow, so we had to fall back. All told, this meant an hour of delay/downtime, and we were once again at square one.

After that Dave and Jeff were coordinating getting some new scheduler fixes online, which required some database updates. So we didn't start the backup until after noon, which in turn meant the projects wouldn't be ready to come back on line until well after 5pm. Jeff manned that from home, but it turns out some poorly behaved yum upgrade of httpd on anakin in the meantime secretly broke the httpd config, which was impossible to diagnose/fix at the time. So we were down for the night until we could figure it out in the morning.

I guess one silver lining of being down all night was that Jeff and I had an opportunity to retry installing the power strip on ptolemy with minimal interruption (as we were already in the middle of a major interruption!). This time: success - as far as we can tell after one test, if ptolemy now crashes the power strip will detect this within 30 minutes and power cycle it. Hopefully this will vastly reduce our downtime when this happens again (usually on the weekends).
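
For readers curious what that kind of watchdog amounts to, here is a minimal sketch in Python. It is not how the actual power strip is configured: the names `Watchdog` and `run_checks` and the health-check model are hypothetical, and only the 30-minute window comes from the post above.

```python
from dataclasses import dataclass

# Assumption: the detection window is the "within 30 minutes" from the post.
DETECTION_WINDOW_S = 30 * 60


@dataclass
class Watchdog:
    """Tracks the last time the monitored host answered a health check."""

    last_seen_s: float
    window_s: float = DETECTION_WINDOW_S

    def record_response(self, now_s: float) -> None:
        # The host answered, so restart the countdown.
        self.last_seen_s = now_s

    def should_power_cycle(self, now_s: float) -> bool:
        # True once the host has been silent for at least the whole window.
        return now_s - self.last_seen_s >= self.window_s


def run_checks(watchdog, checks):
    """Replay (time_s, responded) pairs; return the times a power cycle fires."""
    cycles = []
    for now_s, responded in checks:
        if responded:
            watchdog.record_response(now_s)
        elif watchdog.should_power_cycle(now_s):
            cycles.append(now_s)
            # Hypothetical simplification: treat the cycle as a fresh start.
            watchdog.record_response(now_s)
    return cycles
```

In this model a host that last answered at 600 seconds and then goes silent would be power cycled at the first check at or after 2400 seconds. A real device would also need to cope with a host that never comes back, which this sketch ignores.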

As I type this Jeff is still getting most of the BOINC back-end pieces working one by one, but at least we're doling out work for the moment as fast as we can.

I know most of you who read these updates know this already, but it bears repeating: nobody working directly on SETI@home (all 5 of us) works full time, and we all have enough other things going on that make it impossible for us to be "on call" in case of outage/emergencies. In my case, I currently have four regular separate sources of income with jobs/gigs in four completely different industries (covering all the bases in case one or more dry up). As for last night, when the httpd problems arose, I was working elsewhere, and when I checked in again around 10:30pm everyone else was asleep and I didn't want to start up the scheduler processes without others' input as they were still effectively on the operating table. We've pretty much given up any hope for 24/7 uptime, but BOINC takes care of that as long as you sign up for other projects.

On a more positive note: the "spike merge" is coming along, albeit slowly. May take one more whole week to complete. And we're still doing R&D regarding server shuffling to improve our science database throughput (and therefore speed up our candidate searching).

- Matt

-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 1004783
Bill Walker
Joined: 4 Sep 99
Posts: 3868
Credit: 2,697,267
RAC: 0
Canada
Message 1004784 - Posted: 16 Jun 2010, 20:06:50 UTC - in response to Message 1004783.  
Last modified: 16 Jun 2010, 20:07:08 UTC

Thanks very much Matt. Believe it or not, a lot of us sympathize with you, and don't expect 24/7 uptime. Of course, we don't usually post about it.

ID: 1004784
Cliff Harding
Volunteer tester
Joined: 18 Aug 99
Posts: 1432
Credit: 110,967,840
RAC: 67
United States
Message 1004792 - Posted: 16 Jun 2010, 20:18:32 UTC

Matt,

Thanks for the update, it is greatly appreciated. For all of us, I wish to thank you and the rest of the team for the amount of effort you take to keep the project up and running. Regardless of the bellyaching that comes from some of us, you are doing ONE HELL OF A JOB.... KEEP UP THE GOOD WORK!!!


I don't buy computers, I build them!!
ID: 1004792
Zeus Fab3r
Joined: 17 Jan 01
Posts: 649
Credit: 275,335,635
RAC: 597
Serbia
Message 1004795 - Posted: 16 Jun 2010, 20:27:17 UTC

Thanks Matt, but would you please do something about the quota thing for anonymous platforms (i.e. reset)? I don't have a Fermi card, I don't trash work, I received my last MB unit 5 days ago and I'm still getting the same...

Message from server: (reached daily quota of 100 tasks)


Boki.

Who the hell is General Failure and why is he reading my harddisk?¿
ID: 1004795
Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1004800 - Posted: 16 Jun 2010, 20:30:24 UTC - in response to Message 1004783.  
Last modified: 16 Jun 2010, 20:52:49 UTC

As always Matt, thanks for the update and for all the efforts from all the Staff,

Claggy

Edit: Seti Beta's project name is also getting reset from 'SETI@home Beta Test' to 'SETI@home' when Boinc attempts to update at Seti Beta,
then it tells you you're attached to Seti twice.
ID: 1004800
DJStarfox
Joined: 23 May 01
Posts: 1066
Credit: 1,226,053
RAC: 2
United States
Message 1004814 - Posted: 16 Jun 2010, 20:46:56 UTC - in response to Message 1004783.  
Last modified: 16 Jun 2010, 21:02:05 UTC

Thanks for the hard work Matt, and others.

Found bug. Check out your SETI Project Preferences page. I'm getting:

Notice: Constant MAX_CPU_DESC already defined in /disks/ptolemy/c/home/boincadm/projects/sah/html/seti_boinc_html/project_specific_prefs.inc on line 104
Notice: Undefined property: stdClass::$background in /disks/ptolemy/c/home/boincadm/projects/sah/html/seti_boinc_html/project_specific_prefs.inc on line 416
Notice: Undefined property: stdClass::$user_logo in /disks/ptolemy/c/home/boincadm/projects/sah/html/seti_boinc_html/project_specific_prefs.inc on line 421

Edit: I assume Jeff is working on this minor thing, so I won't worry about it. Just letting you know.
ID: 1004814
Allie in Vancouver
Volunteer tester
Joined: 16 Mar 07
Posts: 3949
Credit: 1,604,668
RAC: 0
Canada
Message 1004850 - Posted: 16 Jun 2010, 21:31:57 UTC

Don't stress about the people who expect 99.9% up-time. Most of us realize (and appreciate) that, considering your meager resources, you guys conjure miracles on a near-daily basis.
Pure mathematics is, in its way, the poetry of logical ideas.

Albert Einstein
ID: 1004850
woodenboatguy
Joined: 10 Nov 00
Posts: 368
Credit: 3,969,364
RAC: 0
Canada
Message 1004875 - Posted: 16 Jun 2010, 22:39:02 UTC

To paraphrase a great philosopher I greatly wish I could emulate:

"SETI abides."

Well done to everyone working the recovery.

Regards,

ID: 1004875
ront
Joined: 25 Aug 01
Posts: 77
Credit: 386,336
RAC: 0
United States
Message 1004877 - Posted: 16 Jun 2010, 22:40:19 UTC

I add my thanks to the rest. Do appreciate all that you and your staff do.

ront
ID: 1004877
perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 1004879 - Posted: 16 Jun 2010, 22:41:48 UTC

I'm glad you guys don't work full time, you deserve a break as much as possible. So, I wouldn't expect you to work a minute over 80 hours a week! :-) Of course you know I'm joking though I'm sure it feels like you work that much some days.


PROUD MEMBER OF Team Starfire World BOINC
ID: 1004879
Sharpshooter
Joined: 26 Mar 00
Posts: 47
Credit: 5,127,861
RAC: 2
United States
Message 1004894 - Posted: 16 Jun 2010, 23:01:24 UTC
Last modified: 16 Jun 2010, 23:10:48 UTC

You folks that run SETI do the very difficult with precious little and occasionally the impossible with nothing. I'm sure I speak for the majority of us when I say we are thankful for all that you do. Thanks too for keeping us abreast of the technical difficulties despite the fact that some of it's lost on a lot of us (me included), but still it's nice to be included.
ID: 1004894
Hellsheep
Volunteer tester
Joined: 12 Sep 08
Posts: 428
Credit: 784,780
RAC: 0
Australia
Message 1004906 - Posted: 16 Jun 2010, 23:18:08 UTC - in response to Message 1004900.  

Hi Matt,

Thanks for the update it's appreciated. Seti has never ever said it was a 24/7 project, that has simply been assumed by people who get cross when it isn't. I think you are quite right to point that out. The whole purpose of DC and Boinc is that you are SUPPOSED to sign up for multiple projects, because of computers being what they are.


Couldn't have put it better myself.

Although I don't attach to other projects myself right now, I don't exactly complain when there is no work either. :P

Back on topic;

Thanks Matt, you're all doing a great job. I was curious though, is it possible to get a few confirmations on whether the issues we're experiencing with quotas and invalid app_info error messages are related to some sort of updates that need a bit of tweaking? I think it would be good information as people might feel at peace with themselves after knowing why it's happening. :P
- Jarryd
ID: 1004906
KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 3187
Credit: 57,163,290
RAC: 0
United States
Message 1005247 - Posted: 17 Jun 2010, 13:28:30 UTC

Umm, somewhere in the process, SETI Beta lost its individuality! Anytime the BOINC client contacts S@H Beta, it immediately switches to calling itself SETI (not Beta) and then you have the problem of two projects with one name. I lost at least a Beta AP WU and probably a couple of MB Beta WUs to this... (and this from the only computer that has reported to Beta since yesterday...)

Have tried detach/attaching to Beta, but it keeps doing this, so I'm staying off Beta for now.
.

Hello, from Albany, CA!...
ID: 1005247
zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65689
Credit: 55,293,173
RAC: 49
United States
Message 1005844 - Posted: 18 Jun 2010, 19:34:18 UTC

So far it looks like Seti@Home is just making ghosts for the anonymous platform. I've supposedly got 300 WUs, but I've never seen them downloaded, and no one official has even acknowledged what is going on. The quota system is a failure; bring back the old system, because this one sucks, as it seems to only work with the stock app and I refuse to use that "thing".
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 1005844
KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 3187
Credit: 57,163,290
RAC: 0
United States
Message 1006583 - Posted: 20 Jun 2010, 14:38:06 UTC

... has anyone noticed that the validators are off-line? - NTM that the assimilators seem to be stuck? The assimilator queue is at 887,533 - the same as it was roughly 24 hours ago!
.

Hello, from Albany, CA!...
ID: 1006583
Donald L. Johnson
Joined: 5 Aug 02
Posts: 8240
Credit: 14,654,533
RAC: 20
United States
Message 1006618 - Posted: 20 Jun 2010, 17:05:33 UTC - in response to Message 1006583.  

... has anyone noticed that the validators are off-line? - NTM that the assimilators seem to be stuck? The assimilator queue is at 887,533 - the same as it was roughly 24 hours ago!


Yes, there are several threads running over in Number Crunching that address this issue.

Given the number of regular posters who have commented, and who also have Matt and Eric's private email addresses, I would be real surprised if they don't know about it.

With no notes here or on the Home Page, we can only speculate what the problem is, what has already been tried, or is in the works for Monday to solve the problem.

And as we all know, speculation based on no real information leads only to panic, anger, and frustration.

So I'm not gonna speculate. They will be in on Monday, and I have enough work to get through Tuesday.
ID: 1006618
Cruncher-American
Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 1006623 - Posted: 20 Jun 2010, 17:18:42 UTC - in response to Message 1006618.  


And as we all know, speculation based on no real information leads only to panic, anger, and frustration.



And the reason we have to speculate, rather than know, is because of the utter lack of concern for the users and professionalism by the staff.
ID: 1006623
Jim Welch
Joined: 17 May 99
Posts: 21
Credit: 17,033,977
RAC: 47
United States
Message 1006639 - Posted: 20 Jun 2010, 18:31:20 UTC - in response to Message 1006623.  

The "staff" do not exist to serve and please you, jravin. We all made a conscious choice to participate in this experiment and we are certainly free to come and go as we like.
ID: 1006639
Todd Hebert
Volunteer tester
Joined: 16 Jun 00
Posts: 648
Credit: 228,292,957
RAC: 0
United States
Message 1006657 - Posted: 20 Jun 2010, 20:01:27 UTC

People should consider the scope that this project encompasses and the very limited resources at hand before sending harsh comments. Yes, I am defending the staff of this project! The sheer number of hosts and active users is incredible - I couldn't imagine supporting almost 2 million users with a staff of 5.

Thankfully BOINC can adapt and will recover in time but users with narrow perspectives should consider the bigger picture and what they really do offer to the scientific community by throwing barbs of contempt.

This is my opinion only - but I do believe that others would support this as well.

Todd
ID: 1006657
Blackie
Joined: 18 May 09
Posts: 1
Credit: 926,903
RAC: 0
Australia
Message 1006677 - Posted: 20 Jun 2010, 21:30:38 UTC

Totally agree. We volunteered our computers to SETI@Home. Considering what the "staff" have to deal with, and the lack of resources they contend with everyday - I think they deserve our praise in keeping SETI running as well as they do rather than labelling them 'unprofessional' and 'not caring about us'.

ID: 1006677
 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.