Composite Head (Nov 05 2008)


Message boards : Technical News : Composite Head (Nov 05 2008)

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1389
Credit: 74,079
RAC: 0
United States
Message 827334 - Posted: 5 Nov 2008, 21:32:17 UTC

At 7:30pm last night the scheduler apache server got hung up - probably from all the election night excitement. These apache servers need a kick fairly often. Unfortunately they die in various ways for various reasons, so automating the checking of certain pulses doesn't always help - in fact such things usually make systems more complicated and unpredictable. In the case last night it failed during log rotation, which issues an "httpd restart" - this time the head-in process didn't die, so port 80 got locked up. I had to log in and kill the zombie httpds by hand before restarting apache. Not a big deal, though it got missed for a couple of hours as it was timed perfectly with the entire country busy watching the news.

- Matt

____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
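
For readers wondering what "checking a pulse" could look like in practice, here is a minimal sketch, not anything the project is known to run. It asks a web server for a page and reports whether any HTTP answer arrives in time. The function name `http_alive`, the URL, and the timeout are all illustrative assumptions. As Matt notes above, a single probe like this catches only some failure modes.

```python
import http.client
import urllib.error
import urllib.request


def http_alive(url: str, timeout: float = 5.0) -> bool:
    """Return True if the server at `url` sends any HTTP response in time.

    Hypothetical helper for illustration only; it is not part of BOINC
    or of the SETI@home server setup.
    """
    # Talk to the server directly, ignoring any proxy configured in the
    # environment, so the result reflects the server itself.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with an error status, so it is not hung.
        return True
    except (OSError, http.client.HTTPException):
        # Refused connection, timeout, or a broken/garbled response.
        return False
```

A cron job could call something like `http_alive("http://localhost/")` every few minutes and page a human when it returns False. Deciding what to do automatically after that is the hard part, which is the complication the post describes.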

Fred W
Volunteer tester
Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 827347 - Posted: 5 Nov 2008, 22:04:40 UTC
Last modified: 5 Nov 2008, 22:04:53 UTC

Thanks for the update, Matt. I can't see too many complaints on the Boards from last night, so I guess most of the world was watching the News (those that weren't sleeping, that is).

F.
____________

Blurf
Volunteer tester
Joined: 2 Sep 06
Posts: 7520
Credit: 6,702,611
RAC: 6,993
United States
Message 827361 - Posted: 5 Nov 2008, 22:59:09 UTC

Matt--thanks as always....
____________


Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2245
Credit: 8,596,056
RAC: 4,332
United States
Message 827362 - Posted: 5 Nov 2008, 23:02:42 UTC

I had about 30 results that kept trying to report and failed, but the time span from noticing the problem to the problem being resolved was less than an hour. Didn't really bother me all that much.
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

BarryAZ
Joined: 1 Apr 01
Posts: 2580
Credit: 12,055,376
RAC: 4,577
United States
Message 827367 - Posted: 5 Nov 2008, 23:32:16 UTC - in response to Message 827362.

The problem is still out there -- the servers have gotten rather fragile over the years -- sort of troublesome with SETI in theory being the flagship BOINC project.



____________

Dr. C.E.T.I.
Joined: 29 Feb 00
Posts: 15993
Credit: 690,597
RAC: 0
United States
Message 827368 - Posted: 5 Nov 2008, 23:36:10 UTC


. . . nice work Berkeley - and, Thanks for the Update Matt

____________
BOINC Wiki . . .

Science Status Page . . .

1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 827374 - Posted: 6 Nov 2008, 0:18:17 UTC - in response to Message 827367.

BarryAZ wrote: "The problem is still out there -- the servers have gotten rather fragile over the years -- sort of troublesome with SETI in theory being the flagship BOINC project."

Really depends on how you define BOINC.

An important goal was to produce something that'd be able to do big computing projects on no budget. Certainly, SETI@Home is the "flagship" project for no-budget big-computing.

____________

BarryAZ
Joined: 1 Apr 01
Posts: 2580
Credit: 12,055,376
RAC: 4,577
United States
Message 827383 - Posted: 6 Nov 2008, 0:42:09 UTC - in response to Message 827374.

My sense is that SETI is the largest of the BOINC projects not only in active users but also in actual budget resources (people and equipment). I am not saying it has a LOT of resources, but that a lot of other smaller projects are just that, a LOT smaller in terms of budget and resources -- and in active users. Perhaps the issue is that BOINC projects in general don't scale well -- I don't know. But the point is, SETI has been for quite a while more than a bit 'twitchy' -- in a lot of ways -- and with the large active user count the 'twitchiness' gets noticed quickly and tends to blossom into various troublesome outage issues.

In the BOINC world, that isn't so bad, one simply must have multiple projects running. I certainly do. It is just that perhaps SETI as the flagship of the BOINC projects seems to have as an operational goal enough 'twitchiness' to ensure that folks do the right BOINC thing and attach to multiple projects. In that regard it has been particularly admirable over the past months.




____________

PhonAcq
Joined: 14 Apr 01
Posts: 1622
Credit: 22,105,569
RAC: 3,934
United States
Message 827384 - Posted: 6 Nov 2008, 0:44:39 UTC - in response to Message 827374.

Ned, thank you for always pointing out the silver lining in our cloud worry, so to speak.

Of course, it's possible that we are becoming the "flagship" project for no-budget big-computing, with the most error-prone network communications and no scientific results. But that would probably tarnish your silver.

So I guess we should just hum the "don't worry; be happy" jingle along with Bobby McFerrin. We really don't need to get any better, after all.

JDWhale
Volunteer tester
Joined: 6 Apr 99
Posts: 921
Credit: 19,911,040
RAC: 1,967
United States
Message 827385 - Posted: 6 Nov 2008, 0:47:27 UTC

I figured, since the network news bungled the past couple presidential elections... I'd watch "Get Smart" DVD and time shifted "Boston Legal" & "30 Rock" rather than the election coverage last night..."Denny Crane"

Oh... were there SETI server issues last night? Sorry, I didn't notice.
____________

CAHEN
Joined: 23 May 01
Posts: 3
Credit: 1,272,437
RAC: 0
Message 827528 - Posted: 6 Nov 2008, 9:22:25 UTC - in response to Message 827334.

Matt Lebofsky wrote: "These apache servers need a kick fairly often."



Well, a simple cron job, such as a full reboot, and voila, the kick is done.

This simply assumes you have good start-up scripts.
____________

John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24375
Credit: 519,750
RAC: 37
United States
Message 827542 - Posted: 6 Nov 2008, 12:01:26 UTC - in response to Message 827528.


If I recall some earlier posts about starting things up, there are several machines that have to be started in the right order. I don't know if one can be restarted after it goes down without bringing some of the other computers down for a while.
____________


BOINC WIKI

Rudy
Volunteer tester
Joined: 23 Jun 99
Posts: 189
Credit: 565,196
RAC: 51
Canada
Message 827557 - Posted: 6 Nov 2008, 13:37:50 UTC - in response to Message 827383.

BarryAZ wrote: "My sense is that SETI is the largest of the BOINC projects not only in active users but also in actual budget resources (people and equipment). I am not saying it has a LOT of resources, but that a lot of other smaller projects are just that, a LOT smaller in terms of budget and resources -- and in active users. Perhaps the issue is that BOINC projects in general don't scale well -- I don't know." /deleted stuff/


I completely disagree with your comments about scalability/reliability; it seems to me to be only a question of resources for Seti@home.

The second largest project Einstein@home easily has more staff, host computers, and better funding than Seti. Distributed across 5 sites worldwide, funded by government grants in the US and Germany. Currently on their home page they have job postings out for 4 additional staff. This translates into super stability. They currently have been up continuously for 92 days.

Considering the resources Seti has compared to the number of users, I think the staff are doing an excellent job. Any suggestion that Seti's problems are intentional or due to bad Boinc software is unfounded.

QuietDad
Joined: 2 Oct 99
Posts: 79
Credit: 4,777,081
RAC: 4,696
United States
Message 827567 - Posted: 6 Nov 2008, 14:25:06 UTC

After years of being involved with Seti in a casual "hobby" sort of way, I'd like to make the following observation.

Seems hundreds of thousands of us realize we donate our spare computer cycles for the folks at the projects we're involved with to do with as they wish. If there are issues with the project, so be it, they will resolve. There is the other group, the "10 percenters" that think the Seti group and especially Matt are employed to squeeze the most credit out of personal machines and should be available 24/7 to make it happen.

I understand this is THEIR project, not mine and appreciate the fact they do what they do so I can dream of being the guy that discovers E.T. so he can find us and destroy us (just kidding).

Matt and the Seti Staff, you do great work and should be applauded. The rest of you, relax.
____________

DJStarfox
Joined: 23 May 01
Posts: 1040
Credit: 544,758
RAC: 267
United States
Message 827590 - Posted: 6 Nov 2008, 16:39:41 UTC - in response to Message 827557.

BarryAZ wrote: "My sense is that SETI is the largest of the BOINC projects not only in active users but also in actual budget resources (people and equipment). I am not saying it has a LOT of resources, but that a lot of other smaller projects are just that, a LOT smaller in terms of budget and resources -- and in active users. Perhaps the issue is that BOINC projects in general don't scale well -- I don't know." /deleted stuff/

Rudy wrote: "Considering the resources Seti has compared to the number of users, I think the staff are doing an excellent job. Any suggestion that the Seti's problems are intentional or due to bad Boinc software are unfounded."


I agree that SETI's problems are not intentional. I thought the vision of BOINC was to provide an open-source, grid computing project with minimal staff and maintenance. In that sense, I think BOINC falls a little short.

When developing a new feature or program in the business world, we should always consider the administrative or maintenance cost of the project. For example, adding a feature that requires a nightly database update is generally a bad idea; a method to update the data more efficiently is preferred.

In the same vein, a grid computing project should always keep in mind "scale" meaning, "How big can this thing get?" The user base has clearly pushed SETI's servers to the max. Software bugs and limited staffing resources only make it worse. What is needed is to observe and analyze WHY projects such as Einstein, ClimatePrediction, and others have such good uptime. How do they divide the various BOINC processes on servers? Is it constant staff intervention? Is it co-location? Do any of them use virtualization?

Maybe the SETI team could visit UWM and see how they run things. This could provide valuable insight. Maybe after the holidays when airfare will be really cheap?

msattler
Project donor
Volunteer tester
Joined: 9 Jul 00
Posts: 38922
Credit: 578,590,566
RAC: 516,344
United States
Message 827596 - Posted: 6 Nov 2008, 16:56:52 UTC - in response to Message 827567.


Well said, and welcome to the forums with your first post....
Try to stop by more often........you don't have to be so quiet, Dad.
____________
*********************************************
Embrace your inner kitty...ya know ya wanna!

I have met a few friends in my life.
Most were cats.

BarryAZ
Joined: 1 Apr 01
Posts: 2580
Credit: 12,055,376
RAC: 4,577
United States
Message 827601 - Posted: 6 Nov 2008, 17:11:36 UTC - in response to Message 827557.

There is likely a scientific reason for Einstein's funding level. The SETI project has been more aspirational than scientific. This isn't a bad thing, but it does mean that aside from some modest amount of institutional funding, SETI is dependent on individuals for funding -- so there is a difference there for sure.

And I did not mean in any way to imply that the staff here is anything other than excellent. They are doing tremendous work and (unlike nearly any other project) are very good at communicating what is going on.

My comment about encouragement of the BOINC concept of multi-project participation wasn't meant to suggest some nefarious intent here, more a matter of pointing out that the fragility of the project (have you tried to upload work recently for example?) *does* provide an incentive to multiproject participation (which isn't a bad thing). Heck, when Einstein went thru bad times (and they have had some bad patches over the years) I realized that two or even three projects was not adequate for the workstations I have running BOINC. I currently run no less than 3 projects on any workstation and typically more like 5 projects.

By the way, one thing that tends to push me toward including other projects is that the applications of the 'big guys' (Einstein and SETI in particular) operate in such a way that they are more efficient on Intel CPUs -- I run almost entirely AMDs.

Regarding scalability, you may well have a point - the projects I'm comparing SETI to (aside from Einstein and Climate) have much smaller user counts (World Grid is perhaps in that same size range but a significantly different model), AND much smaller resources to work with. That they are at least moderately reliable (Spinhenge for example has been very reliable over the past year), with very limited resources, is a function of their much smaller user base. The thing is, when you run a project as large as SETI, and its research is aspirational instead of research science oriented, you will have problems of scale, due to resource constraints (no government, and very limited institutional and corporate support), AND the user population. The noise level here *in part as a function of scale* is quite high. It would be off the charts except for the *exceptionally good* communications between the staff here and the user community.

Wow, that was a long screed -- heck, it took so long to compose that I even got work uploads to complete in the interim (smile).

Cheers

Barry




____________

ML1
Volunteer tester
Joined: 25 Nov 01
Posts: 8377
Credit: 4,106,343
RAC: 1,047
United Kingdom
Message 827602 - Posted: 6 Nov 2008, 17:12:57 UTC - in response to Message 827590.

DJStarfox wrote: "... When developing a new feature or program in the business world, we should always ..."

In case you hadn't noticed, this project has nothing to do with the business world.

Indeed, in the business world this project simply would not exist.

Now realign reality to academia and note that this project runs on academia with zero secure funding and erratic, negligible funding.


I certainly wouldn't work under those conditions. It's a good job that Matt can skive off for a few days to go busking in the streets!

Keep searchin',
Martin

____________
See new freedom: Mageia4
Linux Voice See & try out your OS Freedom!
The Future is what We make IT (GPLv3)

WinterKnight
Volunteer tester
Joined: 18 May 99
Posts: 8630
Credit: 23,720,359
RAC: 19,179
United Kingdom
Message 827604 - Posted: 6 Nov 2008, 17:26:59 UTC

In comparison to Einstein and CPDN, it is not only the number of users that causes comms and server problems but also the relatively short task times.
If Seti were to go to all AP tasks (and I know it isn't; we still have to do the MB search), then the number of tasks processed per day would decrease to about 40k rather than the million+ required now.
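
To put those two figures on a per-second footing, here is a rough back-of-the-envelope sketch. It uses only the numbers quoted above (about 1,000,000 tasks per day now, about 40,000 with all-AP work). The helper name `tasks_per_second` is made up for illustration, and the calculation deliberately ignores that each task may involve several separate server contacts.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def tasks_per_second(tasks_per_day: float) -> float:
    """Average task rate the servers must sustain, spread evenly over a day."""
    return tasks_per_day / SECONDS_PER_DAY


current_rate = tasks_per_second(1_000_000)  # "the million+" figure from the post
all_ap_rate = tasks_per_second(40_000)      # the all-Astropulse estimate

print(f"current: {current_rate:.1f} tasks/s")              # current: 11.6 tasks/s
print(f"all AP:  {all_ap_rate:.2f} tasks/s")               # all AP:  0.46 tasks/s
print(f"ratio:   {current_rate / all_ap_rate:.0f}x fewer")  # ratio:   25x fewer
```

On those numbers the servers would go from handling a task roughly every 0.09 seconds to one about every 2 seconds, which is the point about short task times driving the load.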

BarryAZ
Joined: 1 Apr 01
Posts: 2580
Credit: 12,055,376
RAC: 4,577
United States
Message 827633 - Posted: 6 Nov 2008, 19:52:12 UTC - in response to Message 827604.

Quite true -- but some trade offs -- with Einstein the long cycles discourage some folks. With Climate, they addressed the slow cycle for credit jockeys by providing trickle credits on a daily basis -- which is good, since a number of their models take hundreds of hours of CPU time. I've avoided Astropulse for a number of reasons, including the long cycle times and the relatively low credit dish outs with them. Then again, there is now an optimized application which works with AP workunits -- I might revisit that again.

Another thing that periodically causes I/O issues here is those really short work units which sometimes get generated.

By the way, Climate also has some very real resource issues -- so SETI isn't unique there.




____________


Copyright © 2014 University of California