Composite Head (Nov 05 2008)

Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 827334 - Posted: 5 Nov 2008, 21:32:17 UTC

At 7:30pm last night the scheduler apache server got hung up - probably from all the election night excitement. These apache servers need a kick fairly often. Unfortunately they die in various ways for various reasons, so automating the checking of certain pulses doesn't always help - in fact such things usually make systems more complicated and unpredictable. Last night it failed during log rotation, which issues an "httpd restart" - this time the head-in process didn't die, so port 80 got locked up. I had to log in and kill the zombie httpds by hand before restarting apache. Not a big deal, though it got missed for a couple of hours, as it was timed perfectly with the entire country busy watching the news.
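
As an illustration of the kind of pulse check mentioned above, here is a minimal Python sketch that asks whether an HTTP endpoint answers within a timeout. It is not the project's actual monitoring: the function name `scheduler_responds` and the URL in the usage note are hypothetical, and a probe like this catches only some of the ways a server can hang.

```python
import http.client
import urllib.error
import urllib.request


def scheduler_responds(url: str, timeout: float = 5.0) -> bool:
    """Return True if `url` answers an HTTP GET within `timeout` seconds.

    Hypothetical helper for illustration only. Any HTTP status below 500
    counts as "alive"; refused connections, timeouts, and malformed
    responses count as "not responding".
    """
    # Talk to the server directly, ignoring any proxy set in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status < 500
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (4xx or 5xx).
        return err.code < 500
    except (OSError, http.client.HTTPException):
        # Connection refused, timed out, reset, or the reply was not valid HTTP.
        return False
```

A call such as `scheduler_responds("http://scheduler.example.org/")` (placeholder address) would return False both when nothing is listening and when the server accepts the connection but never replies, which is roughly the symptom a stuck port 80 would produce.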

- Matt

-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 827334
Fred W
Volunteer tester

Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 827347 - Posted: 5 Nov 2008, 22:04:40 UTC
Last modified: 5 Nov 2008, 22:04:53 UTC

Thanks for the update, Matt. I can't see too many complaints on the Boards from last night, so I guess most of the world was watching the News (those that weren't sleeping, that is).

F.
ID: 827347
Profile Blurf
Volunteer tester

Joined: 2 Sep 06
Posts: 8964
Credit: 12,678,685
RAC: 0
United States
Message 827361 - Posted: 5 Nov 2008, 22:59:09 UTC

Matt--thanks as always....


ID: 827361
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 827362 - Posted: 5 Nov 2008, 23:02:42 UTC

I had about 30 results that kept trying to report and failed, but the time span from noticing the problem to the problem being resolved was less than an hour. Didn't really bother me all that much.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 827362
BarryAZ

Joined: 1 Apr 01
Posts: 2580
Credit: 16,982,517
RAC: 0
United States
Message 827367 - Posted: 5 Nov 2008, 23:32:16 UTC - in response to Message 827362.  

The problem is still out there -- the servers have gotten rather fragile over the years -- sort of troublesome with SETI in theory being the flagship BOINC project.



I had about 30 results that kept trying to report and failed, but the time span from noticing the problem to the problem being resolved was less than an hour. Didn't really bother me all that much.


ID: 827367
Profile Dr. C.E.T.I.
Joined: 29 Feb 00
Posts: 16019
Credit: 794,685
RAC: 0
United States
Message 827368 - Posted: 5 Nov 2008, 23:36:10 UTC


. . . nice work Berkeley - and, Thanks for the Update Matt

BOINC Wiki . . .

Science Status Page . . .
ID: 827368
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 827374 - Posted: 6 Nov 2008, 0:18:17 UTC - in response to Message 827367.  

The problem is still out there -- the servers have gotten rather fragile over the years -- sort of troublesome with SETI in theory being the flagship BOINC project.

Really depends on how you define BOINC.

An important goal was to produce something that'd be able to do big computing projects on no budget. Certainly, SETI@Home is the "flagship" project for no-budget big-computing.

ID: 827374
BarryAZ

Joined: 1 Apr 01
Posts: 2580
Credit: 16,982,517
RAC: 0
United States
Message 827383 - Posted: 6 Nov 2008, 0:42:09 UTC - in response to Message 827374.  

My sense is that SETI is the largest of the BOINC projects not only in active users but also in actual budget resources (people and equipment). I am not saying it has a LOT of resources, but that a lot of other smaller projects are just that, a LOT smaller in terms of budget and resources -- and in active users. Perhaps the issue is that BOINC projects in general don't scale well -- I don't know. But the point is, SETI has been for quite a while more than a bit 'twitchy' -- in a lot of ways -- and with the large active user count the 'twitchiness' gets noticed quickly and tends to blossom into various troublesome outage issues.

In the BOINC world, that isn't so bad; one simply must have multiple projects running. I certainly do. It is just that perhaps SETI as the flagship of the BOINC projects seems to have as an operational goal enough 'twitchiness' to ensure that folks do the right BOINC thing and attach to multiple projects. In that regard it has been particularly admirable over the past months.


The problem is still out there -- the servers have gotten rather fragile over the years -- sort of troublesome with SETI in theory being the flagship BOINC project.

Really depends on how you define BOINC.

An important goal was to produce something that'd be able to do big computing projects on no budget. Certainly, SETI@Home is the "flagship" project for no-budget big-computing.


ID: 827383
PhonAcq

Joined: 14 Apr 01
Posts: 1656
Credit: 30,658,217
RAC: 1
United States
Message 827384 - Posted: 6 Nov 2008, 0:44:39 UTC - in response to Message 827374.  

Ned, thank you for always pointing out the silver lining in our cloud worry, so to speak.

Of course, it's possible that we are becoming the "flagship" project for no-budget big-computing, with the most error prone network communications and no scientific results. But that would probably tarnish your silver.

So I guess we should just hum the "don't worry; be happy" jingle along with Bobby McFerrin. We really don't need to get any better, after all.
ID: 827384
Profile JDWhale
Volunteer tester
Joined: 6 Apr 99
Posts: 921
Credit: 21,935,817
RAC: 3
United States
Message 827385 - Posted: 6 Nov 2008, 0:47:27 UTC

I figured, since the network news bungled the past couple presidential elections... I'd watch "Get Smart" DVD and time shifted "Boston Legal" & "30 Rock" rather than the election coverage last night..."Denny Crane"

Oh... were there SETI server issues last night? Sorry, I didn't notice..
ID: 827385
Profile CAHEN

Joined: 23 May 01
Posts: 3
Credit: 1,272,437
RAC: 0
Message 827528 - Posted: 6 Nov 2008, 9:22:25 UTC - in response to Message 827334.  

These apache servers need a kick fairly often.



Well, a simple cron job, like a full reboot, and voilà, the kick is done.

This simply assumes good start-up scripts.
ID: 827528
John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 827542 - Posted: 6 Nov 2008, 12:01:26 UTC - in response to Message 827528.  

These apache servers need a kick fairly often.



Well, a simple cron job, like a full reboot, and voilà, the kick is done.

This simply assumes good start-up scripts.

If I recall some earlier posts about starting things up, there are several machines that have to be started in the right order. I don't know if one can be restarted after it goes down without bringing some of the other computers down for a while.
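
The start-order concern can be sketched as a dependency graph. The following is only an illustration: the service names and their dependencies are hypothetical, not the actual SETI@home machine layout. It uses Python's standard `graphlib` module (Python 3.9+) to derive a valid start order and to list what would be disturbed if one service were restarted.

```python
from graphlib import TopologicalSorter

# Hypothetical layout: each service maps to the services that must be up first.
DEPENDS_ON = {
    "database": set(),
    "file_server": set(),
    "scheduler": {"database", "file_server"},
    "web": {"database"},
}


def start_order(depends_on):
    """Return the services in an order where dependencies always come first."""
    return list(TopologicalSorter(depends_on).static_order())


def affected_by_restart(depends_on, service):
    """Return every service that directly or indirectly depends on `service`."""
    affected = set()
    frontier = [service]
    while frontier:
        current = frontier.pop()
        for name, deps in depends_on.items():
            if current in deps and name not in affected:
                affected.add(name)
                frontier.append(name)
    return affected
```

With this made-up layout, restarting "web" disturbs nothing else, while restarting "database" would also take down "scheduler" and "web", which is the situation a blanket cron reboot would have to account for.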


BOINC WIKI
ID: 827542
Rudy
Volunteer tester

Joined: 23 Jun 99
Posts: 189
Credit: 794,998
RAC: 0
Canada
Message 827557 - Posted: 6 Nov 2008, 13:37:50 UTC - in response to Message 827383.  

My sense is that SETI is the largest of the BOINC projects not only in active users but also in actual budget resources (people and equipment). I am not saying it has a LOT of resources, but that a lot of other smaller projects are just that, a LOT smaller in terms of budget and resources -- and in active users. Perhaps the issue is that BOINC projects in general don't scale well -- I don't know. /deleted stuff/


I completely disagree with your comments about scalability/reliability; it seems to me to be only a question of resources for Seti@home.

The second largest project Einstein@home easily has more staff, host computers, and better funding than Seti. Distributed across 5 sites worldwide, funded by government grants in the US and Germany. Currently on their home page they have job postings out for 4 additional staff. This translates into super stability. They currently have been up continuously for 92 days.

Considering the resources Seti has compared to the number of users, I think the staff are doing an excellent job. Any suggestion that Seti's problems are intentional or due to bad Boinc software is unfounded.

ID: 827557
Profile QuietDad
Joined: 2 Oct 99
Posts: 83
Credit: 28,926,603
RAC: 59
United States
Message 827567 - Posted: 6 Nov 2008, 14:25:06 UTC

After years of being involved with Seti in a casual "hobby" sort of way, I'd like to make the following observation.

Seems hundreds of thousands of us realize we donate our spare computer cycles for the folks at the projects we're involved with to do with as they wish. If there are issues with the project, so be it, they will resolve. There is the Other group, the "10 percenters" that think the Seti group and especially Matt are employed to squeeze the most credit out of personal machines and should be available 24/7 to make it happen.

I understand this is THEIR project, not mine, and appreciate the fact they do what they do so I can dream of being the guy that discovers E.T. so he can find us and destroy us (just kidding).

Matt and the Seti Staff, you do great work and should be applauded. The rest of you, relax.
ID: 827567
DJStarfox

Joined: 23 May 01
Posts: 1066
Credit: 1,226,053
RAC: 2
United States
Message 827590 - Posted: 6 Nov 2008, 16:39:41 UTC - in response to Message 827557.  

My sense is that SETI is the largest of the BOINC projects not only in active users but also in actual budget resources (people and equipment). I am not saying it has a LOT of resources, but that a lot of other smaller projects are just that, a LOT smaller in terms of budget and resources -- and in active users. Perhaps the issue is that BOINC projects in general don't scale well -- I don't know. /deleted stuff/


Considering the resources Seti has compared to the number of users, I think the staff are doing an excellent job. Any suggestion that Seti's problems are intentional or due to bad Boinc software is unfounded.


I agree that SETI's problems are not intentional. I thought the vision of BOINC was to provide an open-source, grid computing project with minimal staff and maintenance. In that sense, I think BOINC falls a little short.

When developing a new feature or program in the business world, we should always consider the administrative or maintenance cost of the project. For example, adding a feature that requires a nightly database update is generally a bad idea; a method to update the data more efficiently is preferred.

In the same vein, a grid computing project should always keep in mind "scale" meaning, "How big can this thing get?" The user base has clearly pushed SETI's servers to the max. Software bugs and limited staffing resources only make it worse. What is needed is to observe and analyze WHY projects such as Einstein, ClimatePrediction, and others have such good uptime. How do they divide the various BOINC processes on servers? Is it constant staff intervention? Is it co-location? Do any of them use virtualization?

Maybe the SETI team could visit UWM and see how they run things. This could provide valuable insight. Maybe after the holidays when airfare will be really cheap?
ID: 827590
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51478
Credit: 1,018,363,574
RAC: 1,004
United States
Message 827596 - Posted: 6 Nov 2008, 16:56:52 UTC - in response to Message 827567.  

After years of being involved with Seti in a casual "hobby" sort of way, I'd like to make the following observation.

Seems hundreds of thousands of us realize we donate our spare computer cycles for the folks at the projects we're involved with to do with as they wish. If there are issues with the project, so be it, they will resolve. There is the Other group, the "10 percenters" that think the Seti group and especially Matt are employed to squeeze the most credit out of personal machines and should be available 24/7 to make it happen.

I understand this is THEIR project, not mine, and appreciate the fact they do what they do so I can dream of being the guy that discovers E.T. so he can find us and destroy us (just kidding).

Matt and the Seti Staff, you do great work and should be applauded. The rest of you, relax.

Well said, and welcome to the forums with your first post....
Try to stop by more often........you don't have to be so quiet, Dad.
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 827596
BarryAZ

Joined: 1 Apr 01
Posts: 2580
Credit: 16,982,517
RAC: 0
United States
Message 827601 - Posted: 6 Nov 2008, 17:11:36 UTC - in response to Message 827557.  

There is likely a scientific reason for Einstein's funding level. The SETI project has been more aspirational than scientific. This isn't a bad thing, but it does mean that aside from some modest amount of institutional funding, SETI is dependent on individuals for funding -- so there is a difference there for sure.

And I did not mean in any way to imply that the staff here is anything other than excellent. They are doing tremendous work and (unlike nearly any other project) are very good at communicating what is going on.

My comment about encouragement of the BOINC concept of multi-project participation wasn't meant to suggest some nefarious intent here, more a matter of pointing out that the fragility of the project (have you tried to upload work recently for example?) *does* provide an incentive to multiproject participation (which isn't a bad thing). Heck, when Einstein went thru bad times (and they have had some bad patches over the years) I realized that two or even three projects was not adequate for the workstations I have running BOINC. I currently run no less than 3 projects on any workstation and typically more like 5 projects.

By the way, one thing that tends to push me toward including other projects is that the applications of the 'big guys' (Einstein and SETI in particular) are more efficient on Intel CPUs -- I run almost entirely AMDs.

Regarding scalability, you may well have a point - the projects I'm comparing SETI to (aside from Einstein and Climate) have much smaller user counts (World Grid is perhaps in that same size range but a significantly different model), AND much smaller resources to work with. That they are at least moderately reliable (Spinhenge for example has been very reliable over the past year), with very limited resources, is a function of their much smaller user base. The thing is, when you run a project as large as SETI, and its research is aspirational instead of research science oriented, you will have problems of scale, due to resource constraints (no government, and very limited institutional and corporate support), AND the user population. The noise level here *in part as a function of scale* is quite high. It would be off the charts except for the *exceptionally good* communications between the staff here and the user community.

Wow, that was a long screed -- heck, it took so long to compose that I even got work uploads to complete in the interim (smile).

Cheers

Barry



My comment about scalability



I completely disagree with your comments about scalability/reliability; it seems to me to be only a question of resources for Seti@home.

The second largest project Einstein@home easily has more staff, host computers, and better funding than Seti. Distributed across 5 sites worldwide, funded by government grants in the US and Germany. Currently on their home page they have job postings out for 4 additional staff. This translates into super stability. They currently have been up continuously for 92 days.

Considering the resources Seti has compared to the number of users, I think the staff are doing an excellent job. Any suggestion that Seti's problems are intentional or due to bad Boinc software is unfounded.


ID: 827601
Profile ML1
Volunteer moderator
Volunteer tester

Joined: 25 Nov 01
Posts: 21237
Credit: 7,508,002
RAC: 20
United Kingdom
Message 827602 - Posted: 6 Nov 2008, 17:12:57 UTC - in response to Message 827590.  

... When developing a new feature or program in the business world, we should always ...

In case you hadn't noticed, this project has nothing to do with the business world.

Indeed, in the business world this project simply would not exist.

Now realign reality to academia and note that this project runs on academia with zero secure funding and erratic negligible funding.


I certainly wouldn't work under those conditions. It's a good job that Matt can skive off for a few days to go busking in the streets!

Keep searchin',
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 827602
W-K 666
Volunteer tester

Joined: 18 May 99
Posts: 19404
Credit: 40,757,560
RAC: 67
United Kingdom
Message 827604 - Posted: 6 Nov 2008, 17:26:59 UTC

In comparison to Einstein and CPDN, it is not only the number of users that causes comms and server problems but also the relatively short task times.
If Seti were to go to all AP tasks (and I know it isn't going to, as we still have to do the MB search), then the number of tasks processed per day would decrease to about 40k rather than the million + required now.
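
To put those two figures side by side, here is a small Python sketch of the arithmetic. It uses only the numbers quoted in the post and assumes "the million +" means exactly 1,000,000 tasks per day, so the real gap may be somewhat larger.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def tasks_per_second(tasks_per_day: float) -> float:
    """Average rate at which the servers must hand out and take back tasks."""
    return tasks_per_day / SECONDS_PER_DAY


# Figures quoted in the post; "the million +" is taken as exactly 1,000,000.
CURRENT_TASKS_PER_DAY = 1_000_000
ALL_AP_TASKS_PER_DAY = 40_000

# How many times fewer tasks the servers would see with AP tasks only.
reduction_factor = CURRENT_TASKS_PER_DAY / ALL_AP_TASKS_PER_DAY

if __name__ == "__main__":
    print(f"current: {tasks_per_second(CURRENT_TASKS_PER_DAY):.2f} tasks/s")
    print(f"all AP:  {tasks_per_second(ALL_AP_TASKS_PER_DAY):.2f} tasks/s")
    print(f"reduction factor: {reduction_factor:.0f}x")
```

Under that assumption the servers handle roughly 11.6 tasks per second on average today, against about 0.46 per second for an all-AP workload, a 25-fold difference in request traffic.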
ID: 827604
BarryAZ

Joined: 1 Apr 01
Posts: 2580
Credit: 16,982,517
RAC: 0
United States
Message 827633 - Posted: 6 Nov 2008, 19:52:12 UTC - in response to Message 827604.  

Quite true -- but some trade offs -- with Einstein the long cycles discourage some folks. With Climate, they addressed the slow cycle for credit jockeys by providing trickle credits on a daily basis -- which is good, since a number of their models take hundreds of hours of CPU time. I've avoided Astropulse for a number of reasons, including the long cycle times and the relatively low credit dish outs with them. Then again, there is now an optimized application which works with AP workunits -- I might revisit that again.

Another thing that periodically causes I/O issues here is those really short work units which sometimes get generated.

By the way, Climate also has some very real resource issues -- so SETI isn't unique there.


In comparison to Einstein and CPDN, it is not only the number of users that causes comms and server problems but also the relatively short task times.
If Seti were to go to all AP tasks (and I know it isn't going to, as we still have to do the MB search), then the number of tasks processed per day would decrease to about 40k rather than the million + required now.


ID: 827633


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.