Composite Head (Nov 05 2008)
Author | Message |
---|---|
Matt Lebofsky Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
At 7:30pm last night the scheduler apache server got hung up - probably from all the election night excitement. These apache servers need a kick fairly often. Unfortunately they die in various ways for various reasons, so automating the checking of certain pulses doesn't always help - in fact such things usually make systems more complicated and unpredictable. Last night it failed during log rotation, which issues an "httpd restart" - this time the head-in process didn't die, so port 80 got locked up. I had to log in and kill the zombie httpds by hand before restarting apache. Not a big deal, though it got missed for a couple of hours as it was timed perfectly with the entire country busy watching the news. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
Fred W Joined: 13 Jun 99 Posts: 2524 Credit: 11,954,210 RAC: 0 |
Thanks for the update, Matt. I can't see too many complaints on the Boards from last night, so I guess most of the world was watching the News (those that weren't sleeping, that is). F. |
Blurf Joined: 2 Sep 06 Posts: 8962 Credit: 12,678,685 RAC: 0 |
Matt--thanks as always.... |
Cosmic_Ocean Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
I had about 30 results that kept trying to report and failed, but the time span from noticing the problem to the problem being resolved was less than an hour. Didn't really bother me all that much. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving-up) |
BarryAZ Joined: 1 Apr 01 Posts: 2580 Credit: 16,982,517 RAC: 0 |
The problem is still out there -- the servers have gotten rather fragile over the years -- sort of troublesome with SETI in theory being the flagship BOINC project. Quoting Cosmic_Ocean: "I had about 30 results that kept trying to report and failed, but the time span from noticing the problem to the problem being resolved was less than an hour. Didn't really bother me all that much." |
Dr. C.E.T.I. Joined: 29 Feb 00 Posts: 16019 Credit: 794,685 RAC: 0 |
. . . nice work Berkeley - and, Thanks for the Update Matt BOINC Wiki . . . Science Status Page . . . |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
Quoting BarryAZ: "The problem is still out there -- the servers have gotten rather fragile over the years -- sort of troublesome with SETI in theory being the flagship BOINC project." Really depends on how you define BOINC. An important goal was to produce something that'd be able to do big computing projects on no budget. Certainly, SETI@Home is the "flagship" project for no-budget big-computing. |
BarryAZ Joined: 1 Apr 01 Posts: 2580 Credit: 16,982,517 RAC: 0 |
My sense is that SETI is the largest of the BOINC projects not only in active users but also in actual budget resources (people and equipment). I am not saying it has a LOT of resources, but that a lot of other smaller projects are just that, a LOT smaller in terms of budget and resources -- and in active users. Perhaps the issue is that BOINC projects in general don't scale well -- I don't know. But the point is, SETI has been for quite a while more than a bit 'twitchy' -- in a lot of ways -- and with the large active user count the 'twitchiness' gets noticed quickly and tends to blossom into various troublesome outage issues. In the BOINC world, that isn't so bad, one simply must have multiple projects running. I certainly do. It is just that perhaps SETI as the flagship of the BOINC projects seems to have as an operational goal enough 'twitchiness' to ensure that folks do the right BOINC thing and attach to multiple projects. In that regard it has been particularly admirable over the past months. Quoted from the earlier post: "The problem is still out there -- the servers have gotten rather fragile over the years -- sort of troublesome with SETI in theory being the flagship BOINC project." |
PhonAcq Joined: 14 Apr 01 Posts: 1656 Credit: 30,658,217 RAC: 1 |
Ned, thank you for always pointing out the silver lining in our cloud of worry, so to speak. Of course, it's possible that we are becoming the "flagship" project for no-budget big-computing, with the most error-prone network communications and no scientific results. But that would probably tarnish your silver. So I guess we should just hum the "don't worry; be happy" jingle along with Bobby McFerrin. We really don't need to get any better, after all. |
JDWhale Joined: 6 Apr 99 Posts: 921 Credit: 21,935,817 RAC: 3 |
I figured, since the network news bungled the past couple presidential elections... I'd watch "Get Smart" DVD and time shifted "Boston Legal" & "30 Rock" rather than the election coverage last night..."Denny Crane" Oh... were there SETI servers issues last night? Sorry, I didn't notice.. |
CAHEN Joined: 23 May 01 Posts: 3 Credit: 1,272,437 RAC: 0 |
Quoting Matt Lebofsky: "These apache servers need a kick fairly often." Well, a simple cron job that does a full reboot and voila, the kick is done. This simply assumes good start-up scripts are in place. |
John McLeod VII Joined: 15 Jul 99 Posts: 24806 Credit: 790,712 RAC: 0 |
Quoting Matt Lebofsky: "These apache servers need a kick fairly often." If I recall some earlier posts about starting things up, there are several machines that have to be started in the right order. I don't know if one can be re-started after it goes down without bringing some of the other computers down for a while. BOINC WIKI |
Rudy Joined: 23 Jun 99 Posts: 189 Credit: 794,998 RAC: 0 |
Quoting BarryAZ: "My sense is that SETI is the largest of the BOINC projects not only in active users but also in actual budget resources (people and equipment). I am not saying it has a LOT of resources, but that a lot of other smaller projects are just that, a LOT smaller in terms of budget and resources -- and in active users. Perhaps the issue is that BOINC projects in general don't scale well -- I don't know. /deleted stuff/" I completely disagree with your comments about scalability/reliability; it seems to me to be only a question of resources for Seti@home. The second largest project, Einstein@home, easily has more staff, host computers, and better funding than Seti. It is distributed across 5 sites worldwide and funded by government grants in the US and Germany. Currently on their home page they have job postings out for 4 additional staff. This translates into super stability. They have currently been up continuously for 92 days. Considering the resources Seti has compared to the number of users, I think the staff are doing an excellent job. Any suggestion that Seti's problems are intentional or due to bad Boinc software is unfounded. |
QuietDad Joined: 2 Oct 99 Posts: 83 Credit: 28,926,603 RAC: 59 |
After years of being involved with Seti in a casual "hobby" sort of way, I'd like to make the following observation. It seems hundreds of thousands of us realize we donate our spare computer cycles for the folks at the projects we're involved with to do with as they wish. If there are issues with the project, so be it, they will resolve. Then there is the other group, the "10 percenters", who think the Seti group and especially Matt are employed to squeeze the most credit out of personal machines and should be available 24/7 to make it happen. I understand this is THEIR project, not mine, and appreciate the fact they do what they do so I can dream of being the guy that discovers E.T. so he can find us and destroy us (just kidding). Matt and the Seti Staff, you do great work and should be applauded. The rest of you, relax. |
DJStarfox Joined: 23 May 01 Posts: 1066 Credit: 1,226,053 RAC: 2 |
Quoting BarryAZ: "My sense is that SETI is the largest of the BOINC projects not only in active users but also in actual budget resources (people and equipment). I am not saying it has a LOT of resources, but that a lot of other smaller projects are just that, a LOT smaller in terms of budget and resources -- and in active users. Perhaps the issue is that BOINC projects in general don't scale well -- I don't know. /deleted stuff/" I agree that SETI's problems are not intentional. I thought the vision of BOINC was to provide an open-source, grid computing project with minimal staff and maintenance. In that sense, I think BOINC falls a little short. When developing a new feature or program in the business world, we should always consider the administrative or maintenance cost of the project. For example, adding a feature that requires a nightly database update is generally a bad idea; a method to update the data more efficiently is preferred. In the same vein, a grid computing project should always keep in mind "scale" meaning, "How big can this thing get?" The user base has clearly pushed SETI's servers to the max. Software bugs and limited staffing resources only make it worse. What is needed is to observe and analyze WHY projects such as Einstein, ClimatePrediction, and others have such good uptime. How do they divide the various BOINC processes on servers? Is it constant staff intervention? Is it co-location? Do any of them use virtualization? Maybe the SETI team could visit UWM and see how they run things. This could provide valuable insight. Maybe after the holidays when airfare will be really cheap? |
kittyman Joined: 9 Jul 00 Posts: 51477 Credit: 1,018,363,574 RAC: 1,004 |
Quoting QuietDad: "After years of being involved with Seti in a casual "hobby" sort of way, I'd like to make the following observation." Well said, and welcome to the forums with your first post.... Try to stop by more often........you don't have to be so quiet, Dad. "Time is simply the mechanism that keeps everything from happening all at once." |
BarryAZ Joined: 1 Apr 01 Posts: 2580 Credit: 16,982,517 RAC: 0 |
There is likely a scientific reason for Einstein's funding level. The SETI project has been more aspirational than scientific. This isn't a bad thing, but it does mean that aside from some modest amount of institutional funding, SETI is dependent on individuals for funding -- so there is a difference there for sure. And I did not mean in any way to imply that the staff here is anything other than excellent. They are doing tremendous work and (unlike nearly any other project) are very good at communicating what is going on. My comment about encouragement of the BOINC concept of multi-project participation wasn't meant to suggest some nefarious intent here, more a matter of pointing out that the fragility of the project (have you tried to upload work recently for example?) *does* provide an incentive to multiproject participation (which isn't a bad thing). Heck, when Einstein went through bad times (and they have had some bad patches over the years) I realized that two or even three projects was not adequate for the workstations I have running BOINC. I currently run no fewer than 3 projects on any workstation and typically more like 5 projects. By the way, one thing that tends to push me to including other projects is that the applications from the 'big guys' (Einstein and SETI in particular) operate in such a way that they are more efficient on Intel CPUs -- I run almost entirely AMDs. Regarding scalability, you may well have a point - the projects I'm comparing SETI to (aside from Einstein and Climate) have much smaller user counts (World Grid is perhaps in that same size range but a significantly different model), AND much smaller resources to work with. That they are at least moderately reliable (Spinhenge for example has been very reliable over the past year), with very limited resources, is a function of their much smaller user base. 
The thing is, when you run a project as large as SETI, and its research is aspirational instead of research science oriented, you will have problems of scale, due to resource constraints (no government, and very limited institutional and corporate support), AND the user population. The noise level here *in part as a function of scale* is quite high. It would be off the charts except for the *exceptionally good* communications between the staff here and the user community. Wow, that was a long screed -- heck, it took so long to compose that I even got work uploads to complete in the interim (smile). Cheers Barry My comment about scalability Quoting Rudy: "I completely disagree with your comments about scalability/reliability, it seems to me to only be a question of resources for Seti@home. The second largest project Einstein@home easily has more staff, host computers, and better funding than Seti. Distributed across 5 sites worldwide, funded by government grants in the US and Germany. Currently on their home page they have job postings out for 4 additional staff. This translates into super stability. They currently have been up continuously for 92 days. Considering the resources Seti has compared to the number of users, I think the staff are doing an excellent job. Any suggestion that Seti's problems are intentional or due to bad Boinc software is unfounded." |
ML1 Joined: 25 Nov 01 Posts: 21002 Credit: 7,508,002 RAC: 20 |
Quoting DJStarfox: "... When developing a new feature or program in the business world, we should always ..." In case you hadn't noticed, this project has nothing to do with the business world. Indeed, in the business world this project simply would not exist. Now realign reality to academia and note that this project runs on academia with zero secure funding and erratic, negligible funding. I certainly wouldn't work under those conditions. It's a good job that Matt can skive off for a few days to go busking in the streets! Keep searchin', Martin See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3) |
W-K 666 Joined: 18 May 99 Posts: 19314 Credit: 40,757,560 RAC: 67 |
In comparison to Einstein and CPDN, it is not only the number of users that causes comms and server problems but the relatively short task times. If Seti were to go to all AP tasks (and I know it won't, since we still have to do the MB search), then the number of tasks processed per day would decrease to about 40k rather than the million-plus required now. |
BarryAZ Joined: 1 Apr 01 Posts: 2580 Credit: 16,982,517 RAC: 0 |
Quite true -- but there are some trade-offs -- with Einstein the long cycles discourage some folks. With Climate, they addressed the slow cycle for credit jockeys by providing trickle credits on a daily basis -- which is good, since a number of their models take hundreds of hours of CPU time. I've avoided Astropulse for a number of reasons, including the long cycle times and the relatively low credit dish-outs with them. Then again, there is now an optimized application which works with AP workunits -- I might revisit that again. Another thing which periodically causes I/O issues here is those really short work units which sometimes get generated. By the way, Climate also has some very real resource issues -- so SETI isn't unique there. Quoting W-K 666: "In comparison to Einstein and CPDN, it is not only the number of users that causes comms and server problems but the relatively short task times." |
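
Matt's opening post describes apache hanging after a log-rotation restart, with leftover httpd processes holding port 80 until they were killed by hand. As a rough illustration of what checking that "pulse" might look like, here is a minimal Python sketch. Everything in it is an assumption for illustration and not the project's actual tooling: the function names `http_responds` and `leftover_pids` are hypothetical, and the process list is assumed to come from something like `ps -eo pid,comm`. As Matt notes, checks like this do not catch every failure mode, so treat it as a sketch only.

```python
import http.client


def http_responds(host: str, port: int = 80, timeout: float = 5.0) -> bool:
    """Return True if the server answers a HEAD request within the timeout.

    A hung server may still accept TCP connections, so this waits for an
    actual HTTP response rather than just a successful connect.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", "/")
        conn.getresponse()
        return True
    except (OSError, http.client.HTTPException):
        # Connection refused, timed out, or a malformed/absent reply.
        return False
    finally:
        conn.close()


def leftover_pids(ps_output: str, command: str = "httpd") -> list[int]:
    """Pick out PIDs whose command name matches, from 'PID COMMAND' text.

    The input format (one process per line, PID then command name) is an
    assumption; header lines and non-matching commands are skipped.
    """
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 1)
        if len(fields) == 2 and fields[0].isdigit() and fields[1].strip() == command:
            pids.append(int(fields[0]))
    return pids
```

If `http_responds` returns False while `leftover_pids` still lists httpd processes, that would roughly match the situation described: the old processes never exited, so they would need to be cleaned up before a fresh start could bind the port. Whether to act on that automatically is exactly the judgment call Matt raises.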
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.