Somebody needs to kick the servers!

Chris Weaver Send message Joined: 3 Apr 99 Posts: 6 Credit: 151,343 RAC: 0	Message 807862 - Posted: 13 Sep 2008, 21:34:31 UTC Looks like a process is acting up on S@H. A quick check of the server status reveals that no work units are being created and that the u/dl servers are up but I am getting messages stating that the project servers may be down and that the http service is unavailable. Also getting messages that the servers are not responding (not returning data or headers). Because of this, I can't upload my completed WU's. Is anybody else experiencing this? ID: 807862 ·

xfile1971 Send message Joined: 3 Nov 06 Posts: 5 Credit: 197,438 RAC: 0	Message 807863 - Posted: 13 Sep 2008, 21:37:04 UTC - in response to Message 807862. Looks like a process is acting up on S@H. A quick check of the server status reveals that no work units are being created and that the u/dl servers are up but I am getting messages stating that the project servers may be down and that the http service is unavailable. Also getting messages that the servers are not responding (not returning data or headers). Because of this, I can't upload my completed WU's. Is anybody else experiencing this? Yeah. I just started experiencing this problem and was wondering what was going on. These sorts of things love happening on the weekends. Is anybody there on a Saturday? ID: 807863 ·

the silver surfer Send message Joined: 24 Feb 01 Posts: 131 Credit: 3,739,307 RAC: 0	Message 807865 - Posted: 13 Sep 2008, 21:41:24 UTC - in response to Message 807862. Looks like a process is acting up on S@H. A quick check of the server status reveals that no work units are being created and that the u/dl servers are up but I am getting messages stating that the project servers may be down and that the http service is unavailable. Also getting messages that the servers are not responding (not returning data or headers). Because of this, I can't upload my completed WU's. Is anybody else experiencing this? Same situation on my side - No uploads possible, everything else is working fine. ID: 807865 ·

Morris Volunteer tester Send message Joined: 11 Sep 01 Posts: 57 Credit: 9,077,302 RAC: 29	Message 807868 - Posted: 13 Sep 2008, 21:49:38 UTC Same prob here, except that I could upload but NOT report WU .... 9/13/2008 11:38:06 PM|SETI@home|Scheduler request failed: couldn't connect to server Server Status page shows that (almost) all servers are up and running, but outgoing traffic from Berkeley dramatically decreased (less than half the average daily traffic) in the last hour or so, IMHO some server needs to be bootkicked ... As usual, all of this happens in the middle of the weekend... As we say here, good luck can be blind, but bad luck has good eyesight.... M. ID: 807868 ·

champ Volunteer tester Send message Joined: 12 Mar 03 Posts: 3642 Credit: 1,489,147 RAC: 0	Message 807876 - Posted: 13 Sep 2008, 22:02:07 UTC Easy going guys.... Please donate, that Eric and the crew can buy another or a new server. The prob will be solved then. ID: 807876 ·

jim little Send message Joined: 3 Apr 99 Posts: 112 Credit: 915,934 RAC: 0	Message 807880 - Posted: 13 Sep 2008, 22:22:07 UTC - in response to Message 807863. Looks like a process is acting up on S@H. A quick check of the server status reveals that no work units are being created and that the u/dl servers are up but I am getting messages stating that the project servers may be down and that the http service is unavailable. Also getting messages that the servers are not responding (not returning data or headers). Because of this, I can't upload my completed WU's. Is anybody else experiencing this? Yeah. I just started experiencing this problem and was wondering what was going on. These sorts of things love happening on the weekends. Is anybody there on a Saturday? ============ No. They put on enough hours in the other five days! duke ID: 807880 ·

Blurf Volunteer tester Send message Joined: 2 Sep 06 Posts: 8962 Credit: 12,678,685 RAC: 0	Message 807889 - Posted: 13 Sep 2008, 23:10:27 UTC - in response to Message 807863. Yeah. I just started experiencing this problem and was wondering what was going on. These sorts of things love happening on the weekends. Is anybody there on a Saturday? Chris--the hours of the staff are Monday-Friday and they are running on a shoestring staff as is. If more funding came in they could hire an additional person to cover weekends. Might want to read up on some ideas to save Seti ID: 807889 ·

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874	Message 807907 - Posted: 14 Sep 2008, 0:03:24 UTC By my observation, they have been well and truly kicked. Do I have a seconder for a vote of (Saturday night) thanks? ID: 807907 ·

C Send message Joined: 3 Apr 99 Posts: 240 Credit: 7,716,977 RAC: 0	Message 807909 - Posted: 14 Sep 2008, 0:06:50 UTC - in response to Message 807907. By my observation, they have been well and truly kicked. Do I have a seconder for a vote of (Saturday night) thanks? I'll second the motion. Server has units to send it, and I just uploaded, and reported, some. C Join Team MacNN ID: 807909 ·

1mp0£173 Volunteer tester Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0	Message 807911 - Posted: 14 Sep 2008, 0:09:12 UTC - in response to Message 807907. By my observation, they have been well and truly kicked. Do I have a seconder for a vote of (Saturday night) thanks? ... and while we're at it, a hearty thanks for Saturday help from those who work a normal Monday-through-Friday 40 hour* week. ID: 807911 ·

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874	Message 807915 - Posted: 14 Sep 2008, 0:13:33 UTC - in response to Message 807909. Last modified: 14 Sep 2008, 0:14:01 UTC By my observation, they have been well and truly kicked. Do I have a seconder for a vote of (Saturday night) thanks? I'll second the motion. Server has units to send it, and I just uploaded, and reported, some. C And I just downloaded four fresh ones. Carried nem. con. ID: 807915 ·

JBWoolley Send message Joined: 8 May 07 Posts: 35 Credit: 6,214,366 RAC: 0	Message 808260 - Posted: 14 Sep 2008, 23:33:48 UTC - in response to Message 807876. Easy going guys.... Please donate, that Eric and the crew can buy another or a new server. The prob will be solved then. I'm a newbie on these message boards. So Please be kind. But servers don't go down because they get "tired" and need a rest. :-) I bet there is a reason for these outages. My paying job is to look at server performance (and availability), determine the root cause of the issues, and offer solutions. (I work for a USA national HMO.) The frequency and predictability of the outages (every Sunday about noon Pacific time) make these outages very likely O/S "memory leak" related. If anyone personally knows Eric... Please have him (or them) send me the outage related Heap Dump.... (Assuming they are using Unix, AIX or Linux... one or more dumps are usually created for these outages.) With just a few hours of analysis, I should be able to identify the Memory Leak(s) along with identifying the class(es) and object(s).... and then let the SETI resident specialists work through and correct them. Imagine.... We could let these guys have their personal lives, and continue crunching because the SETI servers always stay up... This is possible. :-)) Thanks, Jack Woolley jbwoolley@yahoo.com ID: 808260 ·

1mp0£173 Volunteer tester Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0	Message 808266 - Posted: 14 Sep 2008, 23:51:26 UTC - in response to Message 808260. Easy going guys.... Please donate, that Eric and the crew can buy another or a new server. The prob will be solved then. I'm a newbie on these message boards. So Please be kind. But servers don't go down because they get "tired" and need a rest. :-) I bet there is a reason for these outages. Jack, The root cause of all of the problems is funding. Generally speaking, the outages aren't crashes, so there likely isn't a dump. It's the result of some server losing a mount. Matt has pointed out that most every server mounts the drives for every other server, so there is much, much file sharing going on. It's far from ideal. More hardware would likely mean that they could reduce the dependence on NFS mounts. But the real problem is that there are two people we can think of as the entire operations staff -- and they have other responsibilities. Dr. Korpela (Eric) pitches in, even though it isn't his job. A graduate student wrote Astropulse, and there are some volunteer developers who help -- but they don't do the operational stuff. ... and after that, well, I don't think I've missed anyone. The BOINC project is separate, and has to be because the funding sources cannot be mixed. The servers are "interesting" as well. I think it was replaced, but Joycelyn was at one time running one of the engineering test beds for the V40z, and I understand wasn't at all like the production V40z machines. Most of the rest are either donated "white box" systems, or hand-me-downs. Most (if not all) are running Linux, because that is what the project can afford. They run best during the week because people are in the office and can "kick" the machines when they act up. On the weekend, they get kicked remotely.
So, I'll echo the "please donate" theme, but I don't want to buy another server as much as I want to see another staff member -- or at least see the current staff continue to be paid. -- Ned ID: 808266 ·

zoom3+1=4 Volunteer tester Send message Joined: 30 Nov 03 Posts: 65746 Credit: 55,293,173 RAC: 49	Message 808289 - Posted: 15 Sep 2008, 0:47:05 UTC - in response to Message 808266. Easy going guys.... Please donate, that Eric and the crew can buy another or a new server. The prob will be solved then. I'm a newbie on these message boards. So Please be kind. But servers don't go down because they get "tired" and need a rest. :-) I bet there is a reason for these outages. Jack, The root cause of all of the problems is funding. Generally speaking, the outages aren't crashes, so there likely isn't a dump. It's the result of some server losing a mount. Matt has pointed out that most every server mounts the drives for every other server, so there is much, much file sharing going on. It's far from ideal. More hardware would likely mean that they could reduce the dependence on NFS mounts. But the real problem is that there are two people we can think of as the entire operations staff -- and they have other responsibilities. Dr. Korpela (Eric) pitches in, even though it isn't his job. A graduate student wrote Astropulse, and there are some volunteer developers who help -- but they don't do the operational stuff. ... and after that, well, I don't think I've missed anyone. The BOINC project is separate, and has to be because the funding sources cannot be mixed. The servers are "interesting" as well. I think it was replaced, but Joycelyn was at one time running one of the engineering test beds for the V40z, and I understand wasn't at all like the production V40z machines. Most of the rest are either donated "white box" systems, or hand-me-downs. Most (if not all) are running Linux, because that is what the project can afford. They run best during the week because people are in the office and can "kick" the machines when they act up. On the weekend, they get kicked remotely. 
So, I'll echo the "please donate" theme, but I don't want to buy another server as much as I want to see another staff member -- or at least see the current staff continue to be paid. -- Ned It's too bad we can't get Bill Gates to donate $10,000.00. If I had enough money I'd do that, but I'm stuck. The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's ID: 808289 ·

kittyman Volunteer tester Send message Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004	Message 808291 - Posted: 15 Sep 2008, 0:53:23 UTC - in response to Message 808289. It's too bad We can't get Bill Gates to to Donate $10,000.00, If I had enough money I'd do that, But I'm stuck. Yeah.... With all the new rigs the project spawns..... You would think it might even be a good business model decision....... Much less an emotional decision........ "Freedom is just Chaos, with better lighting." Alan Dean Foster ID: 808291 ·

JBWoolley Send message Joined: 8 May 07 Posts: 35 Credit: 6,214,366 RAC: 0	Message 808306 - Posted: 15 Sep 2008, 1:48:25 UTC - in response to Message 808266. Easy going guys.... Please donate, that Eric and the crew can buy another or a new server. The prob will be solved then. I'm a newbie on these message boards. So Please be kind. But servers don't go down because they get "tired" and need a rest. :-) I bet there is a reason for these outages. Jack, The root cause of all of the problems is funding. Generally speaking, the outages aren't crashes, so there likely isn't a dump. It's the result of some server losing a mount. Matt has pointed out that most every server mounts the drives for every other server, so there is much, much file sharing going on. It's far from ideal. More hardware would likely mean that they could reduce the dependence on NFS mounts. But the real problem is that there are two people we can think of as the entire operations staff -- and they have other responsibilities. Dr. Korpela (Eric) pitches in, even though it isn't his job. A graduate student wrote Astropulse, and there are some volunteer developers who help -- but they don't do the operational stuff. ... and after that, well, I don't think I've missed anyone. The BOINC project is separate, and has to be because the funding sources cannot be mixed. The servers are "interesting" as well. I think it was replaced, but Joycelyn was at one time running one of the engineering test beds for the V40z, and I understand wasn't at all like the production V40z machines. Most of the rest are either donated "white box" systems, or hand-me-downs. Most (if not all) are running Linux, because that is what the project can afford. They run best during the week because people are in the office and can "kick" the machines when they act up. On the weekend, they get kicked remotely. 
So, I'll echo the "please donate" theme, but I don't want to buy another server as much as I want to see another staff member -- or at least see the current staff continue to be paid. -- Ned Guess I wasn't clear.... Yes, I'll donate.... my time (and what little experience I've accumulated over the years). Yes, mounting NFS volumes to multiple machines at once is known to have significant stability problems. Sorry.. sounds like GOOD news... (BOINC may have outgrown its current server infrastructure scalability.) I'm glad we are growing so fast! "The industry" has faced these scalability/availability issues for many years and has solutions for most issues. Please consider this: Maybe turning away offers of free help... and (instead) asking for more $$$... may not be the best alternative/solution for the ongoing BOINC issues. ------------ I look at it this way. The basic BOINC mindset is to DISTRIBUTE the crunching work across many, many resources/computers. Why can't BOINC move away from the current centralized administration... to a more distributed administration (to include a trusted few volunteers)? Thanks, Jack Woolley ID: 808306 ·

Ace Casino Send message Joined: 5 Feb 03 Posts: 285 Credit: 29,750,804 RAC: 15	Message 808322 - Posted: 15 Sep 2008, 2:36:23 UTC Jack is willing to donate his time and expertise and you're knocking him down??? Jack, post your intentions to help in the "technical news" section, under the last thread Matt has posted, and see what comes of it. Maybe Jack is privy to software that could identify a problem... who knows? Money is great, but why discourage someone wanting to help out if they may be able to??? The Red Cross gets millions from donations but could not do its job without volunteers! And as a side note the Red Cross and most every other reputable organization that asks for donations has learned: the more you ask for money the less people give... this is a fact! This is the reason for an annual fund drive. SETI and anyone who keeps saying give, give, give... may actually be hurting the cause. You may see initial spikes in donations but in the long run you will see less... facts are facts. When Blurf started his fund drive when SETI was down for several days, it was a brilliant move. When Blurf started his second fund drive a couple days after the first, it was a very poor move and the SETI staff should have stopped it. Right now how many of you are saying: but the second drive raised money too. Yes, it did, but probably in the short term; it may have hurt the long term. People who donated during the first drive may have felt a sense of community. May have felt special being part of something unique. May have felt their individual donation is being recognized as important. It may have been their first donation and this was awesome to do. Then SETI starts another drive a couple days later. How many of the people who donated in the first drive felt let down? That the first fund drive was not so special. They may even feel duped over it if SETI is going to hold a fund drive every few days.
Some or many who donated may be saying I won't fall for that again, and never give again. This is the nature of fund raising my friends. It's a very slippery slope. I know some of you have good intentions. Ask too often for money and people WILL turn away... permanently! ID: 808322 ·

JBWoolley Send message Joined: 8 May 07 Posts: 35 Credit: 6,214,366 RAC: 0	Message 808330 - Posted: 15 Sep 2008, 2:48:20 UTC - in response to Message 808306. Easy going guys.... Please donate, that Eric and the crew can buy another or a new server. The prob will be solved then. I'm a newbie on these message boards. So Please be kind. But servers don't go down because they get "tired" and need a rest. :-) I bet there is a reason for these outages. Jack, The root cause of all of the problems is funding. Generally speaking, the outages aren't crashes, so there likely isn't a dump. It's the result of some server losing a mount. Matt has pointed out that most every server mounts the drives for every other server, so there is much, much file sharing going on. It's far from ideal. More hardware would likely mean that they could reduce the dependence on NFS mounts. But the real problem is that there are two people we can think of as the entire operations staff -- and they have other responsibilities. Dr. Korpela (Eric) pitches in, even though it isn't his job. A graduate student wrote Astropulse, and there are some volunteer developers who help -- but they don't do the operational stuff. ... and after that, well, I don't think I've missed anyone. The BOINC project is separate, and has to be because the funding sources cannot be mixed. The servers are "interesting" as well. I think it was replaced, but Joycelyn was at one time running one of the engineering test beds for the V40z, and I understand wasn't at all like the production V40z machines. Most of the rest are either donated "white box" systems, or hand-me-downs. Most (if not all) are running Linux, because that is what the project can afford. They run best during the week because people are in the office and can "kick" the machines when they act up. On the weekend, they get kicked remotely. 
So, I'll echo the "please donate" theme, but I don't want to buy another server as much as I want to see another staff member -- or at least see the current staff continue to be paid. -- Ned Guess I wasn't clear.... Yes, I'll donate.... my time (and what little experience I've accumulated over the years). Yes, mounting NFS volumes to multiple machines at once is known to have significant stability problems. Sorry.. sounds like GOOD news... (BOINC may have outgrown its current server infrastructure scalability.) I'm glad we are growing so fast! "The industry" has faced these scalability/availability issues for many years and has solutions for most issues. Please consider this: Maybe turning away offers of free help... and (instead) asking for more $$$... may not be the best alternative/solution for the ongoing BOINC issues. ------------ I look at it this way. The basic BOINC mindset is to DISTRIBUTE the crunching work across many, many resources/computers. Why can't BOINC move away from the current centralized administration... to a more distributed administration (to include a trusted few volunteers)? Thanks, Jack Woolley I'm sorry, I have to add some more. One of the main reasons I DON'T just donate $$$ is contained in the two "please donate" (above) comments. The way it's described.... More servers = more mounts. (Making the mount issues/outages even worse.) And getting another person to "kick" the servers on the weekend... To me, is like putting a bandaid on a broken leg. (Addressing the availability symptom, not the root cause of the problems.) Neither of the above "solutions" is one I want to fund. However I offer assistance with problem determination. And given an opportunity, hope to assist with time (& possibly money) for implementing a root solution. Hope this helps, Jack Woolley ID: 808330 ·

kittyman Volunteer tester Send message Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004	Message 808337 - Posted: 15 Sep 2008, 3:05:56 UTC Ummmmmmmm....Seti has been rambling about on a 'broken leg' for quite some time now.......... And they have gone far, considering the handicap...... Don't you all think that they would just love to have unlimited resources to bandy about at will???? Alas, they do not. So the requests for donations will continue ad infinitum........and those of you who care will answer the call, and those of you who just sit in the bleachers screaming will not. I have donated much to this project.......both in terms of computer resources and also in hard cash. And shall continue to do so. It is my quest....... It is Mankind's quest...... To know that 'we are not alone'...... "Freedom is just Chaos, with better lighting." Alan Dean Foster ID: 808337 ·

Gary Charpentier Volunteer tester Send message Joined: 25 Dec 00 Posts: 30651 Credit: 53,134,872 RAC: 32	Message 808340 - Posted: 15 Sep 2008, 3:09:32 UTC - in response to Message 808266. Easy going guys.... Please donate, that Eric and the crew can buy another or a new server. The prob will be solved then. I'm a newbie on these message boards. So Please be kind. But servers don't go down because they get "tired" and need a rest. :-) I bet there is a reason for these outages. Jack, The root cause of all of the problems is funding. Generally speaking, the outages aren't crashes, so there likely isn't a dump. It's the result of some server losing a mount. Matt has pointed out that most every server mounts the drives for every other server, so there is much, much file sharing going on. It's far from ideal. More hardware would likely mean that they could reduce the dependence on NFS mounts. But the real problem is that there are two people we can think of as the entire operations staff -- and they have other responsibilities. Dr. Korpela (Eric) pitches in, even though it isn't his job. A graduate student wrote Astropulse, and there are some volunteer developers who help -- but they don't do the operational stuff. ... and after that, well, I don't think I've missed anyone. The BOINC project is separate, and has to be because the funding sources cannot be mixed. The servers are "interesting" as well. I think it was replaced, but Joycelyn was at one time running one of the engineering test beds for the V40z, and I understand wasn't at all like the production V40z machines. Most of the rest are either donated "white box" systems, or hand-me-downs. Most (if not all) are running Linux, because that is what the project can afford. They run best during the week because people are in the office and can "kick" the machines when they act up. On the weekend, they get kicked remotely. 
So, I'll echo the "please donate" theme, but I don't want to buy another server as much as I want to see another staff member -- or at least see the current staff continue to be paid. -- Ned Jack: Let me add: http://setiathome.berkeley.edu/sah_porting.php If you are serious about helping. As for BOINC issues: http://boinc.berkeley.edu/trac/wiki/SourceCode It is all open source. You should contact Rom to see where you can best help. However I believe they know what the problems are and it isn't memory leaks. Seems to be an issue of not enough disk space. Two issues wrapped into one. First is what Ned mentions, the lost NFS mounts. They need enough $$ to get each machine (server) on its own set of drives so they don't have to cross mount everything. The second was recently touched on in http://setiathome.berkeley.edu/tech_news.php that the science database (not part of the public facing project) has run out of space to store results. Only one thing is going to solve both issues and that is cash to buy hard disks or someone donating a bunch. And let me toss one more thing about memory leaks out there. SETI isn't the only project using BOINC. Others aren't having a problem, so I don't think there is a memory leak problem in the server side software or it would show up on other projects as well. Sorry for the late reply, but I got called away while it was 1/2 written. Gary ID: 808340 ·


SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.