Down ... again?!

Author	Message
Fuzzy Hollynoodles Volunteer tester Send message Joined: 3 Apr 99 Posts: 9659 Credit: 251,998 RAC: 0	Message 258640 - Posted: 7 Mar 2006, 15:52:37 UTC - in response to Message 258552. Hmm. Looks like our mail server crashed, and all the machines in the network are hanging on that. Maybe they'll push through eventually, but we'll probably be hurtin' all night. I'm not going up to the lab to kick the server. I'm going to sleep. - Matt You guys sleep? ;) And they eat too?! :-O No wonder they need two or more jobs to make ends meet. So please make a money donation to the project, so we can help them get some less obsolete hardware to work with. That would also be a good way to show our appreciation for their work for us. "I'm trying to maintain a shred of dignity in this world." - Me ID: 258640 ·

Pilot Send message Joined: 18 May 99 Posts: 534 Credit: 5,475,482 RAC: 0	Message 258642 - Posted: 7 Mar 2006, 15:53:25 UTC - in response to Message 258637. Hmm. Looks like our mail server crashed, and all the machines in the network are hanging on that. Maybe they'll push through eventually, but we'll probably be hurtin' all night. I'm not going up to the lab to kick the server. I'm going to sleep. - Matt If you got paid what the Qantas IT staff get paid to be called out, when on call, you'd be there as quick as your little feet could carry you AND then stay as LONG as you could. 8-D I can't remember all the rates, but I think answering the phone was $500!!!!!! Unfortunately, I was on subcontract :-( I only got double rates. 8-) Do you think I should ring Gill and ask for another job ;-) Seems like they could benefit by going to MIT or someplace for a good CS class;) They don't seem to have benefited from anything taught locally. When we finally figure it all out, all the rules will change and we can start all over again. ID: 258642 ·

Scarecrow Send message Joined: 15 Jul 00 Posts: 4520 Credit: 486,601 RAC: 0	Message 258651 - Posted: 7 Mar 2006, 16:09:24 UTC Last modified: 7 Mar 2006, 16:10:14 UTC Look at the bright side. It's the first time in 4 days that the ready to send queue has had more than 30 results in it. Can't get to 'em, but by golly they're there. Maybe that was a "scheduled mail server crash" to help get caught up. :) ID: 258651 ·

[B@H] Ray Volunteer tester Send message Joined: 1 Sep 00 Posts: 485 Credit: 45,275 RAC: 0	Message 258653 - Posted: 7 Mar 2006, 16:15:56 UTC - in response to Message 258640. And they eat too?! :-O They do? Thought there was no time left for that. Just computers, SETI and music. ID: 258653 ·

John Clark Volunteer tester Send message Joined: 29 Sep 99 Posts: 16515 Credit: 4,418,829 RAC: 0	Message 258657 - Posted: 7 Mar 2006, 16:25:08 UTC - in response to Message 258653. Last modified: 7 Mar 2006, 17:13:28 UTC And they eat too?! :-O They do? Thought there was no time left for that. Just computers, SETI and music. As you can see from the Cogent Link graphs - http://fragment1.berkeley.edu/~cricket/inr-668-interfaces.html - the link is back to full bandwidth. This means WUs are being distributed but, as usual, there is a backlog of cruncher demand, which will take time to clear. I see from the Server Status page that the "WUs outstanding" number is closing in on the normal working level (circa 2.35 million). So, given a bit of fair wind, all will return to normal in a few hours. Matt L ... thanks for sorting out the affected server. Now have a frustration free shift. It's good to be back amongst friends and colleagues ID: 258657 ·

Matt Lebofsky Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0	Message 258662 - Posted: 7 Mar 2006, 16:51:04 UTC - in response to Message 258657. Matt L ... thanks for sorting out the affected server. Now have a frustration free shift. Actually Jeff and Court (who tend to make it to the lab earlier than I do) dealt with it this morning. Internal disk going bad, needed to be fsck'ed, etc. I'm still at home eating cereal. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude ID: 258662 ·

John Clark Volunteer tester Send message Joined: 29 Sep 99 Posts: 16515 Credit: 4,418,829 RAC: 0	Message 258665 - Posted: 7 Mar 2006, 16:56:22 UTC - in response to Message 258662. Matt L ... thanks for sorting out the affected server. Now have a frustration free shift. Actually Jeff and Court (who tend to make it to the lab earlier than I do) dealt with it this morning. Internal disk going bad, needed to be fsck'ed, etc. I'm still at home eating cereal. - Matt Enjoy your cereal, and a quiet trip to the campus. No matter which team member sorted it, the result is pleasing and appreciated. You are, from my perspective, seen as the direct face of the team. It's good to be back amongst friends and colleagues ID: 258665 ·

[B@H] Ray Volunteer tester Send message Joined: 1 Sep 00 Posts: 485 Credit: 45,275 RAC: 0	Message 258676 - Posted: 7 Mar 2006, 17:24:48 UTC After getting work this morning I can't upload. Well, I just turned the network access off so I don't get all those messages. Will turn it on later when things quiet down at Berkeley. Still have 10 to do so not in a hurry here, they would just go pending anyway. If you have a bunch to crunch, why not do the same and not waste the bandwidth? That will also give the ones who really need the work a chance to get it. The splitters will be busy, so we can all get it later when needed. ID: 258676 ·

Fuzzy Hollynoodles Volunteer tester Send message Joined: 3 Apr 99 Posts: 9659 Credit: 251,998 RAC: 0	Message 258687 - Posted: 7 Mar 2006, 17:37:16 UTC - in response to Message 258662. Matt L ... thanks for sorting out the affected server. Now have a frustration free shift. Actually Jeff and Court (who tend to make it to the lab earlier than I do) dealt with it this morning. Internal disk going bad, needed to be fsck'ed, etc. I'm still at home eating cereal. - Matt Yes, he does eat!!!! :-O ( ;-D ) "I'm trying to maintain a shred of dignity in this world." - Me ID: 258687 ·

Cansecur Volunteer tester Send message Joined: 7 Feb 01 Posts: 19 Credit: 261,496 RAC: 0	Message 258692 - Posted: 7 Mar 2006, 17:56:08 UTC Still having problems with contacting Seti. Here are my messages.
07/03/2006 11:44:33 AM|SETI@home|Started download of setiathome_4.18_windows_intelx86.exe
07/03/2006 11:44:33 AM|SETI@home|Started download of better_banner.jpg
07/03/2006 11:45:21 AM|SETI@home|Temporarily failed download of setiathome_4.18_windows_intelx86.exe: error 500
07/03/2006 11:45:21 AM|SETI@home|Backing off 1 minutes and 0 seconds on download of file setiathome_4.18_windows_intelx86.exe
07/03/2006 11:45:21 AM|SETI@home|Temporarily failed download of better_banner.jpg: error 500
07/03/2006 11:45:21 AM|SETI@home|Backing off 1 minutes and 0 seconds on download of file better_banner.jpg
07/03/2006 11:45:21 AM|SETI@home|Started download of setiathome_4.18_windows_intelx86.pdb
07/03/2006 11:45:21 AM|SETI@home|Started download of 13ap03aa.25218.1120.122132.1.51
07/03/2006 11:45:34 AM|SETI@home|Temporarily failed download of 13ap03aa.25218.1120.122132.1.51: error 500
07/03/2006 11:45:34 AM|SETI@home|Backing off 1 minutes and 0 seconds on download of file 13ap03aa.25218.1120.122132.1.51
07/03/2006 11:46:22 AM|SETI@home|Started download of setiathome_4.18_windows_intelx86.exe
07/03/2006 11:46:43 AM||Couldn't connect to hostname [setiboincdata.ssl.berkeley.edu]
07/03/2006 11:46:43 AM|SETI@home|Temporarily failed download of setiathome_4.18_windows_intelx86.exe: system I/O
07/03/2006 11:46:43 AM|SETI@home|Backing off 1 minutes and 0 seconds on download of file setiathome_4.18_windows_intelx86.exe
07/03/2006 11:46:43 AM|SETI@home|Started download of better_banner.jpg
07/03/2006 11:48:33 AM|SETI@home|Temporarily failed download of setiathome_4.18_windows_intelx86.pdb: error 500
07/03/2006 11:48:33 AM|SETI@home|Backing off 1 minutes and 0 seconds on download of file setiathome_4.18_windows_intelx86.pdb
07/03/2006 11:48:33 AM|SETI@home|Started download of 13ap03aa.25218.1120.122132.1.51
07/03/2006 11:48:40 AM|SETI@home|Temporarily failed download of 13ap03aa.25218.1120.122132.1.51: error 500
07/03/2006 11:48:40 AM|SETI@home|Backing off 1 minutes and 0 seconds on download of file 13ap03aa.25218.1120.122132.1.51
07/03/2006 11:48:41 AM|SETI@home|Started download of setiathome_4.18_windows_intelx86.exe
07/03/2006 11:48:46 AM|SETI@home|Temporarily failed download of setiathome_4.18_windows_intelx86.exe: error 500
07/03/2006 11:48:46 AM|SETI@home|Backing off 1 minutes and 0 seconds on download of file setiathome_4.18_windows_intelx86.exe
07/03/2006 11:49:34 AM|SETI@home|Started download of setiathome_4.18_windows_intelx86.pdb
07/03/2006 11:49:55 AM|SETI@home|Temporarily failed download of better_banner.jpg: error 500
07/03/2006 11:49:55 AM|SETI@home|Backing off 1 minutes and 0 seconds on download of file better_banner.jpg
07/03/2006 11:49:55 AM|SETI@home|Started download of 13ap03aa.25218.1120.122132.1.51
07/03/2006 11:51:30 AM|SETI@home|Finished download of 13ap03aa.25218.1120.122132.1.51
07/03/2006 11:51:30 AM|SETI@home|Throughput 3855 bytes/sec
07/03/2006 11:51:31 AM|SETI@home|Started download of setiathome_4.18_windows_intelx86.exe
07/03/2006 11:51:54 AM||Couldn't connect to hostname [setiboincdata.ssl.berkeley.edu]
07/03/2006 11:51:54 AM|SETI@home|Temporarily failed download of setiathome_4.18_windows_intelx86.exe: system I/O
07/03/2006 11:51:54 AM|SETI@home|Backing off 1 minutes and 20 seconds on download of file setiathome_4.18_windows_intelx86.exe
07/03/2006 11:51:54 AM|SETI@home|Started download of better_banner.jpg
07/03/2006 11:52:45 AM|SETI@home|Temporarily failed download of setiathome_4.18_windows_intelx86.pdb: error 500
07/03/2006 11:52:45 AM|SETI@home|Backing off 1 minutes and 0 seconds on download of file setiathome_4.18_windows_intelx86.pdb
ID: 258692 ·

Miklos M. Send message Joined: 5 May 99 Posts: 955 Credit: 136,115,648 RAC: 73	Message 258708 - Posted: 7 Mar 2006, 19:14:16 UTC - in response to Message 258692. Same here, although a few units are trying to trickle in. Nick ID: 258708 ·

Elwood Send message Joined: 28 Jan 06 Posts: 35 Credit: 394,457 RAC: 0	Message 258714 - Posted: 7 Mar 2006, 19:31:35 UTC I'm suspending network activity until I run out of work. No sense in pinging the server unnecessarily while the techs work the bugs out. ID: 258714 ·

Jack Gulley Send message Joined: 4 Mar 03 Posts: 423 Credit: 526,566 RAC: 0	Message 258728 - Posted: 7 Mar 2006, 20:02:41 UTC The Internet paths are working now; it's just the "normal" recovery problem of the Upload/Download server being overloaded and dropping requests. That should slowly clear after the surplus of Results Ready to Send goes back to zero again. ID: 258728 ·

KWSN - Sir Brian - err sorry - wrong film! Volunteer tester Send message Joined: 18 Feb 06 Posts: 11 Credit: 674,394 RAC: 0	Message 258733 - Posted: 7 Mar 2006, 20:24:28 UTC Hmmm, still getting errors:
07/03/2006 20:19:58||Resuming network activity
07/03/2006 20:19:58|SETI@home|Started upload of 26my01aa.743.24066.922168.1.203_2_0
07/03/2006 20:19:58|SETI@home|Started download of 17jn01aa.11345.27345.148584.1.162
07/03/2006 20:20:19||Couldn't connect to hostname [setiboincdata.ssl.berkeley.edu]
07/03/2006 20:20:19||Couldn't connect to hostname [setiboincdata.ssl.berkeley.edu]
07/03/2006 20:20:19|SETI@home|Temporarily failed upload of 26my01aa.743.24066.922168.1.203_2_0: system I/O
Is this the "normal issue with the upload/download servers"? I'm new to this, so apologies for the dumb question in advance if it is. PS. I've worked in production support at a big blue chip finance Co. I know what you guys are going through, hang on in there! ID: 258733 ·

Joseph Send message Joined: 9 Mar 01 Posts: 42 Credit: 4,191,922 RAC: 0	Message 258743 - Posted: 7 Mar 2006, 20:54:45 UTC All of my computers are unable to report results or download anything!!! ID: 258743 ·

Elwood Send message Joined: 28 Jan 06 Posts: 35 Credit: 394,457 RAC: 0	Message 258750 - Posted: 7 Mar 2006, 21:05:49 UTC Last modified: 7 Mar 2006, 21:06:56 UTC Is this the "normal issue with the upload/download servers"? I'm new to this, so apologies for the dumb question in advance if it is. I've only been at it about 5 weeks, but the outages do appear to be pretty common for SETI. The project is pretty large in scope and they don't have anywhere near the funding required to purchase up-to-date equipment, so they're doing the best they can with what they have. I ran SETI exclusively for a couple of weeks before attaching to several more projects, which keeps the ol' machines working no matter what. Also, whenever there is an outage, planned or not, there is a recovery period where too many machines contact the SETI server too quickly to report results and get more work, which essentially amounts to a denial of service attack. I've found that changing my preferences to contact the servers only every 0.5 to 1.0 days really helps with a) getting a larger amount of work at a time so that I'm less affected by outages and b) giving SETI a break so that my machines don't keep pestering an overloaded server. ID: 258750 ·

Gareth Lock Send message Joined: 14 Aug 02 Posts: 358 Credit: 969,807 RAC: 0	Message 258844 - Posted: 7 Mar 2006, 22:58:24 UTC - in response to Message 258750. Last modified: 7 Mar 2006, 23:00:00 UTC Is this the "normal issue with the upload/download servers"? I'm new to this, so apologies for the dumb question in advance if it is. I've only been at it about 5 weeks, but the outages do appear to be pretty common for SETI. The project is pretty large in scope and they don't have anywhere near the funding required to purchase up-to-date equipment, so they're doing the best they can with what they have. I ran SETI exclusively for a couple of weeks before attaching to several more projects, which keeps the ol' machines working no matter what. Also, whenever there is an outage, planned or not, there is a recovery period where too many machines contact the SETI server too quickly to report results and get more work, which essentially amounts to a denial of service attack. I've found that changing my preferences to contact the servers only every 0.5 to 1.0 days really helps with a) getting a larger amount of work at a time so that I'm less affected by outages and b) giving SETI a break so that my machines don't keep pestering an overloaded server. One of the major reasons for the recent glut of outages, I think, is the recent shutdown of SETI "Classic" and the huge move by the majority of these "Classic" users over to BOINC (The BIG push). This has, in turn, put extra demand on the BOINC servers, which has led to a longer time between the project going back up and users actually getting any work. What we have is BOINC users + CLASSIC converts =... Well, a helluva lot more up/download requests being sent to the same hardware that was just dealing with the original BOINC users. Bottlenecks are bound to occur. Your likening this effect to a DoS is actually quite an accurate description of what is going on. ID: 258844 ·
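
Editor's aside: the recovery-period pile-up described in the two posts above (every client retrying at once after an outage) is commonly softened by having each client wait a randomized, growing delay between retries. The Python below is only a rough sketch of that general idea. It is not the BOINC client's actual retry code, and the function name, the 60-second base and the four-hour cap are assumptions chosen for illustration.

```python
import random


def backoff_delay(attempt, base=60.0, cap=4 * 3600.0, rng=random):
    """Return how many seconds to wait before retry number `attempt` (1-based).

    Hypothetical helper: the defaults are illustrative, not BOINC's values.
    """
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    # The upper bound doubles with each consecutive failure, up to the cap.
    # The exponent is clamped so very large attempt counts cannot overflow.
    exponent = min(attempt - 1, 32)
    ceiling = min(cap, base * 2 ** exponent)
    # Pick a random delay between the base and the current ceiling, so that
    # clients which all failed at the same moment do not retry in lockstep.
    return rng.uniform(min(base, ceiling), ceiling)


if __name__ == "__main__":
    for attempt in range(1, 6):
        print(f"retry {attempt}: wait about {backoff_delay(attempt):.0f} s")
```

The random spread is the point of the sketch: a fixed retry interval would bring every stalled client back at the same instant, which is the denial-of-service effect described above.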

Dali Send message Joined: 14 Jul 99 Posts: 1 Credit: 1,033,421 RAC: 0	Message 258853 - Posted: 7 Mar 2006, 23:12:12 UTC I know. I just set up a dual 2.8 dual-core Xeon server with 4 gigs of RAM and I'm so itching to blow this up some but can't get anything to download.. ;( ID: 258853 ·

ML1 Volunteer moderator Volunteer tester Send message Joined: 25 Nov 01 Posts: 21725 Credit: 7,508,002 RAC: 20	Message 258856 - Posted: 7 Mar 2006, 23:14:53 UTC - in response to Message 258853. Last modified: 7 Mar 2006, 23:20:57 UTC I know. I just setup a Dual 2.8 dual-core Xeon server with 4 gigs of ram and I'm so itching to blow this up some but can't get anything to download.. ;( Try it out with the BBC Climate Experiment or CPDN or one or more of the other projects until s@h bounces back. Happy crunchin', Martin Note: Existing Boinc users need only attach to http://bbc.cpdn.org/ You should not try downloading the BBC customised Boinc software. Instead, only attach so that you just get the project client. See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3) ID: 258856 ·

Darth Dogbytes™ Volunteer tester Send message Joined: 30 Jul 03 Posts: 7512 Credit: 2,021,148 RAC: 0	Message 258874 - Posted: 7 Mar 2006, 23:43:45 UTC Last I looked, everything is back up. I can now up/download. Whoopee..... Account frozen... ID: 258874 ·

©2025 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.