Down ... again?!

Profile Fuzzy Hollynoodles
Volunteer tester
Joined: 3 Apr 99
Posts: 9659
Credit: 251,998
RAC: 0
Message 258640 - Posted: 7 Mar 2006, 15:52:37 UTC - in response to Message 258552.  

Hmm. Looks like our mail server crashed, and all the machines in the network are hanging on that. Maybe they'll push through eventually, but we'll probably be hurtin' all night. I'm not going up to the lab to kick the server. I'm going to sleep.

- Matt


You guys sleep? ;)


And they eat too?! :-O

No wonder they need two or more jobs to make ends meet.

So please make a monetary donation to the project, so we can help them get some less obsolete hardware to work with. That would also be a good way to show our appreciation for the work they do for us.


"I'm trying to maintain a shred of dignity in this world." - Me

ID: 258640 · Report as offensive
Profile Pilot
Joined: 18 May 99
Posts: 534
Credit: 5,475,482
RAC: 0
Message 258642 - Posted: 7 Mar 2006, 15:53:25 UTC - in response to Message 258637.  

Hmm. Looks like our mail server crashed, and all the machines in the network are hanging on that. Maybe they'll push through eventually, but we'll probably be hurtin' all night. I'm not going up to the lab to kick the server. I'm going to sleep.

- Matt


If you got paid what the Qantas IT staff get paid to be called out, when on call, you'd be there as quick as your little feet could carry you AND then stay as LONG as you could. 8-D

I can't remember all the rates, but I think answering the phone was $500!!!!!!
Unfortunately, I was on subcontract :-(
I only got double rates. 8-)

Do you think I should ring Gill and ask for another job? ;-)


Seems like they could benefit by going to MIT or someplace for a good CS class;) They don't seem to have benefited from anything taught locally.

When we finally figure it all out, all the rules will change and we can start all over again.
ID: 258642 · Report as offensive
Scarecrow

Joined: 15 Jul 00
Posts: 4520
Credit: 486,601
RAC: 0
United States
Message 258651 - Posted: 7 Mar 2006, 16:09:24 UTC
Last modified: 7 Mar 2006, 16:10:14 UTC

Look at the bright side. It's the first time in 4 days that the ready-to-send queue has had more than 30 results in it. Can't get to 'em, but by golly they're there. Maybe that was a "scheduled mail server crash" to help get caught up. :)
ID: 258651 · Report as offensive
Profile [B@H] Ray
Volunteer tester
Joined: 1 Sep 00
Posts: 485
Credit: 45,275
RAC: 0
United States
Message 258653 - Posted: 7 Mar 2006, 16:15:56 UTC - in response to Message 258640.  


And they eat too?! :-O



They do? Thought there was no time left for that. Just computers, SETI and music.
ID: 258653 · Report as offensive
Profile John Clark
Volunteer tester
Joined: 29 Sep 99
Posts: 16515
Credit: 4,418,829
RAC: 0
United Kingdom
Message 258657 - Posted: 7 Mar 2006, 16:25:08 UTC - in response to Message 258653.  
Last modified: 7 Mar 2006, 17:13:28 UTC


And they eat too?! :-O



They do? Thought there was no time left for that. Just computers, SETI and music.



As you can see from the Cogent Link graphs -

http://fragment1.berkeley.edu/~cricket/inr-668-interfaces.html

- these are back to full bandwidth.

This means WUs are being distributed but, as usual, there is a backlog of cruncher demand, which, as always, will take time to clear.

I see from the Server Status page that the "WUs outstanding" numbers are closing in on the normal working level (circa 2.35 million).

So, given a bit of fair wind, all will return to normal in a few hours.

Matt L ... thanks for sorting out the affected server. Now have a frustration free shift.
It's good to be back amongst friends and colleagues



ID: 258657 · Report as offensive
Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 258662 - Posted: 7 Mar 2006, 16:51:04 UTC - in response to Message 258657.  

Matt L ... thanks for sorting out the affected server. Now have a frustration free shift.


Actually Jeff and Court (who tend to make it to the lab earlier than I do) dealt with it this morning. Internal disk going bad, needed to be fsck'ed, etc. I'm still at home eating cereal.

- Matt
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 258662 · Report as offensive
Profile John Clark
Volunteer tester
Joined: 29 Sep 99
Posts: 16515
Credit: 4,418,829
RAC: 0
United Kingdom
Message 258665 - Posted: 7 Mar 2006, 16:56:22 UTC - in response to Message 258662.  

Matt L ... thanks for sorting out the affected server. Now have a frustration free shift.


Actually Jeff and Court (who tend to make it to the lab earlier than I do) dealt with it this morning. Internal disk going bad, needed to be fsck'ed, etc. I'm still at home eating cereal.

- Matt


Enjoy your cereal, and a quiet trip to the campus.

No matter which team member sorted it, the result is pleasing and appreciated. You are, from my perspective, seen as the direct face of the team.
It's good to be back amongst friends and colleagues



ID: 258665 · Report as offensive
Profile [B@H] Ray
Volunteer tester
Joined: 1 Sep 00
Posts: 485
Credit: 45,275
RAC: 0
United States
Message 258676 - Posted: 7 Mar 2006, 17:24:48 UTC

After getting work this morning I can't upload, so I just turned network access off to stop getting all those messages. I'll turn it back on later when things quiet down at Berkeley. I still have 10 to do, so I'm in no hurry here; they would just go pending anyway.

If you have a bunch to crunch, why not do the same and not waste the bandwidth? That will also give the ones who really need work a chance to get it. The splitters will be busy, so we can all get it later when needed.
ID: 258676 · Report as offensive
Profile Fuzzy Hollynoodles
Volunteer tester
Joined: 3 Apr 99
Posts: 9659
Credit: 251,998
RAC: 0
Message 258687 - Posted: 7 Mar 2006, 17:37:16 UTC - in response to Message 258662.  

Matt L ... thanks for sorting out the affected server. Now have a frustration free shift.


Actually Jeff and Court (who tend to make it to the lab earlier than I do) dealt with it this morning. Internal disk going bad, needed to be fsck'ed, etc. I'm still at home eating cereal.

- Matt


Yes, he does eat!!!! :-O ( ;-D )




"I'm trying to maintain a shred of dignity in this world." - Me

ID: 258687 · Report as offensive
Profile Cansecur
Volunteer tester
Joined: 7 Feb 01
Posts: 19
Credit: 261,496
RAC: 0
Canada
Message 258692 - Posted: 7 Mar 2006, 17:56:08 UTC

Still having problems with contacting Seti. Here are my messages.
07/03/2006 11:44:33 AM|SETI@home|Started download of setiathome_4.18_windows_intelx86.exe
07/03/2006 11:44:33 AM|SETI@home|Started download of better_banner.jpg
07/03/2006 11:45:21 AM|SETI@home|Temporarily failed download of setiathome_4.18_windows_intelx86.exe: error 500
07/03/2006 11:45:21 AM|SETI@home|Backing off 1 minutes and 0 seconds on download of file setiathome_4.18_windows_intelx86.exe
07/03/2006 11:45:21 AM|SETI@home|Temporarily failed download of better_banner.jpg: error 500
07/03/2006 11:45:21 AM|SETI@home|Backing off 1 minutes and 0 seconds on download of file better_banner.jpg
07/03/2006 11:45:21 AM|SETI@home|Started download of setiathome_4.18_windows_intelx86.pdb
07/03/2006 11:45:21 AM|SETI@home|Started download of 13ap03aa.25218.1120.122132.1.51
07/03/2006 11:45:34 AM|SETI@home|Temporarily failed download of 13ap03aa.25218.1120.122132.1.51: error 500
07/03/2006 11:45:34 AM|SETI@home|Backing off 1 minutes and 0 seconds on download of file 13ap03aa.25218.1120.122132.1.51
07/03/2006 11:46:22 AM|SETI@home|Started download of setiathome_4.18_windows_intelx86.exe
07/03/2006 11:46:43 AM||Couldn't connect to hostname [setiboincdata.ssl.berkeley.edu]
07/03/2006 11:46:43 AM|SETI@home|Temporarily failed download of setiathome_4.18_windows_intelx86.exe: system I/O
07/03/2006 11:46:43 AM|SETI@home|Backing off 1 minutes and 0 seconds on download of file setiathome_4.18_windows_intelx86.exe
07/03/2006 11:46:43 AM|SETI@home|Started download of better_banner.jpg
07/03/2006 11:48:33 AM|SETI@home|Temporarily failed download of setiathome_4.18_windows_intelx86.pdb: error 500
07/03/2006 11:48:33 AM|SETI@home|Backing off 1 minutes and 0 seconds on download of file setiathome_4.18_windows_intelx86.pdb
07/03/2006 11:48:33 AM|SETI@home|Started download of 13ap03aa.25218.1120.122132.1.51
07/03/2006 11:48:40 AM|SETI@home|Temporarily failed download of 13ap03aa.25218.1120.122132.1.51: error 500
07/03/2006 11:48:40 AM|SETI@home|Backing off 1 minutes and 0 seconds on download of file 13ap03aa.25218.1120.122132.1.51
07/03/2006 11:48:41 AM|SETI@home|Started download of setiathome_4.18_windows_intelx86.exe
07/03/2006 11:48:46 AM|SETI@home|Temporarily failed download of setiathome_4.18_windows_intelx86.exe: error 500
07/03/2006 11:48:46 AM|SETI@home|Backing off 1 minutes and 0 seconds on download of file setiathome_4.18_windows_intelx86.exe
07/03/2006 11:49:34 AM|SETI@home|Started download of setiathome_4.18_windows_intelx86.pdb
07/03/2006 11:49:55 AM|SETI@home|Temporarily failed download of better_banner.jpg: error 500
07/03/2006 11:49:55 AM|SETI@home|Backing off 1 minutes and 0 seconds on download of file better_banner.jpg
07/03/2006 11:49:55 AM|SETI@home|Started download of 13ap03aa.25218.1120.122132.1.51
07/03/2006 11:51:30 AM|SETI@home|Finished download of 13ap03aa.25218.1120.122132.1.51
07/03/2006 11:51:30 AM|SETI@home|Throughput 3855 bytes/sec
07/03/2006 11:51:31 AM|SETI@home|Started download of setiathome_4.18_windows_intelx86.exe
07/03/2006 11:51:54 AM||Couldn't connect to hostname [setiboincdata.ssl.berkeley.edu]
07/03/2006 11:51:54 AM|SETI@home|Temporarily failed download of setiathome_4.18_windows_intelx86.exe: system I/O
07/03/2006 11:51:54 AM|SETI@home|Backing off 1 minutes and 20 seconds on download of file setiathome_4.18_windows_intelx86.exe
07/03/2006 11:51:54 AM|SETI@home|Started download of better_banner.jpg
07/03/2006 11:52:45 AM|SETI@home|Temporarily failed download of setiathome_4.18_windows_intelx86.pdb: error 500
07/03/2006 11:52:45 AM|SETI@home|Backing off 1 minutes and 0 seconds on download of file setiathome_4.18_windows_intelx86.pdb
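
For anyone who would rather tally failures like these than read them line by line, here is a minimal Python sketch. It assumes the pipe-separated layout seen above (timestamp, project, message) and the "Temporarily failed download of <file>: <error>" wording. The function name `summarize` is made up for illustration and is not part of BOINC.

```python
from collections import Counter

def summarize(log_lines):
    """Count temporary download failures per file and per error kind.

    Hypothetical helper: relies only on the message layout visible
    in the log excerpt, not on any documented BOINC log format.
    """
    failures = Counter()
    errors = Counter()
    prefix = "Temporarily failed download of "
    for line in log_lines:
        # Fields are pipe-separated: timestamp | project | message
        parts = line.strip().split("|", 2)
        if len(parts) != 3:
            continue
        message = parts[2]
        if message.startswith(prefix):
            # The message looks like "<prefix><file>: <error>"
            name, _, error = message[len(prefix):].rpartition(": ")
            failures[name] += 1
            errors[error] += 1
    return failures, errors

sample = [
    "07/03/2006 11:45:21 AM|SETI@home|Temporarily failed download of better_banner.jpg: error 500",
    "07/03/2006 11:46:43 AM||Couldn't connect to hostname [setiboincdata.ssl.berkeley.edu]",
    "07/03/2006 11:46:43 AM|SETI@home|Temporarily failed download of setiathome_4.18_windows_intelx86.exe: system I/O",
    "07/03/2006 11:49:55 AM|SETI@home|Temporarily failed download of better_banner.jpg: error 500",
]
failures, errors = summarize(sample)
```

Splitting "error 500" from "system I/O" makes it easier to tell a server that answers with an error from one that cannot be reached at all.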


ID: 258692 · Report as offensive
Miklos M.

Joined: 5 May 99
Posts: 955
Credit: 136,115,648
RAC: 73
Hungary
Message 258708 - Posted: 7 Mar 2006, 19:14:16 UTC - in response to Message 258692.  

Same here, although a few units are trying to trickle in.

Nick
ID: 258708 · Report as offensive
Profile Elwood

Joined: 28 Jan 06
Posts: 35
Credit: 394,457
RAC: 0
United States
Message 258714 - Posted: 7 Mar 2006, 19:31:35 UTC

I'm suspending network activity until I run out of work. No sense in pinging the server unnecessarily while the techs work the bugs out.
ID: 258714 · Report as offensive
Jack Gulley

Joined: 4 Mar 03
Posts: 423
Credit: 526,566
RAC: 0
United States
Message 258728 - Posted: 7 Mar 2006, 20:02:41 UTC

The Internet paths are working now,
it's just the "normal" recovery problem of the
Upload/Download server being overloaded and
dropping requests. That should slowly clear
after the surplus of Results Ready to Send
goes back to zero again.
ID: 258728 · Report as offensive
Profile KWSN - Sir Brian - err sorry - wrong film!
Volunteer tester
Joined: 18 Feb 06
Posts: 11
Credit: 674,394
RAC: 0
United Kingdom
Message 258733 - Posted: 7 Mar 2006, 20:24:28 UTC

Hmmm still getting errors

07/03/2006 20:19:58||Resuming network activity
07/03/2006 20:19:58|SETI@home|Started upload of 26my01aa.743.24066.922168.1.203_2_0
07/03/2006 20:19:58|SETI@home|Started download of 17jn01aa.11345.27345.148584.1.162
07/03/2006 20:20:19||Couldn't connect to hostname [setiboincdata.ssl.berkeley.edu]
07/03/2006 20:20:19||Couldn't connect to hostname [setiboincdata.ssl.berkeley.edu]
07/03/2006 20:20:19|SETI@home|Temporarily failed upload of 26my01aa.743.24066.922168.1.203_2_0: system I/O

Is this the "normal issue with the upload/download servers"?


I'm new to this, so apologies in advance for the dumb question if it is.

PS. I've worked in production support at a big blue chip finance Co. I know what you guys are going through, hang on in there!
ID: 258733 · Report as offensive
Joseph

Joined: 9 Mar 01
Posts: 42
Credit: 4,191,922
RAC: 0
United Arab Emirates
Message 258743 - Posted: 7 Mar 2006, 20:54:45 UTC

All of my computers are unable to report results or download anything!!!
ID: 258743 · Report as offensive
Profile Elwood

Joined: 28 Jan 06
Posts: 35
Credit: 394,457
RAC: 0
United States
Message 258750 - Posted: 7 Mar 2006, 21:05:49 UTC
Last modified: 7 Mar 2006, 21:06:56 UTC

Is this the "normal issue with the upload/download servers"? I'm new to this, so apologies in advance for the dumb question if it is.


I've only been at it about 5 weeks, but the outages do appear to be pretty common for SETI. The project is pretty large in scope and they don't have anywhere near the funding required to purchase up-to-date equipment, so they're doing the best they can with what they have.

I ran SETI exclusively for a couple of weeks before attaching to several more projects, which keeps the ol' machines working no matter what.

Also, whenever there is an outage, planned or not, there is a recovery period where too many machines contact the SETI server too quickly to report results and get more work, which essentially amounts to a denial of service attack. I've found that changing my preferences to only contact the servers every 0.5 to 1.0 days really helps with a) getting a larger amount of work at a time so that I'm less affected by outages and b) giving SETI a break so that my machines don't keep pestering an overloaded server.
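
The stampede described above is often tamed by having each client wait a random, growing interval before retrying. The sketch below illustrates that general idea (capped exponential backoff with jitter). It is not the BOINC client's actual retry logic, and `backoff_delay`, `peak_per_minute`, the 60-second base, and the 4-hour cap are all assumptions chosen for illustration.

```python
import random
from collections import Counter

def backoff_delay(attempt, base=60.0, cap=4 * 3600.0, rng=random):
    """Seconds to wait before retry number `attempt` (0-based).

    Hypothetical helper: the ceiling doubles with each attempt up to
    `cap`, and the delay is drawn uniformly from [0, ceiling].
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0, ceiling)

def peak_per_minute(delays):
    """Largest number of retries that land in the same one-minute bucket."""
    buckets = Counter(int(d // 60) for d in delays)
    return max(buckets.values())

rng = random.Random(2006)  # fixed seed so the run is repeatable
clients = 1000             # invented population size

# Every client retries exactly 60 seconds after the outage ends.
fixed = [60.0] * clients
# Every client draws a jittered delay with attempt=6 (ceiling 3840 s).
jittered = [backoff_delay(6, rng=rng) for _ in range(clients)]

fixed_peak = peak_per_minute(fixed)
jittered_peak = peak_per_minute(jittered)
```

With the fixed delay, all 1000 retries hit the server in the same minute; with jitter they are spread across roughly an hour, so the busiest minute sees far fewer of them.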
ID: 258750 · Report as offensive
Gareth Lock

Joined: 14 Aug 02
Posts: 358
Credit: 969,807
RAC: 0
United Kingdom
Message 258844 - Posted: 7 Mar 2006, 22:58:24 UTC - in response to Message 258750.  
Last modified: 7 Mar 2006, 23:00:00 UTC

Is this the "normal issue with the upload/download servers"? I'm new to this, so apologies in advance for the dumb question if it is.


I've only been at it about 5 weeks, but the outages do appear to be pretty common for SETI. The project is pretty large in scope and they don't have anywhere near the funding required to purchase up-to-date equipment, so they're doing the best they can with what they have.

I ran SETI exclusively for a couple of weeks before attaching to several more projects, which keeps the ol' machines working no matter what.

Also, whenever there is an outage, planned or not, there is a recovery period where too many machines contact the SETI server too quickly to report results and get more work, which essentially amounts to a denial of service attack. I've found that changing my preferences to only contact the servers every 0.5 to 1.0 days really helps with a) getting a larger amount of work at a time so that I'm less affected by outages and b) giving SETI a break so that my machines don't keep pestering an overloaded server.


One of the major reasons for the recent glut of outages, I think, is the recent shutdown of SETI "Classic" and the huge move by the majority of these "Classic" users over to BOINC (The BIG push). This has, in turn, put extra demand on the BOINC servers, which has led to a longer time between the project going back up and users actually getting any work. What we have is BOINC users + CLASSIC converts =... Well, a helluva lot more up/download requests being sent to the same hardware that was previously dealing with just the original BOINC users. Bottlenecks are bound to occur.

Your likening this effect to a DoS is actually quite an accurate description of what is going on.
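
The bottleneck arithmetic can be made concrete with a back-of-the-envelope sketch. It assumes a fixed server capacity and a steady request rate, which real traffic will not match, and every number below is invented for illustration rather than taken from the SETI@home servers.

```python
def drain_time_hours(backlog, capacity_per_hour, arrivals_per_hour):
    """Hours needed to clear `backlog` queued requests.

    Hypothetical model: the server handles `capacity_per_hour` requests
    while `arrivals_per_hour` new ones keep coming in. If arrivals match
    or exceed capacity, the backlog never clears.
    """
    if arrivals_per_hour >= capacity_per_hour:
        return float("inf")
    return backlog / (capacity_per_hour - arrivals_per_hour)

# Invented figures: the same hardware before and after the user base doubles.
before = drain_time_hours(100_000, 50_000, 30_000)
after = drain_time_hours(100_000, 50_000, 60_000)
```

In this toy model the backlog clears in 5 hours at the original load, but once arrivals exceed what the same hardware can serve, it never clears until clients back off.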


ID: 258844 · Report as offensive
Profile Dali

Joined: 14 Jul 99
Posts: 1
Credit: 1,033,421
RAC: 0
United States
Message 258853 - Posted: 7 Mar 2006, 23:12:12 UTC

I know. I just set up a dual 2.8 dual-core Xeon server with 4 gigs of RAM and I'm so itching to blow this up some but can't get anything to download.. ;(

ID: 258853 · Report as offensive
Profile ML1
Volunteer moderator
Volunteer tester

Joined: 25 Nov 01
Posts: 21725
Credit: 7,508,002
RAC: 20
United Kingdom
Message 258856 - Posted: 7 Mar 2006, 23:14:53 UTC - in response to Message 258853.  
Last modified: 7 Mar 2006, 23:20:57 UTC

I know. I just set up a dual 2.8 dual-core Xeon server with 4 gigs of RAM and I'm so itching to blow this up some but can't get anything to download.. ;(

Try it out with the BBC Climate Experiment or CPDN or one or more of the other projects until s@h bounces back.

Happy crunchin',
Martin


Note: Existing BOINC users need only attach to http://bbc.cpdn.org/ and should not try downloading the BBC customised BOINC software. Instead, only attach so that you just get the project client.
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 258856 · Report as offensive
Profile Darth Dogbytes™
Volunteer tester

Joined: 30 Jul 03
Posts: 7512
Credit: 2,021,148
RAC: 0
United States
Message 258874 - Posted: 7 Mar 2006, 23:43:45 UTC

Last I looked, everything is back up. I can now up/download. Whoopee.....
Account frozen...
ID: 258874 · Report as offensive
 
©2025 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.