Panic Mode On (28) Server problems

Spectrum
Joined: 14 Jun 99
Posts: 468
Credit: 53,129,336
RAC: 0
Australia
Message 971178 - Posted: 18 Feb 2010, 13:47:47 UTC - in response to Message 971165.  

Same here in the land of Oz: 1 or 2 WUs upload but no luck updating. Calm down and go roll in some catnip, lol.
ID: 971178 · Report as offensive
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51525
Credit: 1,018,363,574
RAC: 1,004
United States
Message 971180 - Posted: 18 Feb 2010, 13:53:21 UTC - in response to Message 971178.  
Last modified: 18 Feb 2010, 13:58:11 UTC

Same here in the land of Oz: 1 or 2 WUs upload but no luck updating. Calm down and go roll in some catnip, lol.

Catnip, hell......I wanna CRUNCH something.... LOL.
But uploading and reporting would do for now.

What I wanna know is just where the problem is....and is it gonna be addressed soon?
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 971180 · Report as offensive
Labriskus
Joined: 20 Feb 07
Posts: 11
Credit: 745,920
RAC: 0
United Kingdom
Message 971183 - Posted: 18 Feb 2010, 14:00:28 UTC
Last modified: 18 Feb 2010, 14:01:42 UTC

Erm, not sure if anyone noticed the news page:

Projects are down due to a server closet air conditioning failure.
We have to power down most of our computers until this is fixed. 17 Feb 2010 2:36:55 UTC


http://setiathome.berkeley.edu/index.php


Apologies if this has already been pointed out, but this could be what's going on :)
ID: 971183 · Report as offensive
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51525
Credit: 1,018,363,574
RAC: 1,004
United States
Message 971184 - Posted: 18 Feb 2010, 14:04:20 UTC - in response to Message 971183.  

Erm, not sure if anyone noticed the news page:

Projects are down due to a server closet air conditioning failure.
We have to power down most of our computers until this is fixed. 17 Feb 2010 2:36:55 UTC


http://setiathome.berkeley.edu/index.php


Apologies if this has already been pointed out, but this could be what's going on :)

My friend, if the AC had not been fixed, we would not be talking right now.......the servers would still be down.

There is a comms problem that existed before the AC failure, and still persists.
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 971184 · Report as offensive
Labriskus
Joined: 20 Feb 07
Posts: 11
Credit: 745,920
RAC: 0
United Kingdom
Message 971185 - Posted: 18 Feb 2010, 14:11:42 UTC - in response to Message 971184.  

My friend, if the AC had not been fixed, we would not be talking right now.......the servers would still be down.

There is a comms problem that existed before the AC failure, and still persists.



I see... then I stand corrected :)

Still, it's a nice chance to give the PC a clean :)
ID: 971185 · Report as offensive
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51525
Credit: 1,018,363,574
RAC: 1,004
United States
Message 971186 - Posted: 18 Feb 2010, 14:13:16 UTC - in response to Message 971185.  

I see... then I stand corrected :)

Still, it's a nice chance to give the PC a clean :)

I suspect many dust bunnies are meeting their maker about now.

"Time is simply the mechanism that keeps everything from happening all at once."

ID: 971186 · Report as offensive
jangliss
Joined: 14 Jan 04
Posts: 1
Credit: 7,544
RAC: 0
United States
Message 971189 - Posted: 18 Feb 2010, 14:14:55 UTC - in response to Message 971184.  

My friend, if the AC had not been fixed, we would not be talking right now.......the servers would still be down.


You'd think they'd update the front page to reflect that the A/C is back up again.


There is a comms problem that existed before the AC failure, and still persists.


Weird that half the servers are still reporting "OK", and yet I've seen numerous complaints about not being able to report updates. All I keep getting is "retry in {some time}". I thought it might have been my connection, but I tested from 3 different locations and still get it. Sniffing the traffic, I can see the server throwing a 500 error when the client reports the data. I'd be more than happy to post my traffic if it helps anybody resolve the issues.
ID: 971189 · Report as offensive
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51525
Credit: 1,018,363,574
RAC: 1,004
United States
Message 971190 - Posted: 18 Feb 2010, 14:23:17 UTC - in response to Message 971189.  

My friend, if the AC had not been fixed, we would not be talking right now.......the servers would still be down.


You'd think they'd update the front page to reflect that the A/C is back up again.


There is a comms problem that existed before the AC failure, and still persists.


Weird that half the servers are still reporting "OK", and yet I've seen numerous complaints about not being able to report updates. All I keep getting is "retry in {some time}". I thought it might have been my connection, but I tested from 3 different locations and still get it. Sniffing the traffic, I can see the server throwing a 500 error when the client reports the data. I'd be more than happy to post my traffic if it helps anybody resolve the issues.

That's the whole durned point.....
I have not had anybody confirm or deny whether this problem is within the Seti servers or is with their connection to the outside world.

I am gonna go to sleep right now, should I not freeze because the Cuda cards aren't throwing off much heat at the moment, and hope when I wake up it was all just a bad kitty dream.

WHERE ARE THE FISH???

"Time is simply the mechanism that keeps everything from happening all at once."

ID: 971190 · Report as offensive
Matthew S. McCleary
Joined: 9 Sep 99
Posts: 121
Credit: 2,288,242
RAC: 0
United States
Message 971198 - Posted: 18 Feb 2010, 15:03:04 UTC

It's situations such as this -- regardless of what the actual cause is -- that chase people away from crunching for SETI@home. Simply acknowledging that a problem exists and a solution is being looked for, whether the problem is Berkeley's or elsewhere, goes a long way towards calming everyone's nerves. We're not getting that, though, obviously.
ID: 971198 · Report as offensive
rebest (Project Donor)
Volunteer tester
Joined: 16 Apr 00
Posts: 1296
Credit: 45,357,093
RAC: 0
United States
Message 971200 - Posted: 18 Feb 2010, 15:03:54 UTC

With all due respect to Ned and Pappa, the Cricket Graphs don't lie. There has been a steady, overall reduction in throughput going back a week, well before the cooling went out in the closet. There are occasional upward spikes, to be sure, but the trend is obvious.

Two weeks ago, everything was chugging along just fine and this thread was practically dormant. We understand about weekly outages and emergencies like the A/C. But something else is clearly not right.

????

Join the PACK!
ID: 971200 · Report as offensive
Roundel
Joined: 1 Feb 06
Posts: 21
Credit: 6,850,211
RAC: 0
United States
Message 971201 - Posted: 18 Feb 2010, 15:04:49 UTC
Last modified: 18 Feb 2010, 15:06:28 UTC

Not sure if others have gone through since that one went through last night. But I'm now dry on a few machines and almost dry overall across the fleet. Can't upload, and now getting the errors that there are no jobs available on a dry machine.
Oh well, all the hardware can take a much needed rest.
ID: 971201 · Report as offensive
Roundel
Joined: 1 Feb 06
Posts: 21
Credit: 6,850,211
RAC: 0
United States
Message 971204 - Posted: 18 Feb 2010, 15:09:57 UTC

Well, that's interesting, especially if you look at the monthly range. I hadn't noticed any connectivity problems until this whole situation arose at the beginning of the week. I wonder if a router or switch has been dying a slow death and finally gave up the ghost.
ID: 971204 · Report as offensive
PhonAcq
Joined: 14 Apr 01
Posts: 1656
Credit: 30,658,217
RAC: 1
United States
Message 971217 - Posted: 18 Feb 2010, 15:46:44 UTC - in response to Message 971200.  

Yes, over the month there is an obvious trend, but look at the yearly chart; the recent performance is in the noise! (But don't tell Matt or he may defer fixing the problem to work on other issues.)
ID: 971217 · Report as offensive
PhonAcq
Joined: 14 Apr 01
Posts: 1656
Credit: 30,658,217
RAC: 1
United States
Message 971218 - Posted: 18 Feb 2010, 15:47:37 UTC - in response to Message 971162.  

Monitoring my upload process, I see very few making it through at present. What is frustrating is that I see a lot that get as far as 100% uploaded, only to be rejected and queued up to try again. The last bit of handshaking fails and causes the system to repeat work (the upload) that appears to have been completed. This is not a new observation.

Because it obviously takes bandwidth and server resources to execute this type of failure, and because the behavior has been around 'forever', has any effort been made to remedy it?
ID: 971218 · Report as offensive
Dorphas
Joined: 16 May 99
Posts: 118
Credit: 8,007,247
RAC: 0
United States
Message 971224 - Posted: 18 Feb 2010, 16:00:03 UTC
Last modified: 18 Feb 2010, 16:00:32 UTC

Don't know what this may mean in the bigger picture, but I just had one machine upload about 50 workunits... but I can't get them to report at all.
ID: 971224 · Report as offensive
Highlander
Joined: 5 Oct 99
Posts: 167
Credit: 37,987,668
RAC: 16
Germany
Message 971226 - Posted: 18 Feb 2010, 16:01:42 UTC

My Rumor:

I think the last big power outage in the Bay Area damaged the ISP's hardware, and the ISP has set up a 10 Mbit link for emergency use.

But this is really only my thought about the situation.

And whatever it really is, I hope it can all be fixed in the near future (many uploads waiting on my side ^^).


- Performance is not a simple linear function of the number of CPUs you throw at the problem. -
ID: 971226 · Report as offensive
hiamps
Volunteer tester
Joined: 23 May 99
Posts: 4292
Credit: 72,971,319
RAC: 0
United States
Message 971236 - Posted: 18 Feb 2010, 16:25:45 UTC

The only way I can get any to upload is to keep pressing buttons... This project backoff is for the birds; I would rather see them fix the problem than cripple the client. Some get through, but then the project wants to back off for 2 hours, like that is going to do anything but delay the problem.
Official Abuser of Boinc Buttons...
And no good credit hound!
ID: 971236 · Report as offensive
Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 14690
Credit: 200,643,578
RAC: 874
United Kingdom
Message 971243 - Posted: 18 Feb 2010, 16:47:15 UTC - in response to Message 971236.  

The only way I can get any to upload is to keep pressing buttons... This project backoff is for the birds; I would rather see them fix the problem than cripple the client. Some get through, but then the project wants to back off for 2 hours, like that is going to do anything but delay the problem.

I did a bit of button-pushing this morning, and got one machine down to one upload pending (it only had about a dozen in total, so I wasn't adding much to the load!).

Nothing on the reporting front, until it tried again of its own accord while I was on the phone at 15:24.

SETI@home	18/02/2010 15:24:41	Requesting 718981 seconds of new work, and reporting 10 completed tasks
SETI@home	18/02/2010 15:24:56	Scheduler RPC succeeded [server version 611]
SETI@home	18/02/2010 15:24:56	Message from server: (Project has no jobs available)

Says it all, really.
ID: 971243 · Report as offensive
Dave
Joined: 29 Mar 02
Posts: 778
Credit: 25,001,396
RAC: 0
United Kingdom
Message 971247 - Posted: 18 Feb 2010, 16:54:28 UTC

I know it makes us feel good, and I'm the same, but remember that all this manual button-pushing actually makes things worse, because it puts more load on the server. The backoffs, though annoying, are there to spread the load across the thousands of clients out there.
ID: 971247 · Report as offensive
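
Editor's note on the backoff point above: the idea of spreading retries across many clients can be illustrated with a small sketch of randomized exponential backoff. This is not BOINC's actual algorithm or code. The function names, the 60-second base, and the 7200-second cap are assumptions chosen for illustration; the cap only echoes the two-hour backoff mentioned earlier in the thread.

```python
import random


def backoff_ceiling(failures, base=60.0, cap=7200.0):
    """Upper bound (seconds) on the wait after `failures` consecutive failures.

    base and cap are illustrative values, not BOINC's real settings.
    """
    return min(cap, base * (2 ** failures))


def backoff_delay(failures, base=60.0, cap=7200.0, rng=random):
    """Pick a random wait in [0, ceiling] so clients do not retry in lockstep."""
    return rng.uniform(0.0, backoff_ceiling(failures, base, cap))


if __name__ == "__main__":
    rng = random.Random(2010)  # fixed seed so the demo is repeatable
    for failures in range(8):
        wait = backoff_delay(failures, rng=rng)
        print(f"after {failures} failures: wait {wait:.0f}s "
              f"(ceiling {backoff_ceiling(failures):.0f}s)")
```

Because each client draws its own random wait, retries from thousands of hosts land at different times instead of arriving together. Pressing the retry button by hand bypasses that spread, which is the extra load described above.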
Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 14690
Credit: 200,643,578
RAC: 874
United Kingdom
Message 971248 - Posted: 18 Feb 2010, 16:55:50 UTC - in response to Message 971217.  

Yes, over the month there is an obvious trend, but look at the yearly chart; the recent performance is in the noise! (But don't tell Matt or he may defer fixing the problem to work on other issues.)

Up to and including Week 3 on the monthly chart, they were only splitting tapes for MultiBeam work - they were wrestling with major Astropulse database problems.

Astropulse splitting restarted during Week 4, and accounts for the higher average throughput since then (there hasn't been a regular supply of AP work since last May, and AP-crunchers' caches are drier than Death Valley). Every AP unit split gets gobbled up instantly. It's gone quiet again on the AP front now, because all loaded tapes have been split.

Other peaks and troughs relate to the variety in Angle Range for the MB work recently: if a recording was made during a high AR sky survey, the resulting WUs are processed (and hence downloaded) at four(-ish) times the rate of other ARs.

And the flatline since Monday is another story entirely.....
ID: 971248 · Report as offensive


 
©2025 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.