Panic Mode On (46) Server problems

Message boards : Number crunching : Panic Mode On (46) Server problems

Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1096791 - Posted: 13 Apr 2011, 12:38:55 UTC - in response to Message 1096722.  
Last modified: 13 Apr 2011, 12:39:19 UTC

Problems with the replica database: ATM it's 15,047 seconds behind the master.



That is rather normal while the replica catches up after maintenance. Six hours later and it is down to about 2500 seconds.
Sometimes you might even observe the time go up for a few hours after everything comes back online. IIRC the replica recovers in about 12-24 hours after everything is brought back up.
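As a rough illustration of these catch-up figures, here is a minimal Python sketch that turns two lag readings into a net catch-up rate and a naive time-to-zero estimate. It assumes the lag shrinks linearly, which is optimistic given that the lag can rise for a few hours after a restart. The function names are hypothetical; the inputs are the 15,047-second and roughly 2,500-second readings quoted above, six hours apart.

```python
def net_catchup_rate(lag_start_s: float, lag_end_s: float, elapsed_s: float) -> float:
    """Seconds of replication lag removed per wall-clock second, net of new writes."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return (lag_start_s - lag_end_s) / elapsed_s


def hours_to_zero(lag_s: float, rate: float) -> float:
    """Naive linear extrapolation of remaining lag; inf if the lag is not shrinking."""
    if rate <= 0:
        return float("inf")
    return lag_s / rate / 3600


if __name__ == "__main__":
    # Readings from this thread: 15,047 s behind, then ~2,500 s six hours later.
    rate = net_catchup_rate(15_047, 2_500, 6 * 3600)
    print(f"net catch-up rate: {rate:.2f} s of lag per s")
    print(f"naive hours to zero lag: {hours_to_zero(2_500, rate):.1f}")
```

A net rate below 1 simply means the replica is applying old changes only somewhat faster than new ones arrive; if the rate goes negative, the lag is growing again and the estimate is meaningless.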
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1096791
kittyman · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1097047 - Posted: 14 Apr 2011, 6:50:01 UTC

Oh, meow.

Downloads seem to have died.

Uppies and reporting still working.


"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1097047
Profile Robert J
Joined: 30 Mar 00
Posts: 115
Credit: 20,087,874
RAC: 15
United States
Message 1097058 - Posted: 14 Apr 2011, 8:13:33 UTC
Last modified: 14 Apr 2011, 8:14:23 UTC

Looks like the Cricket Graph has taken a nose dive.

http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=%2Frouter-interfaces%2Finr-250%2Fgigabitethernet2_3;view=Octets;ranges=d
ID: 1097058
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1097064 - Posted: 14 Apr 2011, 9:22:48 UTC - in response to Message 1097058.  

Looks like the Cricket Graph has taken a nose dive.

http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=%2Frouter-interfaces%2Finr-250%2Fgigabitethernet2_3;view=Octets;ranges=d

Quoting from Lunatics:

The grapevine says 'Gowron failed a drive and is hung. It will probably be 3-4 days to re-sync the RAID array, so no work until then.'
ID: 1097064
Profile KWSN Ekky Ekky Ekky
Joined: 25 May 99
Posts: 944
Credit: 52,956,491
RAC: 67
United Kingdom
Message 1097068 - Posted: 14 Apr 2011, 9:46:51 UTC

Time to call Dyno Rod.
They'll be there soon after the team gets into work.

ID: 1097068
-BeNt-
Joined: 17 Oct 99
Posts: 1234
Credit: 10,116,112
RAC: 0
United States
Message 1097070 - Posted: 14 Apr 2011, 10:02:51 UTC - in response to Message 1097064.  

Looks like the Cricket Graph has taken a nose dive.

http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=%2Frouter-interfaces%2Finr-250%2Fgigabitethernet2_3;view=Octets;ranges=d

Quoting from Lunatics:

The grapevine says 'Gowron failed a drive and is hung. It will probably be 3-4 days to re-sync the RAID array, so no work until then.'


GRRR.... and just as I got my second card back in from eVGA. Hopefully the ol' cache lasts one more test!.... err, that should be a period....
Traveling through space at ~67,000mph!
ID: 1097070
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1097072 - Posted: 14 Apr 2011, 10:14:59 UTC

One of my machines at home was in the middle of downloading some tasks when everything went splat at about 3:30 UTC. It then seemed to clear up at about 5:00 UTC, as those tasks finished downloading and a few more were requested and downloaded. Then, at around 7:00 UTC, downloads stopped again.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1097072
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1097073 - Posted: 14 Apr 2011, 10:22:04 UTC

Yeah, I've got four APs that haven't started at all. Logs show HTTP errors starting at 0736UTC. Guess I'll suspend network comms. until it is fixed.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving up)
ID: 1097073
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1097084 - Posted: 14 Apr 2011, 10:57:52 UTC - in response to Message 1097073.  

Yeah, I've got four APs that haven't started at all. Logs show HTTP errors starting at 0736UTC. Guess I'll suspend network comms. until it is fixed.

As upload/reporting is still working, for now :), you could set NNT to stop the inflow of work if it is going to be a few days to fix.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1097084
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1097087 - Posted: 14 Apr 2011, 11:39:54 UTC - in response to Message 1097084.  

Yeah, I've got four APs that haven't started at all. Logs show HTTP errors starting at 0736UTC. Guess I'll suspend network comms. until it is fixed.

As upload/reporting is still working, for now :), you could set NNT to stop the inflow of work if it is going to be a few days to fix.

True, but that doesn't keep the pending downloads from retrying, filling my log up and making unnecessary connection attempts. Just easier to suspend comms. and wait for things to be fixed.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving up)
ID: 1097087
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1097088 - Posted: 14 Apr 2011, 11:41:31 UTC - in response to Message 1097084.  

Yeah, I've got four APs that haven't started at all. Logs show HTTP errors starting at 0736UTC. Guess I'll suspend network comms. until it is fixed.

As upload/reporting is still working, for now :), you could set NNT to stop the inflow of work if it is going to be a few days to fix.

Any downloads you already have assigned will quieten down by 'project backoff', and keep the log reasonably clean. For the time being, uploads seem fine (except at Beta, which is stuck on uploads too, but accepting reports).
ID: 1097088
Profile Miep
Volunteer moderator
Joined: 23 Jul 99
Posts: 2412
Credit: 351,996
RAC: 0
Message 1097092 - Posted: 14 Apr 2011, 12:04:08 UTC - in response to Message 1097088.  

Yeah, I've got four APs that haven't started at all. Logs show HTTP errors starting at 0736UTC. Guess I'll suspend network comms. until it is fixed.

As upload/reporting is still working, for now :), you could set NNT to stop the inflow of work if it is going to be a few days to fix.

Any downloads you already have assigned will quieten down by 'project backoff', and keep the log reasonably clean. For the time being, uploads seem fine (except at Beta, which is stuck on uploads too, but accepting reports).


Isn't the backoff only 2 hours max in the 6.10 branch? IIRC we went up to 24h max in 6.12.
Carola
-------
I'm multilingual - I can misunderstand people in several languages!
ID: 1097092
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1097109 - Posted: 14 Apr 2011, 13:07:48 UTC - in response to Message 1097092.  

Yeah, I've got four APs that haven't started at all. Logs show HTTP errors starting at 0736UTC. Guess I'll suspend network comms. until it is fixed.

As upload/reporting is still working, for now :), you could set NNT to stop the inflow of work if it is going to be a few days to fix.

Any downloads you already have assigned will quieten down by 'project backoff', and keep the log reasonably clean. For the time being, uploads seem fine (except at Beta, which is stuck on uploads too, but accepting reports).

Isn't the backoff only 2 hours max in the 6.10 branch? iirc we went up to 24h max on 6.12

Even the basic 'per task' backoff and retry always went up to a possible limit of four hours (though randomised within that range). I don't remember 'project backoff' ever being shorter, but I'll try and find when it was introduced.
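To illustrate the kind of randomised, capped retry described here, this is a minimal Python sketch of capped exponential backoff with full jitter. It is not BOINC's actual client code: the function name `retry_delay` and the 60-second base are hypothetical, and only the four-hour ceiling comes from the post above.

```python
import random
from typing import Optional

FOUR_HOURS_S = 4 * 3600  # ceiling described in the post above


def retry_delay(attempt: int, base_s: float = 60.0,
                cap_s: float = FOUR_HOURS_S,
                rng: Optional[random.Random] = None) -> float:
    """Hypothetical capped exponential backoff with full jitter (not BOINC's code).

    The window doubles after each failed attempt until it reaches cap_s;
    the actual delay is drawn uniformly from [0, window].
    """
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    if rng is None:
        rng = random.Random()
    # Clamp the exponent so huge attempt counts cannot overflow a float.
    window = min(cap_s, base_s * 2 ** min(attempt, 32))
    return rng.uniform(0, window)


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so the run is repeatable
    for attempt in range(10):
        print(attempt, round(retry_delay(attempt, rng=rng)))
```

Drawing the delay uniformly from the whole window, rather than adding a small jitter to a fixed delay, spreads thousands of clients' retries apart, which matters when they all lost the same server at the same moment.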
ID: 1097109
kittyman · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1097120 - Posted: 14 Apr 2011, 14:03:02 UTC - in response to Message 1097064.  
Last modified: 14 Apr 2011, 14:36:32 UTC

Looks like the Cricket Graph has taken a nose dive.

http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=%2Frouter-interfaces%2Finr-250%2Fgigabitethernet2_3;view=Octets;ranges=d

Quoting from Lunatics:

The grapevine says 'Gowron failed a drive and is hung. It will probably be 3-4 days to re-sync the RAID array, so no work until then.'

Ooooohhh.
Those be sour grapes.

And I should add that I am only referring to the grapes themselves that are sour...LOL. Not any opinions about said grapes.
Meow.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1097120
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1097130 - Posted: 14 Apr 2011, 15:18:58 UTC - in response to Message 1097120.  

The grapes that are growing on the grapevine, you mean? Yes, they are sour indeed. But don't shoot the messenger, please - only trying to help :-)
ID: 1097130
kittyman · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1097147 - Posted: 14 Apr 2011, 16:40:19 UTC - in response to Message 1097130.  

The grapes that are growing on the grapevine, you mean? Yes, they are sour indeed. But don't shoot the messenger, please - only trying to help :-)

Of course, Richard.
Your messages here are always welcome...the grapes are not always...LOL.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1097147
kittyman · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1097150 - Posted: 14 Apr 2011, 16:44:36 UTC

Whatever the problem is, the boyz are back in the lab and kicking things...
Server status shows almost everything down right now.

The kitties are sending them their best kitty juju.......

Meow-ommmmmm, meow-ommmmmm.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1097150
kittyman · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1097158 - Posted: 14 Apr 2011, 17:52:24 UTC - in response to Message 1097157.  

Obviously not too bad. From S@H home page:

Storage Server Issues
The server that stores our workunits crashed last night, but is recovering now. We hope to be fully back on line in a couple hours. 14 Apr 2011 | 16:51:19 UTC


Ooohh...
I like that flavor of grapes much better!
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1097158
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1097170 - Posted: 14 Apr 2011, 19:18:06 UTC - in response to Message 1097158.  
Last modified: 14 Apr 2011, 19:22:08 UTC

Obviously not too bad. From S@H home page:

Storage Server Issues
The server that stores our workunits crashed last night, but is recovering now. We hope to be fully back on line in a couple hours. 14 Apr 2011 | 16:51:19 UTC


Ooohh...
I like that flavor of grapes much better!

Looks like we are back online. The usual dropped connections during the catch-up seem to be in play.
The BOINC Stats images are also fixed.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1097170
Profile Mike · Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34253
Credit: 79,922,639
RAC: 80
Germany
Message 1097418 - Posted: 15 Apr 2011, 9:01:33 UTC


The 9,000-second backlog on the replica cleared in 3 hours.
Looks nice.




With each crime and every kindness we birth our future.
ID: 1097418


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.