Panic Mode On (25) Server problems

Berserker
Volunteer tester
Joined: 2 Jun 99
Posts: 105
Credit: 5,440,087
RAC: 0
United Kingdom
Message 937770 - Posted: 5 Oct 2009, 20:18:07 UTC - in response to Message 937684.  

Eric said in an email that the problem couldn't be solved remotely and had to be rebooted manually due to a server crash (mork). That means someone kicked the servers at some point to get them back running.

And that's where a network-accessible UPS with a power cycle outlet is handy, but only if you have the server set to power on after AC loss.

... and if you ever think you're going to use it, be sure to test it thoroughly when you're in the same building.

It's a real bugger if you need to get through the router and a hub to turn the UPS back on, and the router and hub are plugged into that UPS.

Not that I've ever done anything remotely like that.


Even if you do have a remote reboot switch, it's no guarantee. Remotely reconfigure firewall. Hey, wait. Where did my server go? Not that I've done anything like that either.

Besides, power cycling a server that is in an unknown state isn't always the best idea.

PS - Welcome back SETI.
Stats site - http://www.teamocuk.co.uk - still alive and (just about) kicking.
ID: 937770
Name
Joined: 26 Sep 09
Posts: 12
Credit: 59,462
RAC: 0
United States
Message 937772 - Posted: 5 Oct 2009, 20:21:14 UTC

On the status page it sometimes shows that Jocelyn is running but is in an 'offline' state. When this is the case, does the replica database update itself or is it in stasis?
ID: 937772
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 937778 - Posted: 5 Oct 2009, 20:34:08 UTC - in response to Message 937772.  

On the status page it sometimes shows that Jocelyn is running but is in an 'offline' state. When this is the case, does the replica database update itself or is it in stasis?


If you mean the section on the right that shows "Replica seconds behind master Offline "

That is normally when the replica isn't doing updates to the pages as the two servers are syncing up. That or it was just not turned back on after whatever happened.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 937778
Albireo380
Joined: 21 Mar 08
Posts: 119
Credit: 1,570,025
RAC: 0
United Kingdom
Message 937795 - Posted: 5 Oct 2009, 20:54:51 UTC

I don't pretend to know what is happening. But my backlog of completed work units has now been taken and dealt with, and I have enough work to keep me happy for the next day or so. Then I guess there are other apps to work with.

So .... well done SETI Team for keeping it all going, and best of luck in getting it fully up and running shortly.

Tom
ID: 937795
Ianab
Volunteer tester
Joined: 11 Jun 08
Posts: 732
Credit: 20,635,586
RAC: 5
New Zealand
Message 937799 - Posted: 5 Oct 2009, 21:02:13 UTC - in response to Message 937772.  

On the status page it sometimes shows that Jocelyn is running but is in an 'offline' state. When this is the case, does the replica database update itself or is it in stasis?


In today's tech news Matt indicated that the replica is out of sync and will probably stay offline until the regular outage tomorrow. It can be set to update itself in real time, but that puts a heavy load on the main database, right when it's already being hammered catching up after the weekend crash. As it would probably take 24 hours to re-sync anyway, it's better to just wait till the system is offline and then make a fresh copy of the whole database in one go.

Ian
ID: 937799
Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 937834 - Posted: 5 Oct 2009, 23:08:32 UTC - in response to Message 937715.  
Last modified: 5 Oct 2009, 23:16:45 UTC

Hi, could not reach SETI at all, yesterday. Server status page showed only 4 splitters, which were Disabled or Not Running.
Even the SETI@home main page showed a DB error on 'Mork'. Did have some work, other projects filled the cache.

And the seti.berkeley.edu site was unreachable too.
Now it is also running.
ID: 937834
[B^S] madmac
Volunteer tester
Joined: 9 Feb 04
Posts: 1175
Credit: 4,754,897
RAC: 0
United Kingdom
Message 937917 - Posted: 6 Oct 2009, 6:34:23 UTC
Last modified: 6 Oct 2009, 6:34:43 UTC

Yesterday after SETI went up I kept on getting "no new work"; however, today I am getting "cannot connect to server". Is this the same message as http errors or what?
ID: 937917
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 937921 - Posted: 6 Oct 2009, 7:10:14 UTC - in response to Message 937917.  

is this the same message as http errors or what?

Pretty much.
Network traffic is maxed out, and given the length of the outage I wouldn't expect things to settle down for at least 12 hours.

Grant
Darwin NT
ID: 937921
[B^S] madmac
Volunteer tester
Joined: 9 Feb 04
Posts: 1175
Credit: 4,754,897
RAC: 0
United Kingdom
Message 937928 - Posted: 6 Oct 2009, 9:11:11 UTC - in response to Message 937921.  

Thank you. Now back to no new work, even though server status shows around 50,000 WUs ready to be sent out. Luckily I'm doing WCG.
ID: 937928
AndyW Project Donor
Volunteer tester
Joined: 23 Oct 02
Posts: 5862
Credit: 10,957,677
RAC: 18
United Kingdom
Message 937929 - Posted: 6 Oct 2009, 10:09:31 UTC

I can't collect any work either, despite work apparently being available?

I've had my quad sat idle for days - it was noticeably cooler in my office this morning!
ID: 937929
52 Aces
Joined: 7 Jan 02
Posts: 497
Credit: 14,261,068
RAC: 67
United States
Message 937931 - Posted: 6 Oct 2009, 10:39:49 UTC - in response to Message 937929.  

I can't collect any work either, despite work apparently being available?


Over the course of 8 hours, I was able to grab enough work for 2 days, so I just placed my system on "Won't Get New Tasks" until things recover.

Keep in mind Tuesday maintenance begins shortly (when they'll also fresh copy Jocelyn DB) ... if you're unable to get anything aggressively in the next couple of hours, you'll be out of luck for a while.

ID: 937931
champ
Volunteer tester
Joined: 12 Mar 03
Posts: 3642
Credit: 1,489,147
RAC: 0
Germany
Message 937933 - Posted: 6 Oct 2009, 10:43:08 UTC - in response to Message 937929.  

I can't collect any work either, despite work apparently being available?

I've had my quad sat idle for days - it was noticeably cooler in my office this morning!



Crunch other BOINC projects. Some of them are very useful.
ID: 937933
Rasputin
Volunteer tester
Joined: 13 Jun 02
Posts: 1764
Credit: 6,132,221
RAC: 0
Russia
Message 937935 - Posted: 6 Oct 2009, 11:05:00 UTC - in response to Message 937701.  

The status says all of the splitters are either disabled or not running and at the same time there's supposed to be a result creation rate of almost 25 per second.

One of them has to be wrong. No splitters equals zero creation rate.

Not quite. Reissues are newly created results too. And potentially the Transitioner could create many reissues after it's been unable to check what needs doing for an extended period. I doubt it's fast enough to do 25 reissues a second, though.
                                                               Joe


So the "results ready to send" Total on the status page are a combination of new split work and reissues.

Is the "Current result creation rate" both new split work and reissues combined into one number? Or is it just a total of the new split work?

Sorry if that's confusing. I can't think of a better way to word it at the moment.

Thanks
ID: 937935
James Sotherden
Joined: 16 May 99
Posts: 10436
Credit: 110,373,059
RAC: 54
United States
Message 937943 - Posted: 6 Oct 2009, 12:45:40 UTC

I came home this morning and found all my uploads gone and some work units on my P4 and i7. The Mac got rid of all the uploads, but I still have work on that machine. SETI and MilkyWay duke it out to see who runs on that machine, so I never ran out of work at all on the Mac.

Old James
ID: 937943
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 937958 - Posted: 6 Oct 2009, 13:50:27 UTC

I've gotten at least 60 tasks since everything came back online yesterday. I wasn't anywhere close to running out of work on my main cruncher. My only remaining other cruncher was down to 50% on one shorty, and one more shorty left, and that was it.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 937958
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 937971 - Posted: 6 Oct 2009, 15:15:10 UTC - in response to Message 937935.  

...
Reissues are newly created results too. And potentially the Transitioner could create many reissues after it's been unable to check what needs doing for an extended period. I doubt it's fast enough to do 25 reissues a second, though.
                                                               Joe

So the "results ready to send" Total on the status page are a combination of new split work and reissues.

Is the "Current result creation rate" both new split work and reissues combined into one number? Or is it just a total of the new split work?

Sorry if that's confusing. I can't think of a better way to word it at the moment.

Thanks

Result creation is exactly adding a new RESULT to the database, for whatever reason the Transitioner needs to. The "Current result creation rate" does count both initial replication and reissues; even when the splitters have run out of data or are disabled, some small creation rate is shown, indicating the reissues.
                                                            Joe
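
To make the counting above concrete, here is a minimal sketch in Python. It is not the project's actual status-page code: the `ResultCreated` record, its `reason` field and the `creation_rate` function are hypothetical names made up for illustration. It only shows that a rate computed over every newly added RESULT stays above zero when nothing but reissues are being created.

```python
from dataclasses import dataclass


@dataclass
class ResultCreated:
    """One RESULT row added to the database (hypothetical record, for illustration only)."""
    timestamp: float  # seconds since some reference point
    reason: str       # "initial" (new split work) or "reissue"


def creation_rate(events, window_start, window_end):
    """Results created per second in [window_start, window_end), whatever the reason."""
    duration = window_end - window_start
    if duration <= 0:
        raise ValueError("window must have positive length")
    # Count every creation event in the window; the reason is deliberately ignored.
    count = sum(1 for e in events if window_start <= e.timestamp < window_end)
    return count / duration


# Splitters disabled: only reissues are created, yet the rate is not zero.
reissues_only = [ResultCreated(t, "reissue") for t in (1.0, 4.0, 9.0)]
print(creation_rate(reissues_only, 0.0, 10.0))

# Splitters running: initial replication and reissues land in the same total.
mixed = reissues_only + [ResultCreated(t, "initial") for t in (2.0, 3.0)]
print(creation_rate(mixed, 0.0, 10.0))
```

Filtering on `reason` before counting would give the "new split work only" number asked about earlier, which by the description above is not what the status page reports.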
ID: 937971
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 937973 - Posted: 6 Oct 2009, 15:22:46 UTC
Last modified: 6 Oct 2009, 15:24:49 UTC

Even if things are brought totally back into synch and online after today's outage, I would expect it would be at least 24-36 hours to stabilize a bit, considering all of the downtime.
There are undoubtedly many crunchers looking for work that have not been able to get it yet due to tangled comms.
Fortunately, at least on my end, uploads seem to have hung in there, so maybe they will not contribute too much extra to the pipeline when things come back up this afternoon.

And my Boincstats are gonna get a heck of a kick when they do the first valid stats dump.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 937973
Rasputin
Volunteer tester
Joined: 13 Jun 02
Posts: 1764
Credit: 6,132,221
RAC: 0
Russia
Message 937978 - Posted: 6 Oct 2009, 15:40:51 UTC - in response to Message 937971.  

...
Reissues are newly created results too. And potentially the Transitioner could create many reissues after it's been unable to check what needs doing for an extended period. I doubt it's fast enough to do 25 reissues a second, though.
                                                               Joe

So the "results ready to send" Total on the status page are a combination of new split work and reissues.

Is the "Current result creation rate" both new split work and reissues combined into one number? Or is it just a total of the new split work?

Sorry if that's confusing. I can't think of a better way to word it at the moment.

Thanks

Result creation is exactly adding a new RESULT to the database, for whatever reason the Transitioner needs to. The "Current result creation rate" does count both initial replication and reissues; even when the splitters have run out of data or are disabled, some small creation rate is shown, indicating the reissues.
                                                            Joe


Thanks Joe!
ID: 937978
[AF>france>pas-de-calais]symaski62
Volunteer tester
Joined: 12 Aug 05
Posts: 258
Credit: 100,548
RAC: 0
France
Message 937980 - Posted: 6 Oct 2009, 15:52:33 UTC
Last modified: 6 Oct 2009, 15:53:22 UTC

06/10/2009 17:44:45 SETI@home Sending scheduler request: To fetch work.
06/10/2009 17:44:45 SETI@home Requesting new tasks for CPU
06/10/2009 17:45:08 Project communication failed: attempting access to reference site
06/10/2009 17:45:10 Internet access OK - project servers may be temporarily down.
06/10/2009 17:45:10 SETI@home Scheduler request failed: Couldn't connect to server


:) servers down
SETI@Home Informational message -9 result_overflow
With a general disability of 80%, it takes a lot of effort to contribute to the community and to express myself; thank you for being understanding.
ID: 937980
cliff west
Joined: 7 May 01
Posts: 211
Credit: 16,180,728
RAC: 15
United States
Message 937982 - Posted: 6 Oct 2009, 15:57:41 UTC

I think they may be getting ready to do the normal Tuesday thing.
ID: 937982

©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.