Long Outage Today

Message boards : News : Long Outage Today
Jeff Cobb
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Mar 99
Posts: 121
Credit: 40,367
RAC: 0
United States
Message 1989378 - Posted: 10 Apr 2019, 0:57:56 UTC

We had to recover the master database on oscar from a backup taken today on carolyn. Oscar is now back to being the master DB and carolyn is once again the replica DB. Things will be a bit slow as the database becomes resident in memory.
ID: 1989378 · Report as offensive
ronssito
Joined: 8 Feb 00
Posts: 15
Credit: 35,503,973
RAC: 54,678
United States
Message 1989385 - Posted: 10 Apr 2019, 1:13:37 UTC

Thanks for all your hard work!
ID: 1989385 · Report as offensive
Tom M
Volunteer tester
Joined: 28 Nov 02
Posts: 3575
Credit: 213,125,502
RAC: 505,690
United States
Message 1989391 - Posted: 10 Apr 2019, 1:51:59 UTC - in response to Message 1989378.  

+1
A proud member of the OFA (Old Farts Association)
"Over the hill? WHAT Hill? I don't REMEMBER any hill...." (from a bumper sticker I bought at a truck stop).
"If its Tourist Season why can't we shoot them?" (another bumper sticker)
ID: 1989391 · Report as offensive
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 9904
Credit: 935,758,635
RAC: 1,502,439
United States
Message 1989404 - Posted: 10 Apr 2019, 2:59:32 UTC

+2
Seti@Home classic workunits: 20,676 CPU time: 74,226 hours
ID: 1989404 · Report as offensive
Gary Charpentier Crowdfunding Project Donor*Special Project $250 donor
Volunteer tester
Joined: 25 Dec 00
Posts: 25658
Credit: 50,071,371
RAC: 22,464
United States
Message 1989417 - Posted: 10 Apr 2019, 3:53:01 UTC

Thanks very much for the message and thanks for the work.
ID: 1989417 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 11664
Credit: 174,465,178
RAC: 119,511
Australia
Message 1989424 - Posted: 10 Apr 2019, 4:39:25 UTC

Thanks for the update.
Grant
Darwin NT
ID: 1989424 · Report as offensive
mr.mac52
Joined: 18 Mar 03
Posts: 67
Credit: 245,882,461
RAC: 4
United States
Message 1989425 - Posted: 10 Apr 2019, 4:49:49 UTC - in response to Message 1989404.  

+3
ID: 1989425 · Report as offensive
Gone with the wind Crowdfunding Project Donor*Special Project $75 donor
Volunteer tester
Joined: 19 Nov 00
Posts: 41577
Credit: 41,999,167
RAC: 464
Message 1989432 - Posted: 10 Apr 2019, 6:54:29 UTC

Thanks Jeff.

Do we know why the recovery was necessary?
ID: 1989432 · Report as offensive
Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9863
Credit: 85,224,624
RAC: 67,269
United Kingdom
Message 1989436 - Posted: 10 Apr 2019, 7:27:19 UTC - in response to Message 1989432.  

"Thanks Jeff.

Do we know why the recovery was necessary?"


I think the answer is in Eric's post here

https://setiathome.berkeley.edu/forum_thread.php?id=84096
ID: 1989436 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 13196
Credit: 154,814,480
RAC: 200,331
United Kingdom
Message 1989437 - Posted: 10 Apr 2019, 7:29:26 UTC - in response to Message 1989432.  

Because of what Eric wrote in message 1988485
ID: 1989437 · Report as offensive
Gone with the wind Crowdfunding Project Donor*Special Project $75 donor
Volunteer tester
Joined: 19 Nov 00
Posts: 41577
Credit: 41,999,167
RAC: 464
Message 1989445 - Posted: 10 Apr 2019, 9:25:39 UTC

Thank you gentlemen.

This time the primary database machine crashed and hasn't automatically recovered.

But I am enquiring as to the reason/cause of the crash in the first place. These things don't happen without cause or reason.

Q1. Why did it crash?
Q2. Why didn't it automatically recover?
Q3. H/w or s/w problem?

We know we can fix it, as was done; that is what backups are for.

Q4. How can we reduce the risk of it happening again?
ID: 1989445 · Report as offensive
Unixchick Project Donor
Joined: 5 Mar 12
Posts: 572
Credit: 1,953,601
RAC: 857
United States
Message 1989564 - Posted: 10 Apr 2019, 23:35:11 UTC - in response to Message 1989445.  



"Q4. How can we reduce the risk of it happening again?"


$$$$$

Sorry, I couldn't stop myself from giving the obvious answer. I'm sure the SETI people want the same thing: keeping the SETI system up as much as possible within the $ and time constraints they have.

To the SETI people who always give their time to get things up as soon as possible, even on weekends and holidays: thank you.
ID: 1989564 · Report as offensive
Gone with the wind Crowdfunding Project Donor*Special Project $75 donor
Volunteer tester
Joined: 19 Nov 00
Posts: 41577
Credit: 41,999,167
RAC: 464
Message 1989605 - Posted: 11 Apr 2019, 5:47:55 UTC

oscar: Intel Server (2 x quad-core 2.4GHz Xeon, 96 GB RAM)

Yes, I agree that $$$$$ is the perennial problem and probably always will be with what is basically a volunteer project. But how old is Oscar now? It's got quad-core processors and just 96 GB of RAM. Later servers are hex-core with considerably more RAM. Last time they tried to increase the server RAM, they ended up frigging around trying to find matching pairs of memory sticks. I still think that the s/w database is overstretched, but I am told otherwise by some who say they know better.

How much would it cost to:

a) max out the memory with new matching sticks
b) replace Oscar with a more modern machine?

A one-off special funding exercise, with maybe another recognition badge, could well do that. We need someone like Jeff Cobb to tell us what could be done within sensible limitations and with the help of the GPU Users Group. Experience has shown that people don't give to a general fund; they want to be able to say, "Hey, I helped to buy that kit."

If someone sets it up I'll pledge $250 for starters.
ID: 1989605 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 11664
Credit: 174,465,178
RAC: 119,511
Australia
Message 1989610 - Posted: 11 Apr 2019, 6:21:10 UTC

The present main bottleneck is storage I/O limitations, but removing those limitations with an All Flash Array would then move the bottleneck to the servers themselves (splitters, purgers, transitioner, etc.).
Grant
Darwin NT
ID: 1989610 · Report as offensive



 
©2019 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.