Long Outage Today

Jeff Cobb
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Mar 99
Posts: 120
Credit: 40,367
RAC: 0
United States
Message 1989378 - Posted: 10 Apr 2019, 0:57:56 UTC

We had to recover the master database on oscar from a backup taken today on carolyn. Oscar is now back to being the master DB and carolyn is once again the replica DB. Things will be a bit slow as the database becomes resident in memory.
ID: 1989378
ronssito
Joined: 8 Feb 00
Posts: 14
Credit: 30,580,475
RAC: 53,632
United States
Message 1989385 - Posted: 10 Apr 2019, 1:13:37 UTC

Thanks for all your hard work!
ID: 1989385
Tom M
Volunteer tester
Joined: 28 Nov 02
Posts: 3203
Credit: 149,392,144
RAC: 765,211
United States
Message 1989391 - Posted: 10 Apr 2019, 1:51:59 UTC - in response to Message 1989378.  

+1
I will stop procrastinating tomorrow.
\\// Live Long & Prosper (starting tomorrow ;)
ID: 1989391
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 8791
Credit: 773,592,277
RAC: 1,665,333
United States
Message 1989404 - Posted: 10 Apr 2019, 2:59:32 UTC

+2
Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 1989404
Gary Charpentier Crowdfunding Project Donor*Special Project $250 donor
Volunteer tester
Joined: 25 Dec 00
Posts: 25263
Credit: 47,898,619
RAC: 26,247
United States
Message 1989417 - Posted: 10 Apr 2019, 3:53:01 UTC

Thanks very much for the message and thanks for the work.
ID: 1989417
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 11331
Credit: 164,272,416
RAC: 102,266
Australia
Message 1989424 - Posted: 10 Apr 2019, 4:39:25 UTC

Thanks for the update.
Grant
Darwin NT
ID: 1989424
mr.mac52
Joined: 18 Mar 03
Posts: 67
Credit: 245,865,017
RAC: 6,552
United States
Message 1989425 - Posted: 10 Apr 2019, 4:49:49 UTC - in response to Message 1989404.  

+3
ID: 1989425
Gone with the wind Crowdfunding Project Donor*Special Project $75 donor
Volunteer tester
Joined: 19 Nov 00
Posts: 41574
Credit: 41,951,956
RAC: 12
Message 1989432 - Posted: 10 Apr 2019, 6:54:29 UTC

Thanks Jeff.

Do we know why the recovery was necessary?
ID: 1989432
Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9775
Credit: 72,012,144
RAC: 101,252
United Kingdom
Message 1989436 - Posted: 10 Apr 2019, 7:27:19 UTC - in response to Message 1989432.  

Thanks Jeff.

Do we know why the recovery was necessary?


I think the answer is in Eric's post here:

https://setiathome.berkeley.edu/forum_thread.php?id=84096
ID: 1989436
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 12988
Credit: 138,418,892
RAC: 56,050
United Kingdom
Message 1989437 - Posted: 10 Apr 2019, 7:29:26 UTC - in response to Message 1989432.  

Because of what Eric wrote in message 1988485.
ID: 1989437
Gone with the wind Crowdfunding Project Donor*Special Project $75 donor
Volunteer tester
Joined: 19 Nov 00
Posts: 41574
Credit: 41,951,956
RAC: 12
Message 1989445 - Posted: 10 Apr 2019, 9:25:39 UTC

Thank you gentlemen.

This time the primary database machine crashed and hasn't automatically recovered.

But I am enquiring as to the reason/cause of the crash in the first place. These things don't happen without cause or reason.

Q1. Why did it crash?
Q2. Why didn't it automatically recover?
Q3. Was it a h/w or s/w problem?

We know we can fix it, as was done; that is what backups are for.

Q4. How can we reduce the risk of it happening again?
ID: 1989445
Unixchick Project Donor
Joined: 5 Mar 12
Posts: 444
Credit: 1,875,970
RAC: 926
United States
Message 1989564 - Posted: 10 Apr 2019, 23:35:11 UTC - in response to Message 1989445.  



Q4. How can we reduce the risk of it happening again?


$$$$$

Sorry, I couldn't stop myself from giving the obvious answer. I'm sure the SETI people want the same thing: keeping the SETI system up as much as possible within the $ and time constraints they have.

To the SETI people who always give their time to get things up as soon as possible, even on weekends and holidays: thank you.
ID: 1989564
Gone with the wind Crowdfunding Project Donor*Special Project $75 donor
Volunteer tester
Joined: 19 Nov 00
Posts: 41574
Credit: 41,951,956
RAC: 12
Message 1989605 - Posted: 11 Apr 2019, 5:47:55 UTC

oscar: Intel Server (2 x quad-core 2.4GHz Xeon, 96 GB RAM)

Yes, I agree that $$$$$ is the perennial problem and probably always will be with what is basically a volunteer project. But how old is Oscar now? It's got quad-core processors and just 96 GB RAM. Later servers are hex-core with considerably more RAM. Last time they tried to increase the server RAM they ended up frigging around trying to find matching pairs of memory sticks. I still think that the s/w database is overstretched, but I am told otherwise by some who say they know better.

How much would it cost to:

a) max out the memory with new matching sticks
b) replace Oscar with a more modern m/c?

A one-off special funding exercise, with maybe another recognition badge, could well do that. We need someone like Jeff Cobb to tell us what could be done within sensible limitations and with the help of the GPU Users Group. Experience has shown that people don't give to a general fund; they want to be able to say, "Hey, I helped to buy that kit."

If someone sets it up, I'll pledge $250 for starters.
ID: 1989605
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 11331
Credit: 164,272,416
RAC: 102,266
Australia
Message 1989610 - Posted: 11 Apr 2019, 6:21:10 UTC

The main bottleneck at present is storage I/O, but removing that limitation with an all-flash array would then move the bottleneck to the servers themselves (splitters, purgers, transitioner, etc.).
Grant
Darwin NT
ID: 1989610

©2019 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.