Long Outage Today

Jeff Cobb
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Mar 99
Posts: 119
Credit: 40,367
RAC: 0
United States
Message 1989378 - Posted: 10 Apr 2019, 0:57:56 UTC

We had to recover the master database on oscar from a backup taken today on carolyn. Oscar is now back to being the master DB and carolyn is once again the replica DB. Things will be a bit slow as the database becomes resident in memory.
ronssito

Joined: 8 Feb 00
Posts: 13
Credit: 25,381,597
RAC: 69,399
United States
Message 1989385 - Posted: 10 Apr 2019, 1:13:37 UTC

Thanks for all your hard work!
Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 2597
Credit: 87,801,374
RAC: 515,493
United States
Message 1989391 - Posted: 10 Apr 2019, 1:51:59 UTC - in response to Message 1989378.  

+1
I will stop procrastinating tomorrow.
\\// Live Long & Prosper (starting tomorrow ;)
Keith Myers Special Project $250 donor
Volunteer tester

Joined: 29 Apr 01
Posts: 7652
Credit: 628,681,426
RAC: 1,426,450
United States
Message 1989404 - Posted: 10 Apr 2019, 2:59:32 UTC

+2
Seti@Home classic workunits: 20,676 CPU time: 74,226 hours
Gary Charpentier Crowdfunding Project Donor, Special Project $250 donor
Volunteer tester

Joined: 25 Dec 00
Posts: 24728
Credit: 45,483,739
RAC: 27,460
United States
Message 1989417 - Posted: 10 Apr 2019, 3:53:01 UTC

Thanks very much for the message and thanks for the work.
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 10955
Credit: 155,518,210
RAC: 83,227
Australia
Message 1989424 - Posted: 10 Apr 2019, 4:39:25 UTC

Thanks for the update.
Grant
Darwin NT
mr.mac52

Joined: 18 Mar 03
Posts: 66
Credit: 241,089,337
RAC: 76,161
United States
Message 1989425 - Posted: 10 Apr 2019, 4:49:49 UTC - in response to Message 1989404.  

+3
Chris S Crowdfunding Project Donor, Special Project $75 donor
Volunteer tester

Joined: 19 Nov 00
Posts: 41511
Credit: 41,943,773
RAC: 1,462
Message 1989432 - Posted: 10 Apr 2019, 6:54:29 UTC

Thanks Jeff.

Do we know why the recovery was necessary?
Bernie Vine
Volunteer moderator
Volunteer tester

Joined: 26 May 99
Posts: 9656
Credit: 64,491,000
RAC: 44,422
United Kingdom
Message 1989436 - Posted: 10 Apr 2019, 7:27:19 UTC - in response to Message 1989432.  

I think the answer is in Eric's post here:

https://setiathome.berkeley.edu/forum_thread.php?id=84096
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 12828
Credit: 134,168,804
RAC: 41,785
United Kingdom
Message 1989437 - Posted: 10 Apr 2019, 7:29:26 UTC - in response to Message 1989432.  

Because of what Eric wrote in message 1988485.
Chris S Crowdfunding Project Donor, Special Project $75 donor
Volunteer tester

Joined: 19 Nov 00
Posts: 41511
Credit: 41,943,773
RAC: 1,462
Message 1989445 - Posted: 10 Apr 2019, 9:25:39 UTC

Thank you gentlemen.

This time the primary database machine crashed and hasn't automatically recovered.

But I am asking about the cause of the crash in the first place. These things don't happen without a reason.

Q1. Why did it crash?
Q2. Why didn't it automatically recover?
Q3. Was it a hardware or software problem?

We know we can fix it, as was done, that is what backups are for.

Q4. How can we reduce the risk of it happening again?
Unixchick Project Donor

Joined: 5 Mar 12
Posts: 329
Credit: 1,794,611
RAC: 941
United States
Message 1989564 - Posted: 10 Apr 2019, 23:35:11 UTC - in response to Message 1989445.  



Q4. How can we reduce the risk of it happening again?


$$$$$

Sorry, I couldn't resist giving the obvious answer. I'm sure the SETI people want the same thing: keeping the SETI system up as much as possible within the money and time constraints they have.

To the SETI people who always give their time to get things back up as soon as possible, even on weekends and holidays: thank you.
Chris S Crowdfunding Project Donor, Special Project $75 donor
Volunteer tester

Joined: 19 Nov 00
Posts: 41511
Credit: 41,943,773
RAC: 1,462
Message 1989605 - Posted: 11 Apr 2019, 5:47:55 UTC

oscar: Intel Server (2 x quad-core 2.4GHz Xeon, 96 GB RAM)

Yes, I agree that $$$$$ is the perennial problem and probably always will be with what is basically a volunteer project. But how old is Oscar now? It's got quad-core processors and just 96 GB of RAM, while later servers are hex-core with considerably more RAM. The last time they tried to increase the server RAM, they ended up frigging around trying to find matching pairs of memory sticks. I still think the database software is overstretched, but I am told otherwise by some who say they know better.

How much would it cost to

a) max out the memory with new matching sticks
b) replace Oscar with a more modern machine

A one-off special funding drive, maybe with another recognition badge, could well do that. We need someone like Jeff Cobb to tell us what could be done within sensible limits, with the help of the GPU Users Group. Experience has shown that people don't give to a general fund; they want to be able to say, hey, I helped to buy that kit.

If someone sets it up I'll pledge $250 for starters.
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 10955
Credit: 155,518,210
RAC: 83,227
Australia
Message 1989610 - Posted: 11 Apr 2019, 6:21:10 UTC

The present main bottleneck is storage I/O, but removing that limitation with an all-flash array would then move the bottleneck to the servers themselves (splitters, purgers, transitioner, etc.).
Grant
Darwin NT

©2019 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.