We are now mostly recovered from a campus-wide power outage.

Jeff Cobb Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Mar 99
Posts: 122
Credit: 40,367
RAC: 0
United States
Message 1422770 - Posted: 1 Oct 2013, 21:48:07 UTC

Last night there was a power outage that affected the entire Berkeley campus. The data center, where our servers are located, does have a facility-wide UPS, so all servers stayed up. Unfortunately, the data center air conditioning did not stay up. Machines were getting hot, so the data center staff had to bring them down.
ID: 1422770
Jeff Cobb Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Mar 99
Posts: 122
Credit: 40,367
RAC: 0
United States
Message 1422771 - Posted: 1 Oct 2013, 21:48:57 UTC

All of our servers seem to be OK with the exception of marvin, the AstroPulse database server. Marvin appears to have been rendered nonfunctional, although we have reason to believe that the disks are OK. In any case, we have a backup of the database. We will be bringing marvin back to the lab tomorrow for a postmortem.
ID: 1422771
QSilver

Joined: 26 May 99
Posts: 232
Credit: 6,452,764
RAC: 0
United States
Message 1422777 - Posted: 1 Oct 2013, 21:57:23 UTC - in response to Message 1422771.  

Thanks for the updates, Jeff.
ID: 1422777
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1422793 - Posted: 1 Oct 2013, 22:25:04 UTC
Last modified: 1 Oct 2013, 22:32:55 UTC

Welcome back!

Ah, and you do know it'll only be temporary? That on Thursday California will rattle to a 9.7 Richter earthquake? (moves hands in a conjuring manner) ;-)
ID: 1422793
David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1422840 - Posted: 2 Oct 2013, 0:23:12 UTC
Last modified: 2 Oct 2013, 0:28:12 UTC

Thanks for the info, Jeff. Is the effort to revive marvin the reason the outage was longer than it has usually been lately?

[edit]
Well, okay, I know you had to bring all the servers up, but since they were shut down properly (right?) that shouldn't have been too bad.
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1422840
Donald L. Johnson
Joined: 5 Aug 02
Posts: 8240
Credit: 14,654,533
RAC: 20
United States
Message 1422859 - Posted: 2 Oct 2013, 1:04:44 UTC - in response to Message 1422793.  
Last modified: 2 Oct 2013, 1:11:40 UTC

Welcome back!

Ah, and you do know it'll only be temporary? That on Thursday California will rattle to a 9.7 Richter earthquake? (moves hands in a conjuring manner) ;-)

My Wiccan friends on Mt. Tam say "Nay, nay, moosebreath".
But I have both my cars gassed up and ready to head up the hill if need be.
Donald
Infernal Optimist / Submariner, retired
ID: 1422859
Thomas
Volunteer tester

Joined: 9 Dec 11
Posts: 1499
Credit: 1,345,576
RAC: 0
France
Message 1422965 - Posted: 2 Oct 2013, 5:56:04 UTC - in response to Message 1422770.  

Thanks for the heads-up Jeff ! :)
Glad to see everyone !
From France about this out(r)age >> http://www.meltycampus.fr/etats-unis-explosion-sur-le-campus-de-berkeley-video-a215177.html :(
ID: 1422965
rob smith (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22506
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1423099 - Posted: 2 Oct 2013, 15:44:05 UTC - in response to Message 1422771.  

All of our servers seem to be OK with the exception of marvin, the AstroPulse database server. Marvin appears to have been rendered nonfunctional, although we have reason to believe that the disks are OK. In any case, we have a backup of the database. We will be bringing marvin back to the lab tomorrow for a postmortem.



Thanks for the news, and your efforts.

Your description of performing a post-mortem brings all sorts of strange images to mind involving yourself, Eric, Matt and surgical garb....
I hope it goes well, and doesn't turn out to be a post-mortem, but a successful surgical intervention leading to a complete recovery by the patient.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1423099
Mr. Kevvy (Crowdfunding Project Donor, Special Project $250 donor)
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3804
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 1423131 - Posted: 2 Oct 2013, 16:42:51 UTC - in response to Message 1422770.  
Last modified: 2 Oct 2013, 16:46:56 UTC

Last night there was a power outage that affected the entire Berkeley campus.


Considering that both NASA and the NSF are down right now due to the Fed shutdown, this outage was more than a little disturbing at the time, and I'm glad it was only a coincidence!

Hope the NSF downtime doesn't impact this project.

All of our servers seem to be OK with the exception of marvin, the AstroPulse database server.


And to add to the coincidence, there's the NSF-funded part of the project, too.
ID: 1423131
S@NL Etienne Dokkum
Volunteer tester
Joined: 11 Jun 99
Posts: 212
Credit: 43,822,095
RAC: 0
Netherlands
Message 1423146 - Posted: 2 Oct 2013, 17:23:38 UTC

Thanks for keeping us up to date, Jeff! Best of luck bringing Marvin back from the afterlife...

Hope to hear good news soon !
ID: 1423146
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51477
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1423161 - Posted: 2 Oct 2013, 17:33:19 UTC
Last modified: 2 Oct 2013, 17:34:21 UTC

Good luck with Marvin, but........
I gotta ask.
Isn't this the kind of thing that the colo was supposed to prevent?

I thought along with better power conditioning, better cooling, and better bandwidth, the other part of the puzzle was 24/7 babysitting.
Seems they failed in that last regard, and let temps get too high before they started to shut servers down. They surely knew that although the UPS systems would keep all the servers going, the AC was down.

In all other regards, the move to the colo has indeed been a very, very good thing for the project.

I am just wondering why things went astray in this particular crisis situation. It's what they are paid to protect, isn't it?
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1423161
KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 3187
Credit: 57,163,290
RAC: 0
United States
Message 1423176 - Posted: 2 Oct 2013, 17:58:57 UTC

Are the problems with Marvin preventing Beta from coming up? If not, could someone please start it???
.

Hello, from Albany, CA!...
ID: 1423176
KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 3187
Credit: 57,163,290
RAC: 0
United States
Message 1423237 - Posted: 2 Oct 2013, 19:54:04 UTC - in response to Message 1423180.  

Are the problems with Marvin preventing Beta from coming up? If not, could someone please start it???


Beta has been up since the outage ended. In fact, Beta came back up before Main.


Not if you look at Beta's "Server Status" page...
.

Hello, from Albany, CA!...
ID: 1423237
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1423246 - Posted: 2 Oct 2013, 20:16:04 UTC - in response to Message 1423237.  
Last modified: 2 Oct 2013, 20:25:52 UTC

Are the problems with Marvin preventing Beta from coming up? If not, could someone please start it???


Beta has been up since the outage ended. In fact, Beta came back up before Main.


Not if you look at Beta's "Server Status" page...

Seti Beta's Server Status page has looked like that for years. The only entries that ever show as green are:

data-driven web pages
vote_monitor
splitter_throttle

Sometimes a splitter will show as green, but since Seti Beta is small scale, not a lot of WUs need to be split, so the splitters mostly stay orange or red.
The rest don't point to the actual process being run; even with beta_validate_v7 showing red, v7 WUs still validate there:

server status

The last conversation I had with Eric about the Server Status page: Beta is running on Bruno. The scripts that control the Daemons for the various services were conflicting with Seti Main. By disabling the reporting portion of the scripts they could run the Service Daemons, but the Beta status page would not reflect what is or is not running.

I will poke an email in Eric's direction Monday.

Regards


Claggy
ID: 1423246
Cornhusker
Joined: 20 Apr 09
Posts: 41
Credit: 45,415,265
RAC: 37
United States
Message 1423885 - Posted: 4 Oct 2013, 3:53:41 UTC

I still think you guys need to run an extension cord out here to Nebraska! :)
ID: 1423885
David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1424152 - Posted: 4 Oct 2013, 19:54:26 UTC
Last modified: 4 Oct 2013, 19:54:56 UTC

I would have expected that by now, they would have the AP science database up and running on some other machine, if marvin is beyond repair.
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1424152
Wiggo
Joined: 24 Jan 00
Posts: 36619
Credit: 261,360,520
RAC: 489
Australia
Message 1424273 - Posted: 5 Oct 2013, 0:30:39 UTC

Well it's showing as running again and likely performing a catch up before turning the other associated functions on.

Cheers.
ID: 1424273
Wiggo
Joined: 24 Jan 00
Posts: 36619
Credit: 261,360,520
RAC: 489
Australia
Message 1424292 - Posted: 5 Oct 2013, 0:53:33 UTC

APs are now validating.

Cheers.
ID: 1424292
David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1426103 - Posted: 9 Oct 2013, 13:54:12 UTC

Can we get a report on what was found to be wrong with marvin and what was done about it?

David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1426103
rspolo
Volunteer tester

Joined: 11 Oct 99
Posts: 3
Credit: 7,697,331
RAC: 0
United States
Message 1427064 - Posted: 11 Oct 2013, 13:38:28 UTC

I have been supporting SETI for more than 15 years and have seen this type of issue more than once. Berkeley is supposed to have smart people running the show. What engineer in their right mind would specify a UPS and/or gen set that could not handle environmental support along with the servers? Lights and temp are just as important as the servers, unless it is only designed to handle power while the servers shut down automatically.
ID: 1427064
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.