Power Outage mayhem: Feb 24/05

Cochise
Joined: 3 Apr 99
Posts: 62
Credit: 3,079
RAC: 0
United States
Message 82370 - Posted: 26 Feb 2005, 2:02:49 UTC
Last modified: 26 Feb 2005, 2:03:28 UTC

Put yer $$$ where yer mouth is ;-)

1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 82396 - Posted: 26 Feb 2005, 3:21:15 UTC - in response to Message 82360.  
Last modified: 26 Feb 2005, 3:23:43 UTC

> > The BOINC system is designed so that 100% uptime at the servers is not a
> > requirement.
>
> Down time is not the issue. Lost data is.

No actual "data" was lost, not as far as the Search for Extra-Terrestrial Intelligence is concerned.

The lost results will be re-crunched. It's lost time, it's lost work, but it isn't really lost data.
EclipseHA
Joined: 28 Jul 99
Posts: 1018
Credit: 530,719
RAC: 0
United States
Message 82415 - Posted: 26 Feb 2005, 4:13:56 UTC

I just gotta ask..

As the "main DB server" wasn't on the UPS and got hosed, was it close enough that a 100' extension cord could have provided power from the UPS for an outage like this? (to shut it down gracefully)

It just seems that for a box as important as the main DB, running without a UPS could have been avoided by someone just bringing in an extension cord for a few days or until the server cabinet was reorged. (surely someone in the lab has a 100' extension cord that wasn't needed for a couple weeks.. Hey - worst case, $10 that someone needed anyway!)

The DB lost "30 minutes" of its data, but in that 30 minutes, it could have lost crunching done on 5000 WU at 4 hours each!

Yes, the science will still work, but the reality is that 2 years of CPU time could result in nothing more than "providing heat to the room", as the lost work will just need to be re-crunched...
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 82422 - Posted: 26 Feb 2005, 4:36:52 UTC - in response to Message 82415.  


> Yes, the science will still work, but the reality is that 2 years of CPU time
> could result in nothing more than "providing heat to the room", as the lost
> work will just need to be re-crunched...

When you take those 5000 work units at four hours each, and divide them by the amount of processing available (63,992 hosts, according to BOINC Synergy), it's still just a half-hour of clock time.

Put another way, it is 0.005% wasted over the course of a year.
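
For anyone who wants to check the numbers in this exchange, here is a rough sketch in Python. It uses only the figures quoted above (5000 work units, 4 hours each, 63,992 hosts) and assumes every host is equally fast and crunches around the clock, which is a simplification. The function and constant names are made up for illustration.

```python
# Figures quoted in the thread; everything else is derived from them.
LOST_WORK_UNITS = 5000        # work units lost in the 30-minute gap
HOURS_PER_WORK_UNIT = 4       # assumed crunch time per work unit
ACTIVE_HOSTS = 63_992         # host count attributed to BOINC Synergy
HOURS_PER_YEAR = 365 * 24     # clock hours in a (non-leap) year


def lost_cpu_hours(work_units: int, hours_each: float) -> float:
    """Total CPU time that has to be spent again."""
    return work_units * hours_each


def clock_hours_to_redo(cpu_hours: float, hosts: int) -> float:
    """Clock time to redo the work if it is spread evenly over all hosts."""
    return cpu_hours / hosts


def share_of_year(clock_hours: float) -> float:
    """Fraction of one year of project time that the redo represents."""
    return clock_hours / HOURS_PER_YEAR


cpu_hours = lost_cpu_hours(LOST_WORK_UNITS, HOURS_PER_WORK_UNIT)
clock_hours = clock_hours_to_redo(cpu_hours, ACTIVE_HOSTS)
share = share_of_year(clock_hours)

print(f"lost CPU time: {cpu_hours:.0f} hours ({cpu_hours / HOURS_PER_YEAR:.1f} CPU-years)")
print(f"clock time to redo: {clock_hours * 60:.0f} minutes per host")
print(f"share of a year: {share * 100:.4f}%")
```

Under those assumptions the redo works out to a bit under a third of an hour per host, so "half an hour" is a generous round-up, and the yearly percentage lands in the same ballpark as the one quoted.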
Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 82428 - Posted: 26 Feb 2005, 5:02:21 UTC

> As the "main DB server" wasn't on the UPS and got hosed, was it close enough
> that a 100' extension cord could have provided power from the UPS for an outage
> like this? (to shut it down gracefully)

Yes and no. Close enough yes. Room on the UPSes? No. So to get the main DB server on UPS would require buying another UPS strictly for the few weeks the main DB server would be in another lab.

Should we have put the main DB server on UPS? Yes and no. On hindsight, well duh. File that under "coulda, woulda, shoulda." But given the situation before the outage? What a waste of time/money. We have a replica database that is on UPS. It doesn't keep up all the time (hence the 30 minute offset), but it would catch up most of the time and was a good "hot backup" in the interim. On top of that we back everything up to tape every week. Anyway, this was all a situation brought out of immediate necessity (the replica machine couldn't keep up by itself, so we had to prematurely force the new machine to be the master), so carefully laid out plans had to be revised on the spot.

Of course, more funds are always welcome. Having an extra few hundred dollars a week ago wouldn't have turned into a UPS on the master db, though. See above.

- Matt



-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
PT
Joined: 19 May 99
Posts: 231
Credit: 902,910
RAC: 0
United Kingdom
Message 82430 - Posted: 26 Feb 2005, 5:09:37 UTC - in response to Message 82243.  

> Well, looks like the guys worked their magic and got us back up and running.
>
> Thank you !!!!
>
>
> :)
>
Yup, it looks like they've done it again. Well done!
Happy crunching
EclipseHA
Joined: 28 Jul 99
Posts: 1018
Credit: 530,719
RAC: 0
United States
Message 82431 - Posted: 26 Feb 2005, 5:09:53 UTC - in response to Message 82422.  


> Put another way, it is 0.005% wasted over the course of a year.
>
or ~2% wasted in a day (with 100k systems, that's 2000 computers just "heating the room" for a day)

It's not "nothing" for the people that spent 20,000 hours of computer time that got tossed for a $10 extension cord!

Jeeze... It seems that to you the project can do no wrong! It's not only "the science", but also the time that people volunteer, because without that, the main DB server would just be heating the room.....

To run the main DB server without a UPS (one of the most important systems), when the solution would be for someone to lend an extension cord, was downright foolish! Especially given the problems over the last week! Heck, someone has now brought in a UPS from home, and if this had been done a few days back it would have saved 20,000 hours of "reprocessing"!

While the Seti folks are not professional IT, you'd think a UPS on the most important box would have made sense to them.....
EclipseHA
Joined: 28 Jul 99
Posts: 1018
Credit: 530,719
RAC: 0
United States
Message 82434 - Posted: 26 Feb 2005, 5:12:42 UTC - in response to Message 82430.  

> > Well, looks like the guys worked their magic and got us back up and running.
> >
> > Thank you !!!!
> >
> > :)
> >
> Yup, it looks like they've done it again. Well done!

They shoot themselves in the foot and you folks say "nice bandage! Good Work! Well Done!"?
EclipseHA
Joined: 28 Jul 99
Posts: 1018
Credit: 530,719
RAC: 0
United States
Message 82448 - Posted: 26 Feb 2005, 5:34:32 UTC - in response to Message 82428.  
Last modified: 26 Feb 2005, 5:36:11 UTC

> > As the "main DB server" wasn't on the UPS and got hosed, was it close enough
> > that a 100' extension cord could have provided power from the UPS for an outage
> > like this? (to shut it down gracefully)
>
> Yes and no. Close enough yes. Room on the UPSes? No.

Ace HW.. a multi-outlet adapter ($1.99)(single plug so it wouldn't impact other sockets) or just unplug something that didn't need to be on UPS!

> So to get the main DB
> server on UPS would require buying another UPS strictly for the few weeks the
> main DB server would be in another lab.

Or getting the guy to bring in a UPS from home, as he's now done.. A six-pack of beer might have made this happen before the problem!


>
> Should we have put the main DB server on UPS? Yes and no. On hindsight, well
> duh. File that under "coulda, woulda, shoulda."

It's a no-brainer... Wasn't there another box that 1) wasn't as critical, 2) if it got trashed could be rebuilt quickly, 3) could handle a power fail a bit easier (windows)? Worst case, a $10 extension cord and a $2 multi-head adapter (probably no cost at all if someone asked "got one we can borrow for a couple weeks?")

> But given the situation before
> the outage? What a waste of time/money. We have a replica database that is on
> UPS. It doesn't keep up all the time (hence the 30 minute offset), but it
> would catch up most of the time and was a good "hot backup" in the interim.

It would be in sync if all was ok... Well, when all's not ok, the 30-minute-old replica is, well, 30 minutes old! (20 THOUSAND hours of crunching lost!)

Like I said, for $20, the primary DB could have been on the UPS, and that's if no one had an extension cord they could loan to the lab for a week or two..


> On
> top of that we back everything up to tape every week. Anyway, this was all a
> situation brought out of immediate necessity (the replica machine couldn't
> keep up by itself, so we had to prematurely force the new machine to be the
> master), so carefully laid out plans had to be revised on the spot.


And you expect to handle the Seti Classic load this year?


>
> Of course, more funds are always welcome. Having an extra few hundred dollars
> a week ago wouldn't have turned into a UPS on the master db, though. See
> above.
>

Post a mailing address, and I will send you a 100' extension cord with an adapter so you can plug in two devices to a single UPS outlet..... You can write "azwoody" on the cord every few feet, and people can feel free to step on it when the mood fits!

> - Matt
>
>
>
>
PT
Joined: 19 May 99
Posts: 231
Credit: 902,910
RAC: 0
United Kingdom
Message 82452 - Posted: 26 Feb 2005, 5:36:39 UTC - in response to Message 82431.  

Wow, you really came down hard on the guys.
As a professional I do agree with some of the things you're writing. A lack of UPSes is always a bad thing - and should be forbidden by law. ;-)

But if there are no UPSes available? Since this is funded purely on contribution I can have some understanding for a lack of resources. That's why I don't agree with banning the guys! I don't think we have to teach them what to do. They played a high game and they lost. I think that’s punishment enough.

Yes, this punishment spills over onto all volunteers all around the world, but do remember that we are just volunteers and know very well what can happen during a project like SETI@Home. This is a calculated risk we all take!

So I stick to my earlier post on this message board. Guys - well done! You brought it back online....again! ;-D

Happy crunching!



Happy crunching
LEX LETHAL
Joined: 3 Apr 99
Posts: 22
Credit: 423
RAC: 0
United States
Message 82465 - Posted: 26 Feb 2005, 5:54:09 UTC

I'm glad SETI's back up and healthy.

While things were down, I took a gander over to SETI Classic. After my visit, I thought about how we are all here for a reason, and that reason is to work together, to do volunteer work that makes us happy. It's weird, because I was thinking about how one day this will be the only SETI. Not the most important, but the most recent. At some future date when SETI changes again and I am destined to migrate, so be it. To be a part of something bigger than myself is a privilege.

As SETI grows and matures, we also grow and mature as a community.

LEX
EclipseHA
Joined: 28 Jul 99
Posts: 1018
Credit: 530,719
RAC: 0
United States
Message 82481 - Posted: 26 Feb 2005, 6:30:46 UTC - in response to Message 82452.  


> But if there are no UPSes available? Since this is funded purely on
> contribution I can somewhat have an understanding for lacking resources.

They have funding from NSF - the "National Science Foundation" in the US (like $500,000). It's just a question of how the funding is spent! I can post a link, if you are interested!

Also, all it took was an extension cord that could have been loaned, or for someone to bring in a UPS from home, as is the current solution!

> That's why I don't agree with banning the guys! I don't think we have to teach
> them what to do. They played a high game and they lost. I think that’s
> punishing enough.

They've been losing the game for the better part of a year! Were you around during the "Snap Appliance" days? It's one move like this after another!

>
> Yes, this punishment pores over on all volunteers all around the world but do
> remember that we are just volunteers and know very well what can happen during
> such project like SETI@Home. This is a calculated risk we all take!

But it's been like this since LAST JUNE! There used to be a joke that the servers were down every weekend, and it was true! This is NOT the first time that work got tossed, by any means!
>

KWSN - MajorKong
Volunteer tester
Joined: 5 Jan 00
Posts: 2892
Credit: 1,499,890
RAC: 0
United States
Message 82483 - Posted: 26 Feb 2005, 6:37:37 UTC
Last modified: 26 Feb 2005, 6:40:02 UTC

Jeez, woody... not this s##t again...

Every time there is an outage at Berkeley, you just HAVE to crow over it. Please stop.

Captain Avatar
Volunteer tester
Joined: 17 May 99
Posts: 15133
Credit: 529,088
RAC: 0
United States
Message 82485 - Posted: 26 Feb 2005, 6:40:02 UTC - in response to Message 82428.  

> > As the "main DB server" wasn't on the UPS and got hosed, was it close enough
> > that a 100' extension cord could have provided power from the UPS for an outage
> > like this? (to shut it down gracefully)
>
> Yes and no. Close enough yes. Room on the UPSes? No. So to get the main DB
> server on UPS would require buying another UPS strictly for the few weeks the
> main DB server would be in another lab.
>
> Should we have put the main DB server on UPS? Yes and no. On hindsight, well
> duh. File that under "coulda, woulda, shoulda." But given the situation before
> the outage? What a waste of time/money. We have a replica database that is on
> UPS. It doesn't keep up all the time (hence the 30 minute offset), but it
> would catch up most of the time and was a good "hot backup" in the interim. On
> top of that we back everything up to tape every week. Anyway, this was all a
> situation brought out of immediate necessity (the replica machine couldn't
> keep up by itself, so we had to prematurely force the new machine to be the
> master), so carefully laid out plans had to be revised on the spot.
>
> Of course, more funds are always welcome. Having an extra few hundred dollars
> a week ago wouldn't have turned into a UPS on the master db, though. See
> above.
>
> - Matt
>
>
>
Maybe we could get a fund going to send azwoody out there to straighten you all out... After all, he knows everything... Oh wait, scratch that... Woody does know everything, so he must be rich and can fly out there on his private jet! His design, of course... Couldn't help it...

Smart ass Timmy




Toby
Volunteer tester
Joined: 26 Oct 00
Posts: 1005
Credit: 6,366,949
RAC: 0
United States
Message 82487 - Posted: 26 Feb 2005, 6:50:41 UTC

You have got to be kidding me! You want to run a power-hungry, vital system off of a 100' extension cord?? That is just asking for trouble. How many hallways does it have to go through? How many doorways? How many opportunities for people to trip over it, causing not only a power outage but possibly physical damage to the server and/or UPS? What if facilities decides to clean the carpet? The machines could damage the cord, causing a short circuit which, once again, could do tremendous damage to the system, beyond a trashed database. Running a large system on a 100' extension cord is completely out of the question.

And for crying out loud, quit your whining and moaning over 30 minutes of data. You are worse than my 1-year-old niece who is teething! Just be glad we are only missing 30 minutes instead of a week (if they had had to restore from tapes). Do you ALWAYS have to look at the worst side of things? A 2% loss of data on a single day is pretty minimal really. Get over yourself.
A member of The Knights Who Say NI!
For rankings, history graphs and more, check out:
My BOINC stats site
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 82488 - Posted: 26 Feb 2005, 6:55:32 UTC - in response to Message 82431.  


> Jeeze... It seems that to you the project can do no wrong!

... and it seems that to you the project can do no right.

Nothing justifies the kind of abuse you're dishing out, Woody.
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 82491 - Posted: 26 Feb 2005, 7:10:36 UTC - in response to Message 82487.  

> You have got to be kidding me! you want to run a power hungry, vital
> system off of a 100' extension cord??

The new database server is a Sun Fire V40z. According to Sun, a fully loaded V40z draws 760 W. Specifications here.

This one isn't fully loaded, but it's still going to draw some power.
PT
Joined: 19 May 99
Posts: 231
Credit: 902,910
RAC: 0
United Kingdom
Message 82492 - Posted: 26 Feb 2005, 7:11:07 UTC - in response to Message 82481.  
Last modified: 26 Feb 2005, 7:48:14 UTC

> But it's been like this since LAST JUNE! There used to be a joke that the
> servers were down every weekend, and it was true! This is NOT the first time
> that work got tossed, by any means!
>
I honestly think your aggressiveness is out of line.
I’ve been around for a long time and yes, I do get fed up as well at times, and I have also lost WUs many times. I can still trace back to June to see my pending credits. But even if you or I wail, it won’t change the situation, will it? Screaming and asking people why they didn’t do this and that is not very constructive, especially when you’re not on the project team. If you were, you wouldn’t be screaming in this forum!

I won’t get into a discussion about their funding, since I don’t have any details and so can’t make a judgment. And I honestly don’t think it’s my business to judge. I am here doing this crunching as a volunteer, and if I don’t like the situation I have the option to crunch somewhere else - and so do you!

So, whatever caused the last outage, they were able to bring it back online again and I am fine with that, and I do think they deserve some gratitude from all of us.
As far as I can tell from reading the board, there were some losses (half an hour or so). That is absolutely not good, but tough shit – bad things happen every day! It could have been 30 days of lost work. So cheer up and do some crunching as long as it works! ;-D

Happy crunching
The Psychotic One
Volunteer tester
Joined: 22 May 00
Posts: 50
Credit: 4,099,029
RAC: 0
United States
Message 82500 - Posted: 26 Feb 2005, 7:52:10 UTC
Last modified: 26 Feb 2005, 7:55:40 UTC

This post is O/T

Timmy, OH MY GAWD! Hysterical avatar. I dropped my pop while LMAO, when I saw your avatar. You owe me a new Dew. j/k I had to try to lighten the mood in here... :)


PS. What am I doing wrong with my sig? I'm trying to show both...
William D. Gagliardi
Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34271
Credit: 79,922,639
RAC: 80
Germany
Message 82514 - Posted: 26 Feb 2005, 8:43:26 UTC

Hi

@Peder
I totally agree.

BTW: I say they've done a great job.

greetz from Germany
Mike



With each crime and every kindness we birth our future.


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.