Red Shift (Mar 01 2011)



Message boards : Technical News : Red Shift (Mar 01 2011)

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1389
Credit: 74,079
RAC: 0
United States
Message 1082770 - Posted: 1 Mar 2011, 23:15:19 UTC

Happy March to one and all. Haven't had much to write about lately, but here's a roundup.

We had our usual weekly maintenance outage today, during which we took care of all kinds of stuff besides the usual mysql database compression/backup. Early this morning I noticed the replica mysql server had some broken tables, which led me to discover that a drive had failed on that system last night: a 73GB fibre channel drive. Not a big deal, as we have tons of these kicking around from older servers at this point. This was easy enough to hot swap, though I got lost in some internal closet networking updates, as this disk array is only accessible via telnet. And then the mysql daemon on the replica freaked out a little bit when the new drive was introduced, so I had to reboot the system, re-fix broken tables, etc. etc. etc. The replica is still catching up (and will be for a while).

Today we also moved synergy off the probably-flaky UPS. Yeah, I know we should have done this earlier, but we just hadn't gotten around to it. If anything, this gave us one more data point in the form of yet another automatic biweekly reboot on Sunday around 3pm (a couple days ago). Now that the UPS is out of the equation, we have to wait 2 weeks to see if it was indeed the problem.
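If the reboots really are strictly periodic, the next data point is due exactly two weeks after the last one. A minimal sketch, assuming the last reboot was Sunday 27 Feb 2011 around 15:00 local time (inferred from "a couple days ago" in a post dated Tuesday 1 March 2011); it uses naive local datetimes, so daylight-saving changes (US DST began on 13 March 2011) are ignored.

```python
from datetime import datetime, timedelta

# Assumption: last observed reboot was Sunday 27 Feb 2011 around 15:00 local time
# (inferred from "a couple days ago" in a post dated Tuesday 1 March 2011).
LAST_REBOOT = datetime(2011, 2, 27, 15, 0)
PERIOD = timedelta(days=14)  # "biweekly" here means every two weeks

def projected_reboots(last, period, count):
    """Return the next `count` times a strictly periodic event would recur.

    Uses naive local datetimes, so daylight-saving shifts are ignored.
    """
    return [last + period * k for k in range(1, count + 1)]

for t in projected_reboots(LAST_REBOOT, PERIOD, 3):
    print(t.isoformat(sep=" "))
```

If synergy makes it past Sunday 13 March without an automatic reboot, that points at the UPS.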

What else... we moved a lot more bits from ptolemy onto thumper. You may notice some general speedups on the website or elsewhere. We hope. And Jeff and I tackled a ton of timing tests for the science database on oscar. We're finding all the bottlenecks and finding ways around them. The good news is that database select throughput has gone from 100 spikes/second to 17,000 spikes/second. However, those numbers are under optimal conditions; in reality we'll have to deal with many of the aforementioned bottlenecks. Also: gowron is back to being the main workunit server (though the full transition is far from complete).
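To put the throughput numbers in perspective: going from 100 to 17,000 spikes/second is a 170-fold improvement (under the optimal conditions mentioned). The sketch below works out how long a select over a batch of spikes would take at each rate; the one-million-spike batch size is purely hypothetical.

```python
OLD_RATE = 100      # spikes/second before tuning
NEW_RATE = 17_000   # spikes/second after tuning (optimal conditions)

def select_seconds(n_spikes, rate):
    """Seconds needed to select n_spikes at a steady rate in spikes/second."""
    return n_spikes / rate

speedup = NEW_RATE / OLD_RATE   # 170.0
batch = 1_000_000               # hypothetical batch size, for illustration only

print(f"speedup: {speedup:.0f}x")
print(f"old rate: {select_seconds(batch, OLD_RATE) / 3600:.1f} h")  # 10,000 s, about 2.8 h
print(f"new rate: {select_seconds(batch, NEW_RATE):.0f} s")         # about 59 s
```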

That's been my day so far. How's your day?

- Matt

____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

soft^spirit
Joined: 18 May 99
Posts: 6374
Credit: 28,631,148
RAC: 3
United States
Message 1082777 - Posted: 1 Mar 2011, 23:28:15 UTC - in response to Message 1082770.

Keep in mind UPS's do need new batteries from time to time. ;)
____________

Janice

Claggy
Project donor
Volunteer tester
Joined: 5 Jul 99
Posts: 4139
Credit: 33,442,384
RAC: 20,320
United Kingdom
Message 1082781 - Posted: 1 Mar 2011, 23:34:24 UTC - in response to Message 1082770.

Thanks for the update Matt,

Claggy

Thomas Arnold
Volunteer tester
Joined: 14 May 99
Posts: 56
Credit: 61,046,144
RAC: 0
United States
Message 1082782 - Posted: 1 Mar 2011, 23:54:44 UTC - in response to Message 1082770.


Thank you as always for the update, Matt. Man alive, you all have a lot on your plate. It is always fascinating to read about the stuff you do to keep us happily crunching away.

I do want to point out something on the Server status page (as if you don't have enough things on the to-do list).

At the bottom of the page there are definitions/explanations for Tasks ready to send, Tasks in progress, etc., but in the Data Distribution State section at the top they are referred to as Results ready to send, etc. I think the Tasks terminology is spot on, but the Results reference muddies the waters.

Thanks again to you and everyone at the Lab.

Kind Regards,

Tom
____________

Gary Charpentier
Project donor
Volunteer tester
Joined: 25 Dec 00
Posts: 12702
Credit: 7,190,842
RAC: 15,519
United States
Message 1082803 - Posted: 2 Mar 2011, 1:29:17 UTC

Thanks for the update, and please insist that Eric get the Beta Status page fixed before V7 work hits the masses.

____________

Joel
Joined: 31 Oct 08
Posts: 100
Credit: 4,577,300
RAC: 34
United States
Message 1082889 - Posted: 2 Mar 2011, 8:57:05 UTC

Thanks for the update, and good job keeping everything in order over there! Since the big issues a few weeks ago, things have been looking pretty good. The weekly outages have been short, which is much appreciated by this hobbyist...

zoom314
Project donor
Joined: 30 Nov 03
Posts: 46492
Credit: 36,841,198
RAC: 5,025
United States
Message 1082893 - Posted: 2 Mar 2011, 9:23:46 UTC

Thanks for the update, Matt. As for me, I just have to pack for a move, which I'm covering in the thread linked in my sig.
____________
My Facebook, War Commander, 2015

Black Squirrel Prime
Joined: 29 Jul 07
Posts: 8
Credit: 11,391,062
RAC: 7,555
United States
Message 1082954 - Posted: 2 Mar 2011, 15:35:13 UTC - in response to Message 1082781.

Thanks for the update Matt,

Claggy


Just replaced 2 of mine over the weekend; the UPS software was sensing something and initiating shutdowns randomly.

N9JFE David S
Project donor
Volunteer tester
Joined: 4 Oct 99
Posts: 11932
Credit: 14,607,983
RAC: 12,471
United States
Message 1082960 - Posted: 2 Mar 2011, 16:13:28 UTC - in response to Message 1082954.


Just replaced 2 of mine over the weekend; the UPS software was sensing something and initiating shutdowns randomly.

I once had a problem where I was trying to communicate with some other device altogether via a serial cable and somehow the computer kept interpreting this as a shutdown command coming from the UPS. I think I disabled the UPS software until I was done with the other thing.

Thanks for the update and all your hard work, Matt. As for me, SSDD. I do notice, however, that my computer hasn't communicated with the project in about 30 hours now. This seems unusual, but I'm sitting on 20 WUs (and no Einstein WUs), so I won't worry about it for another day.

David
____________
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.


Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 7068
Credit: 27,372,006
RAC: 34,317
United Kingdom
Message 1083034 - Posted: 2 Mar 2011, 20:57:00 UTC - in response to Message 1082777.

Keep in mind UPS's do need new batteries from time to time. ;)

Around every 2 years.

Also keep in mind that at least one well-known UPS manufacturer sets its default self-test interval to "14 days". I know from experience (my company has 150+ sites in the UK, all with one or more UPSs) that sometimes the self-test can cause the UPS to fail without any actual error in the log.
____________


Today is life, the only life we're sure of. Make the most of today.

GiftedPlacebo
Joined: 17 May 99
Posts: 3
Credit: 3,309,145
RAC: 13
United States
Message 1083069 - Posted: 2 Mar 2011, 22:03:11 UTC - in response to Message 1083034.

Keep in mind UPS's do need new batteries from time to time. ;)

Around every 2 years.

Also keep in mind that at least one well-known UPS manufacturer sets its default self-test interval to "14 days". I know from experience (my company has 150+ sites in the UK, all with one or more UPSs) that sometimes the self-test can cause the UPS to fail without any actual error in the log.


Indeed. Assuming you have machines with redundant power supplies, I like to split machines over multiple UPS's. You can still set up the UPS software to send shutdown notices for "real" power failure events, but when you have a self-test-induced power-off, it is instant and no shutdown messages are sent (in my experience). It's not foolproof, but it has saved me many times when I've had a UPS fail on our distributed file servers.
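Splitting redundant supplies across UPSs is easy to audit mechanically once you know which UPS feeds which power supply. A minimal sketch, assuming a hand-maintained inventory; the host names, UPS names and both helper functions are hypothetical.

```python
# Hypothetical inventory: server -> UPS feeding each of its power supplies (one entry per PSU).
INVENTORY = {
    "fileserver1": ["ups-a", "ups-b"],  # redundant PSUs split across two UPSs
    "fileserver2": ["ups-a", "ups-a"],  # redundant PSUs, but both on one UPS
    "webserver1": ["ups-b"],            # single PSU
}

def lost_if_ups_fails(inventory, ups):
    """Servers left with no powered PSU if `ups` drops its load (e.g. a failed self-test)."""
    return sorted(host for host, feeds in inventory.items()
                  if all(feed == ups for feed in feeds))

def unsplit_redundant(inventory):
    """Servers with two or more PSUs that are all fed from the same UPS."""
    return sorted(host for host, feeds in inventory.items()
                  if len(feeds) > 1 and len(set(feeds)) == 1)

for ups in ("ups-a", "ups-b"):
    print(ups, "->", lost_if_ups_fails(INVENTORY, ups))
print("redundant but not split:", unsplit_redundant(INVENTORY))
```

Here a self-test failure on ups-a would take down fileserver2 even though it has two supplies, while fileserver1 rides through on ups-b.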

____________

Tom95134
Project donor
Joined: 27 Nov 01
Posts: 213
Credit: 3,388,405
RAC: 1,469
United States
Message 1083075 - Posted: 2 Mar 2011, 22:50:33 UTC - in response to Message 1082777.

Keep in mind UPS's do need new batteries from time to time. ;)

And they really need to be exercised about once a month, with a fairly deep discharge about twice a year.
____________

Tom95134
Project donor
Joined: 27 Nov 01
Posts: 213
Credit: 3,388,405
RAC: 1,469
United States
Message 1083076 - Posted: 2 Mar 2011, 22:54:23 UTC - in response to Message 1083034.

Keep in mind UPS's do need new batteries from time to time. ;)

Around every 2 years.

Also keep in mind that at least one well-known UPS manufacturer sets its default self-test interval to "14 days". I know from experience (my company has 150+ sites in the UK, all with one or more UPSs) that sometimes the self-test can cause the UPS to fail without any actual error in the log.

That's very interesting. I've never had one "burp" the attached equipment due to a test cycle, even when it is a deep (80~90%) test cycle. All our UPSs are APC.

____________

ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 621
Credit: 142,867,879
RAC: 145,860
United Kingdom
Message 1083094 - Posted: 3 Mar 2011, 0:07:59 UTC - in response to Message 1083076.

Keep in mind UPS's do need new batteries from time to time. ;)

Around every 2 years.

Also keep in mind that at least one well-known UPS manufacturer sets its default self-test interval to "14 days". I know from experience (my company has 150+ sites in the UK, all with one or more UPSs) that sometimes the self-test can cause the UPS to fail without any actual error in the log.

That's very interesting. I've never had one "burp" the attached equipment due to a test cycle. Even when it is a deep (80~90%) test cycle. All our UPS are APC.


I recently had to replace the battery in an APC Smart-UPS 720. It was failing its 14-day self-test...

Function: Automatic Self-test
Factory Default: Every 14 days (336 hours)
User Selectable Choices: Every 7 days (168 hours), On Startup Only, No Self-test
Description: Set the interval at which the UPS will execute a self-test.
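If a failing self-test is what drops the load, unexplained reboots should be spaced at whole multiples of the configured interval (a skipped multiple being a test the battery happened to pass). A rough sketch using the intervals from the APC settings quoted above; the `matches_interval` helper, the 2-hour tolerance and the sample timestamps are all assumptions for illustration, not real log data.

```python
from datetime import datetime, timedelta

# Self-test intervals from the APC settings quoted above, in hours.
SELF_TEST_HOURS = {"every 14 days": 336, "every 7 days": 168}

def matches_interval(reboots, interval_hours, tolerance_hours=2.0):
    """True if every gap between consecutive reboots is a whole multiple
    (at least 1) of interval_hours, to within tolerance_hours.
    Skipped multiples allow for self-tests that passed without dropping the load."""
    times = sorted(reboots)
    for earlier, later in zip(times, times[1:]):
        gap = (later - earlier) / timedelta(hours=1)  # gap in hours (float)
        multiple = round(gap / interval_hours)
        if multiple < 1 or abs(gap - multiple * interval_hours) > tolerance_hours:
            return False
    return True

# Illustrative timestamps only (not real log data): 2 weeks apart, then 4 weeks.
reboots = [datetime(2011, 1, 30, 15, 5),
           datetime(2011, 2, 13, 14, 55),
           datetime(2011, 3, 13, 15, 10)]

for label, hours in SELF_TEST_HOURS.items():
    print(label, matches_interval(reboots, hours))
```

Any log that fits a 14-day cycle also fits a 7-day one, so a match is circumstantial evidence at best; it does not prove the self-test is the cause.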

"During the self-test, the UPS briefly operates the connected equipment on battery."

Swibby Bear
Joined: 1 Aug 01
Posts: 236
Credit: 7,276,504
RAC: 6
United States
Message 1083112 - Posted: 3 Mar 2011, 1:57:54 UTC

Wow! I am frequently amazed at the interesting stuff posted on these forums. Thanks for all the helpful info.

Whit

SockGap
Joined: 16 Apr 07
Posts: 13
Credit: 5,936,549
RAC: 2,866
Australia
Message 1083180 - Posted: 3 Mar 2011, 11:39:38 UTC - in response to Message 1083069.

Assuming you have machines with redundant power supplies, I like to split machines over multiple UPS's.


Where I work we were told not to put the redundant power supplies on different phases, something about having 415 volts of potential difference if something goes wrong. With one phase you have 240 volts, which will give you a nasty kick. When you have two phases interacting you get 415 volts, and that is a lot more likely to kill you. I have no idea if it's the same with multiple UPSs, but in theory they are changing the phase, and therefore you could get more of a jolt out of two of them. You'd still have to be pretty unlucky to have something go wrong with two power supplies at once.
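The 415 V figure is the line-to-line voltage of a three-phase supply: the three phase voltages are 120° apart, so the voltage between any two phases is √3 times the phase-to-neutral voltage. Worked through with the 240 V phase voltage quoted above:

```latex
\begin{align*}
\lvert V_a - V_b \rvert &= 2\,V_{\text{phase}}\sin 60^\circ = \sqrt{3}\,V_{\text{phase}} \\
V_{\text{line}} &= \sqrt{3}\times 240\ \text{V} \approx 415.7\ \text{V}
\end{align*}
```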

I've never had one "burp" the attached equipment due to a test cycle. Even when it is a deep (80~90%) test cycle. All our UPS are APC.


I deal with a few dozen APC UPSs at work and I've seen a faulty battery drop the load during a self test a few times... It seemed to have more to do with the batteries - the ones in some of our hotter cupboards had "dried out" (or at least expanded and cracked the plastic battery case) and were not working at all...
____________

GiftedPlacebo
Joined: 17 May 99
Posts: 3
Credit: 3,309,145
RAC: 13
United States
Message 1083201 - Posted: 3 Mar 2011, 14:48:48 UTC - in response to Message 1083180.

Assuming you have machines with redundant power supplies, I like to split machines over multiple UPS's.


Where I work we were told not to put the redundant power supplies on different phases, something about having 415 volts of potential difference if something goes wrong. With one phase you have 240 volts, which will give you a nasty kick. When you have two phases interacting you get 415 volts, and that is a lot more likely to kill you. I have no idea if it's the same with multiple UPSs, but in theory they are changing the phase, and therefore you could get more of a jolt out of two of them. You'd still have to be pretty unlucky to have something go wrong with two power supplies at once.

I've never had one "burp" the attached equipment due to a test cycle. Even when it is a deep (80~90%) test cycle. All our UPS are APC.


I deal with a few dozen APC UPSs at work and I've seen a faulty battery drop the load during a self test a few times... It seemed to have more to do with the batteries - the ones in some of our hotter cupboards had "dried out" (or at least expanded and cracked the plastic battery case) and were not working at all...


All the best-practice information I've read suggests putting redundant power supplies on separate UPSs and even separate power grids. I think if multiple power supplies failed in such a fashion that there was 415V flowing into the system, your bigger concern would be putting out the fire rather than server maintenance =) But now I'm intrigued, as I've never heard that warning before. Off to Google!

____________

Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8629
Credit: 51,416,667
RAC: 50,585
United Kingdom
Message 1083205 - Posted: 3 Mar 2011, 14:58:55 UTC

I've certainly seen equipment killed by a three-phase grounding fault generating 415 V. Fortunately, the main victim was a sacrificial surge protector; the telephone PBX behind it was saved. And that was just equipment plugged into a standard UK 13A ring main, in a medium-sized office block with, I guess, different phases on different floors. Somebody working on the installation connected, or more likely disconnected, the wrong wire.

When I had a couple of redundant PSU servers to look after, knowing that they only need one to run (and in an environment where if the power went out, nobody would need to access the servers anyway), I plugged one PSU into a UPS, and the other direct into the mains. Didn't seem to do any harm.

Chris S
Project donor
Volunteer tester
Joined: 19 Nov 00
Posts: 32030
Credit: 13,713,301
RAC: 28,140
United Kingdom
Message 1083221 - Posted: 3 Mar 2011, 16:27:20 UTC

As per usual, thanks to Matt for keeping us up to date.

As far as UPSs go, I run 3 APC-1500 units for my 6 rigs, 2 on each. Mainly for brownouts, as my local mains is "dirty" and we seem to have regular sub-station switchings going on. Lights dim and come back in a second, but it used to reboot the PCs. Haven't had a total mains failure for a year now :-)

Each has the full 5 battery lights and is set to the default self-test every 14 days. The units are 3 years old and still going strong. The only drawback is that UPSs have the "razor blade" syndrome: you can pick up a second-hand UPS on eBay for £20, but a new battery will cost you 3 times that.

As this is a little off topic, perhaps it might be useful to start a UPS thread in Crunching?

____________
Damsel Rescuer, Uli Devotee, Julie Supporter, ES99 Admirer,
Raccoon Friend, Anniet fan, didn't take pot advice!


Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3250
Credit: 31,880,131
RAC: 3,905
Netherlands
Message 1083226 - Posted: 3 Mar 2011, 17:14:50 UTC - in response to Message 1083221.

Thanks for the update, Matt. The last power outage I witnessed was 26 years ago, when a 10 kV/500 V/380 V three-phase transformer exploded! Not a big one, though.
After that, a lot changed. Only the 400 kV three-phase lines are still above ground; every 10 kV line has been put underground.
I remember using an 'antenna' to power a few fluorescent lights close to the 1000 kW TV transmitter, which has been out of use for at least 15 years now.

Power outages are also very rare here, and no one I know uses a UPS.
But the Netherlands is becoming one big city, at least the western part, close to the sea.
They already call it the 'Randstad': from Rotterdam to Amsterdam it is practically one city, with big green spaces (and greenhouses) in between.

And it's a beautiful day, lots of sunshine and about 7 °C.
(But it still freezes at night.)



____________


Copyright © 2014 University of California