Out of the Frying Pan (Feb 17 2010)


Message boards : Technical News : Out of the Frying Pan (Feb 17 2010)

1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 971096 - Posted: 18 Feb 2010, 5:06:31 UTC - in response to Message 971093.

>Nice to hear everything is almost back to normal. Unfortunate that a lot of work units were aborted while trying to upload them, as their deadlines had passed during the downtime. I have a feeling more will be aborted, as they still can't be uploaded.

>Kinda disappointed, but what can ya do, aye? You win some, you lose some - gotta keep on truckin'! :)

You should always let those ride -- you likely would still get credit.
____________

Ageless
Joined: 9 Jun 99
Posts: 12476
Credit: 2,696,532
RAC: 1,451
Netherlands
Message 971098 - Posted: 18 Feb 2010, 5:09:25 UTC - in response to Message 971095.

>>The A/C died and it's too hot? It's winter, it's 25 degrees and snowing...open the windows. That'll cool you off.

>Assuming the server room is near an outside wall and has openable windows, of course.

And assuming you want moist sea-air to enter your server room, wreaking havoc with all the electrics in there. ;-)
____________
Jord

Fighting for the correct use of the apostrophe, together with Weird Al Yankovic

Dena Wiltsie
Volunteer tester
Joined: 19 Apr 01
Posts: 1300
Credit: 650,361
RAC: 3,045
United States
Message 971099 - Posted: 18 Feb 2010, 5:09:27 UTC - in response to Message 971079.

>>For a while my job depended on a system cooled by an air conditioner that I could not depend on. My solution was to get one of these and wire it into an extension cord so I could connect all the non-replaceable equipment to it. I then set it to about 80 F and had no worries about failed hardware. The catch is that you must make sure your backups are up to date, as the power-down will be very hard, and in my case the RAID often lost a drive when it was powered down (very old drives).

>Most anything semi-modern supports some sort of "dumb" signaling from a UPS.

>It uses a normal serial port, and only the handshake lines. A line goes "low" to signal "low battery", and the UPS waits for the system to drop a handshake line back when it is safe for the UPS to turn off.

>One could build a "UPS" whose only job was to signal low battery when the temperature rose above a set threshold, and kill power when the system said "okay."

>Power would be restored when it got cold enough. Or not.
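The temperature-triggered "UPS" described above is really a tiny state machine. Here is a minimal Python sketch of just the decision logic, assuming something else supplies a temperature reading and reports whether the host has dropped its handshake line (driving the actual serial-port lines is not shown). The class name, states, and thresholds are hypothetical. The gap between the trip and restore temperatures is there so power doesn't flap on and off around a single set point.

```python
from enum import Enum


class State(Enum):
    NORMAL = "normal"        # power on, "low battery" line not asserted
    WARNING = "warning"      # "low battery" asserted, waiting for the host to say it is safe
    POWER_OFF = "power_off"  # host acknowledged, power has been cut


class ThermalUps:
    """Hypothetical fake 'dumb' UPS that trips on temperature instead of battery charge."""

    def __init__(self, trip_c: float, restore_c: float) -> None:
        # restore_c must sit below trip_c so the output doesn't toggle around one set point.
        if restore_c >= trip_c:
            raise ValueError("restore_c must be lower than trip_c")
        self.trip_c = trip_c
        self.restore_c = restore_c
        self.state = State.NORMAL

    def step(self, temp_c: float, host_ready: bool) -> State:
        """Advance one polling interval and return the new state."""
        if self.state is State.NORMAL and temp_c >= self.trip_c:
            self.state = State.WARNING      # assert the "low battery" signal
        elif self.state is State.WARNING and host_ready:
            self.state = State.POWER_OFF    # host dropped its handshake line: safe to cut power
        elif self.state is State.POWER_OFF and temp_c < self.restore_c:
            self.state = State.NORMAL       # cold enough again: restore power
        return self.state
```

For example, with `trip_c=35.0` and `restore_c=27.0`, a reading of 36 °C moves it to WARNING, the host's acknowledgment moves it to POWER_OFF, and power only comes back once the reading drops below 27 °C.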

The P390 came with a latching power switch. The software was unable to cut the power; the only way to turn it off was to push the button or pull the power cord. I don't think OS/2 Warp has power-management support in it, and even if it did, VM/ESA didn't have that type of support on a P390. Running on real hardware VM/ESA might, but the P390 was a strange animal for IBM. My job wasn't to spend a few months getting what you suggest to work; I needed something quick and dirty to protect the hardware because we couldn't afford to replace it.
I have what you suggest set up and working on my Mac, but the P390 is roughly 15-year-old hardware pressed into service long after IBM considered it obsolete.
____________

Nate Itkin
Joined: 29 Jun 99
Posts: 3
Credit: 702,956
RAC: 213
United States
Message 971100 - Posted: 18 Feb 2010, 5:22:38 UTC

I concur with Mr. Haselgrove. Something was wrong with the scheduler before the Tuesday shutdown. My crunchers (located in Texas, California, and Hawaii) all had entries like this in their logs:
15-Feb-2010 22:07:14 [SETI@home] Scheduler request failed: Timeout was reached
This particular entry's timestamp is in GMT-10 (Hawaii time).


____________

KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 2007
Credit: 11,249,742
RAC: 15,056
United States
Message 971105 - Posted: 18 Feb 2010, 6:08:47 UTC - in response to Message 971080.
Last modified: 18 Feb 2010, 6:13:42 UTC

The smart-ass in me made me write this.....

>The A/C died and it's too hot? It's winter, it's 25 degrees and snowing...open the windows. That'll cool you off.


...You forget that the project is in Berkeley, where (during the day at this time of year) it's about 60-65°F and only goes down to 50-55°F at night... not to mention that yesterday morning and this morning there was heavy fog (at least in my location, 2½ miles away).

Besides, Matt always refers to it as the "server closet", which implies that it doesn't have a window... (I think I've read that it is a re-purposed janitor closet...)
____________

Gary Charpentier (Project donor)
Volunteer tester
Joined: 25 Dec 00
Posts: 13198
Credit: 7,969,240
RAC: 15,854
United States
Message 971112 - Posted: 18 Feb 2010, 6:55:10 UTC - in response to Message 970983.

So how much would a second A/C unit cost to install?

Perhaps it's time to add up the thermal load and replace some hot-running equipment with cooler-running equipment.

Yes, you need to get thermal cutout switches. As you have UPSes, that makes a controlled shutdown much easier.

Now if you could automate the door opening and a couple of big fans coming on ...
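Adding up the thermal load is mostly arithmetic: nearly all the electrical power the equipment draws ends up as heat in the room, 1 W is about 3.412 BTU/hr, and A/C capacity is often quoted in "tons" of 12,000 BTU/hr. A minimal Python sketch, where the device names and wattages are made-up placeholders rather than the real closet's numbers:

```python
WATTS_TO_BTU_PER_HR = 3.412   # 1 W of electrical load ~ 3.412 BTU/hr of heat
BTU_PER_HR_PER_TON = 12_000   # one "ton" of cooling capacity = 12,000 BTU/hr


def thermal_load(watts_by_device: dict[str, float]) -> tuple[float, float, float]:
    """Return (total watts, BTU/hr, tons of cooling) for a set of devices."""
    total_w = sum(watts_by_device.values())
    btu_per_hr = total_w * WATTS_TO_BTU_PER_HR
    return total_w, btu_per_hr, btu_per_hr / BTU_PER_HR_PER_TON


# Hypothetical closet inventory (watts actually drawn, not nameplate ratings).
closet = {"db-server": 600.0, "upload-server": 450.0, "old-storage-array": 900.0}
watts, btu, tons = thermal_load(closet)
print(f"{watts:.0f} W -> {btu:.0f} BTU/hr -> {tons:.2f} tons of cooling")
```

Measured draw (from the UPS readout or a clamp meter) is a better input than nameplate ratings, which tend to overstate typical consumption.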


____________

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2357
Credit: 8,953,535
RAC: 3,946
United States
Message 971135 - Posted: 18 Feb 2010, 8:55:21 UTC - in response to Message 971112.

>So how much would a second A/C unit cost to install?

>Perhaps it's time to add up the thermal load and replace some hot-running equipment with cooler-running equipment.

>Yes, you need to get thermal cutout switches. As you have UPSes, that makes a controlled shutdown much easier.

>Now if you could automate the door opening and a couple of big fans coming on ...


I don't think a second A/C system would be ideal. From what I remember hearing, power distribution/availability is already pretty much at maximum capacity as it is. Every time a new server is installed in the closet, it means one or two old ones being re-purposed elsewhere. Last I knew, there was still plenty of rack space, but it's a problem of power availability.
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

Ronal E. Zepeda Trujillo
Joined: 14 Jul 05
Posts: 9
Credit: 2,997,829
RAC: 118
Chile
Message 971150 - Posted: 18 Feb 2010, 10:49:43 UTC

At least it was only a little disaster... it could have been worse...
____________
Only a boy with the responsibilities of an old man

Bounce
Joined: 3 Apr 99
Posts: 66
Credit: 5,604,569
RAC: 0
United States
Message 971191 - Posted: 18 Feb 2010, 14:24:16 UTC - in response to Message 971082.
Last modified: 18 Feb 2010, 14:29:43 UTC

>I started a chkdsk over 10 hours ago and it's less than halfway through!

Try SpinRite (http://www.grc.com - just a satisfied customer). It's much better at recovering data and grooming an HDD than what M$ includes for free.

>So how much is a second A/C unit installed?

At my last agency, a sub-unit (which was required due to how the building did its HVAC) was $10,000.00. These folks are begging for second-hand servers to do their projects, so I suspect a real budget item like that is considered a little spendy. Even if UCB is taking a cut for basic facility management, extras like this are often done on the customer's dime.
____________

RottenMutt
Joined: 15 Mar 01
Posts: 999
Credit: 209,513,627
RAC: 61,070
United States
Message 971197 - Posted: 18 Feb 2010, 14:53:43 UTC

SETI STAFF, WE HAVE BEEN DOWN SINCE SUNDAY!

There has been no acknowledgment of this in the postings, other than the news about the barbecued servers. Please fix the problem.

Thank you


____________

FrostKing9
Joined: 20 Oct 01
Posts: 39
Credit: 23,815,960
RAC: 0
United States
Message 971206 - Posted: 18 Feb 2010, 15:11:25 UTC

Yep... the upload and report process is still malfunctioning. I can barely upload completed WUs, and only by repeatedly hitting RETRY NOW in the TRANSFERS window. Even then it only UPLOADS 1 to 3 WUs at a time. And reporting those WUs isn't working at all, not even after more than 100 clicks on the UPDATE button in the PROJECTS window over 8 hours. <sigh>


____________


I DONATE money to SETI@home.... DO YOU?

I'm just slowly BOINC'ing along.

Hey... ET... you have a sister who likes earthlings?

Dave
Joined: 29 Mar 02
Posts: 774
Credit: 23,193,139
RAC: 0
United Kingdom
Message 971208 - Posted: 18 Feb 2010, 15:17:27 UTC

Patience, people...

DJStarfox
Joined: 23 May 01
Posts: 1045
Credit: 569,430
RAC: 102
United States
Message 971222 - Posted: 18 Feb 2010, 15:56:58 UTC - in response to Message 970983.

Matt,

That is insane. You urgently need some kind of automated thermal shutdown or emergency ventilation for that closet. The Linux kernel will shut down the system when the CPU overheats, but not when the hard drives or other components do. If the next failure caused a fire or took out most of the drives, it could mean the end of SETI@Home.

My brother configured a monitoring program called Nagios to sense his data center's temperature and email his cell phone above a certain temp. If you're interested, I could get more implementation details.
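For the Nagios route, a temperature check is just a small plugin: by Nagios convention it prints one status line (optionally followed by "|" and performance data) and exits 0 for OK, 1 for WARNING, 2 for CRITICAL, or 3 for UNKNOWN, and Nagios itself handles notifications such as the email to a cell phone. A minimal Python sketch, assuming the sensor is exposed as a Linux hwmon sysfs file (these report millidegrees Celsius); the script name, sensor path, and thresholds are hypothetical:

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # standard Nagios plugin exit codes


def evaluate(temp_c: float, warn: float, crit: float) -> tuple[int, str]:
    """Map a temperature to a Nagios status code and a one-line message with perfdata."""
    perf = f"temp={temp_c:.1f};{warn};{crit}"
    if temp_c >= crit:
        return CRITICAL, f"TEMP CRITICAL - {temp_c:.1f} C | {perf}"
    if temp_c >= warn:
        return WARNING, f"TEMP WARNING - {temp_c:.1f} C | {perf}"
    return OK, f"TEMP OK - {temp_c:.1f} C | {perf}"


def read_hwmon_celsius(path: str) -> float:
    """Read a Linux hwmon temp*_input file, which holds millidegrees C (e.g. "41000")."""
    with open(path) as f:
        return int(f.read().strip()) / 1000.0


def main(argv: list[str]) -> int:
    """argv: [script, sensor_path, warn_c, crit_c]. Prints status, returns the exit code."""
    if len(argv) != 4:
        print("TEMP UNKNOWN - usage: check_closet_temp.py SENSOR_PATH WARN CRIT")
        return UNKNOWN
    try:
        temp = read_hwmon_celsius(argv[1])
        warn, crit = float(argv[2]), float(argv[3])
    except (OSError, ValueError) as exc:
        print(f"TEMP UNKNOWN - {exc}")
        return UNKNOWN
    code, message = evaluate(temp, warn, crit)
    print(message)
    return code
```

A real plugin would end with `sys.exit(main(sys.argv))` and be registered as a Nagios command with its own thresholds and sensor path (something like `/sys/class/hwmon/hwmon0/temp1_input`, which varies from machine to machine).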

Marc Frigon
Volunteer tester
Joined: 7 Apr 05
Posts: 4
Credit: 1,225,932
RAC: 520
United States
Message 971230 - Posted: 18 Feb 2010, 16:07:57 UTC - in response to Message 971208.
Last modified: 18 Feb 2010, 16:15:54 UTC

>Patience, people...


I agree -- when looking back at Matt's original update ("Off the Beach") after returning from vacation, I was reminded that he does acknowledge that there were some problems even before the A/C failure (e.g. the uploading issues we've all been facing). So there's no need to get riled up about that right now. The way I see it, I'm going to give SETI@home a full week to get back to normal before any of us is really justified in panicking.

Actually, come to think of it, we might all do well to heed the wisdom of a certain "Guide" that proclaims in large, friendly letters: DON'T PANIC!

By the way, SETI@home staff: I really like the plan to have SETI@home and Astropulse on separate servers.
____________


"That's no moon. It's a space station." -Obi-Wan Kenobi
...If there's a Galactic Empire out there with a Death Star that's about to destroy us all, SETI will find it.

1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 971263 - Posted: 18 Feb 2010, 17:47:37 UTC - in response to Message 971135.

>I don't think a second A/C system would be ideal. From what I remember hearing, power distribution/availability is already pretty much at maximum capacity as it is. Every time a new server is installed in the closet, it means one or two old ones being re-purposed elsewhere. Last I knew, there was still plenty of rack space, but it's a problem of power availability.

.... and there is the issue of where do you dump the heat?

A/C doesn't make cold; it absorbs heat on the cold side and dumps it into a heat sink somewhere else.

The easiest type of installation would be a "ductless split" but you still have to route some refrigerant tubing between the two units, and there is a distance limit.

Campus provides the A/C, so they probably either take what Campus provides or pay for the installation themselves, and, as with the gigabit fiber up the hill, SETI@Home is perennially short on cash.

Load shedding (automatically powering down the servers) based on temperature is probably more practical.
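Temperature-based load shedding can be staged so the least important machines go first, which cuts the heat load and may keep the critical servers up. A minimal Python sketch of the decision step, assuming a single closet temperature reading; the tier thresholds and host names are hypothetical, and actually powering hosts down (via the UPS or a remote shutdown command) is left out:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    threshold_c: float      # shed this tier once the closet reaches this temperature
    hosts: tuple[str, ...]  # hypothetical host names in this tier


def hosts_to_shed(temp_c: float, tiers: list[Tier]) -> list[str]:
    """Return the hosts that should be off at temp_c, least important (lowest threshold) first."""
    shed: list[str] = []
    for tier in sorted(tiers, key=lambda t: t.threshold_c):
        if temp_c >= tier.threshold_c:
            shed.extend(tier.hosts)
    return shed


tiers = [
    Tier(29.0, ("test-box-1", "test-box-2")),  # expendable, goes first
    Tier(32.0, ("backup-server",)),
    Tier(36.0, ("web-server",)),               # last resort
]
print(hosts_to_shed(33.0, tiers))  # ['test-box-1', 'test-box-2', 'backup-server']
```

In practice you'd also want some hysteresis (bring hosts back only well below their threshold) so a machine isn't power-cycled every time the reading wobbles.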
____________

1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 971264 - Posted: 18 Feb 2010, 17:50:58 UTC - in response to Message 971136.

>I don't think power availability ever came into the equation.

Matt has said that there is a finite amount of power delivered to the closet. I don't know if the issue is the cost of a new branch circuit, or if there is some rule saying these closets come with a certain sized branch circuit.....

... but obviously, if they could pump more energy into the closet, at some point it'd be a fire hazard.
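To put rough numbers on "a finite amount of power": under the common US practice of loading a breaker to no more than 80% of its rating for continuous loads (which 24/7 servers are), one branch circuit supplies about volts × amps × 0.8 watts, and every one of those watts ends up as heat the A/C has to remove. A minimal Python sketch; the 120 V / 20 A circuits and the 5 kW load are hypothetical, since we don't know what actually feeds the closet:

```python
import math


def continuous_capacity_w(volts: float, breaker_amps: float, derate: float = 0.8) -> float:
    """Watts one branch circuit can supply continuously at the given derating."""
    return volts * breaker_amps * derate


def circuits_needed(total_load_w: float, volts: float, breaker_amps: float) -> int:
    """How many identical branch circuits a given continuous load would need."""
    return math.ceil(total_load_w / continuous_capacity_w(volts, breaker_amps))


# Hypothetical: 120 V / 20 A circuits and a 5 kW closet load.
print(f"{continuous_capacity_w(120, 20):.0f} W per circuit")  # 1920 W per circuit
print(circuits_needed(5000, 120, 20))                          # 3
```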

____________

1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 971280 - Posted: 18 Feb 2010, 18:36:22 UTC - in response to Message 971098.

>And assuming you want moist sea-air to enter your server room, wreaking havoc with all the electrics in there. ;-)

Campus is much farther from the ocean than my server room, which is kept cool by keeping the windows open.

This is much less expensive (and much greener) than A/C.
____________

Peter Moss
Joined: 15 Nov 99
Posts: 14
Credit: 1,535,937
RAC: 1,054
United Kingdom
Message 971303 - Posted: 18 Feb 2010, 19:35:24 UTC - in response to Message 971280.

I have almost 50 items stuck at "Upload Pending".

18/02/2010 18:19:20 SETI@home Reporting 1 completed tasks, not requesting new tasks
18/02/2010 18:19:42 Project communication failed: attempting access to reference site
18/02/2010 18:19:43 Internet access OK - project servers may be temporarily down.


These are UK times... Will they clear soon??
____________

Sten-Arne
Volunteer tester
Joined: 1 Nov 08
Posts: 3770
Credit: 21,502,043
RAC: 15,547
Sweden
Message 971307 - Posted: 18 Feb 2010, 19:39:10 UTC - in response to Message 971303.

>I have almost 50 items stuck at "Upload Pending".

>18/02/2010 18:19:20 SETI@home Reporting 1 completed tasks, not requesting new tasks
>18/02/2010 18:19:42 Project communication failed: attempting access to reference site
>18/02/2010 18:19:43 Internet access OK - project servers may be temporarily down.

>These are UK times... Will they clear soon??


It will clear in due time. Just let BOINC handle it. What's the hurry? I'm sure ET will wait for us to find him/her.

Everyone has stuck WUs, even me, but I just let BOINC handle it. It can take a few days, but in the end it will all clear.
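Letting the client handle it works because an automatic retry that waits longer after each failure, with a bit of randomness, eventually gets through once the servers recover, without thousands of hosts hitting the servers at the same moment the way repeated manual retries do. A generic sketch of randomized exponential backoff in Python; the base delay, cap, and jitter range are hypothetical and not taken from BOINC's source:

```python
import random
from typing import Optional


def backoff_delay(attempt: int, base_s: float = 60.0, cap_s: float = 4 * 3600.0,
                  rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before retry number `attempt` (0 = first retry).

    The window doubles with each failed attempt up to cap_s; the actual delay is
    drawn uniformly from the upper half of the window so clients don't retry in lockstep.
    """
    if attempt < 0:
        raise ValueError("attempt must be >= 0")
    if rng is None:
        rng = random.Random()
    # Clamp the exponent so a huge attempt count can't overflow the float math.
    window = min(cap_s, base_s * 2 ** min(attempt, 30))
    return rng.uniform(window / 2, window)


rng = random.Random(42)  # seeded only so the example is repeatable
for attempt in range(6):
    print(f"retry {attempt}: wait {backoff_delay(attempt, rng=rng):.0f} s")
```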

Sten-Arne


Copyright © 2014 University of California