Magic Carpet Ride (Jul 19 2007)

Message boards : Technical News : Magic Carpet Ride (Jul 19 2007)

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 606152 - Posted: 19 Jul 2007, 22:02:00 UTC

Another day of minor tasks. Spent a chunk of the morning learning "parted", which I guess replaced "fdisk" for partitioning disks in the world of Linux. Worked with Bob to figure out why recent science database dumps are failing and how to install the latest version of Informix (for replica testing). Jeff and I started mapping our updated power requirements for the closet - we have a couple of UPS's with red lights, meaning we have some batteries to replace soon. Sometimes I feel about UPS's like I feel about all forms of insurance (car, house, health, etc.): extra expense and effort up front to set up, regular expense and effort to maintain, and then when push comes to shove they don't save your butt nearly as well as you thought they would. In fact, a lot of the time they make things worse. I've had UPS's just up and die and take systems along with them. Likewise, I've had two different insurance agencies on two separate occasions screw up their own paperwork, thus nullifying my policies without notifying me, wreaking havoc on my life in various unpredictable, unamusing ways. Okay, I'm ranting here...

As for the reasons stated earlier for why our results-to-send queue went to zero a couple of days ago, others have since suggested that, due to news of the impending power outage this weekend, many users have been flushing their caches to ensure they have enough work to withstand the predicted downtime. If this is indeed true, this could be seen as a distributed denial-of-service attack. But don't worry - I won't be calling the police.

Played a gig last night for a giant Applied Materials party in San Francisco. I like the fact that I get paid about four times as much per hour performing songs like "Magic Carpet Ride" at these hyper-techie functions as I do actually managing the back-end network of the world's largest supercomputing project.

- Matt

-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 606152
Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9954
Credit: 103,452,613
RAC: 328
United Kingdom
Message 606210 - Posted: 19 Jul 2007, 23:34:33 UTC
Last modified: 19 Jul 2007, 23:35:27 UTC

Ahh, UPS's, my favorite topic recently - I feel the same as you. My company has sites all over the UK; I oversee 20 and recently had 2 UPS failures that actually took the servers down. The diagnostic suggested "internal UPS fault please contact..." Fine when they work, but otherwise a general pain.

Thanks for the update.
ID: 606210
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 606219 - Posted: 19 Jul 2007, 23:43:08 UTC - in response to Message 606210.  

If you have a UPS, you should be able (and ready) to pull the plug at any time, and do so without fear.

If you can't, you need to buy new UPSes.

My servers draw about 500 watts. I have two 2200 VA UPSes on an automatic transfer switch.

As long as one has power, everything runs fine. I also test run-time (I can let one "run flat" and the transfer switch handles it).

The problem is when they aren't tested routinely, you get surprises.
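
As a rough check on the sizing described above, a UPS's usable wattage is its VA rating multiplied by its output power factor. This is a minimal sketch, assuming a power factor of 0.6; that figure is illustrative only and does not come from the post, so check the unit's datasheet. `load_fraction` is a hypothetical helper name.

```python
def load_fraction(load_watts: float, rating_va: float, power_factor: float) -> float:
    """Return the load as a fraction of the UPS's usable wattage."""
    if rating_va <= 0 or not 0 < power_factor <= 1:
        raise ValueError("rating_va must be positive and power_factor in (0, 1]")
    # Usable watts = VA rating x power factor.
    return load_watts / (rating_va * power_factor)


# 500 W of servers on one 2200 VA unit; 0.6 is an ASSUMED power factor.
fraction = load_fraction(500, 2200, 0.6)
print(f"Load is {fraction:.0%} of one UPS")
```

The lower that fraction, the more headroom a single unit has when the transfer switch moves the whole load onto it.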

-- Ned
ID: 606219
Dena Wiltsie
Volunteer tester
Joined: 19 Apr 01
Posts: 1628
Credit: 24,230,968
RAC: 26
United States
Message 606352 - Posted: 20 Jul 2007, 5:10:05 UTC

A big problem with UPS's is that they use lead-acid batteries. While they can last as long as 6 years, 4 years is pushing it. If uptime is important, put the date the new batteries were installed on the outside of the unit and replace the batteries before they have time to fail. I work with APC, and the only failures I have seen were due to batteries and an incorrectly wired outlet.
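
The date-on-the-unit habit is easy to mirror in a small script. This is only a sketch: the 3-year service interval is an assumption chosen to sit inside the "4 years is pushing it" margin, and `replacement_due` and `is_overdue` are hypothetical helper names.

```python
from datetime import date


def replacement_due(installed: date, service_years: int = 3) -> date:
    """Date by which the batteries should be swapped (service_years is assumed)."""
    try:
        return installed.replace(year=installed.year + service_years)
    except ValueError:
        # Installed on 29 Feb and the target year is not a leap year.
        return installed.replace(year=installed.year + service_years, day=28)


def is_overdue(installed: date, today: date, service_years: int = 3) -> bool:
    """True once the planned replacement date has been reached."""
    return today >= replacement_due(installed, service_years)


# Batteries installed in July 2003, checked in July 2007.
print(is_overdue(date(2003, 7, 1), date(2007, 7, 20)))
```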
ID: 606352
Trueinnerpeace
Joined: 21 May 99
Posts: 8
Credit: 184,805
RAC: 0
United States
Message 606377 - Posted: 20 Jul 2007, 8:04:38 UTC

At the risk of overstating the obvious, preventative maintenance, like everything in life, is key. Having been a DEC field service engineer from the late '70s, I can tell you the PMs were a regular routine, and in the intervening years I see nothing has changed other than things having gotten smaller. Ho hum...
ID: 606377
Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9954
Credit: 103,452,613
RAC: 328
United Kingdom
Message 606592 - Posted: 20 Jul 2007, 18:32:50 UTC - in response to Message 606219.  

Whilst I agree, unfortunately my company expanded very rapidly about 4 years ago and cost was an important consideration, so UPS's were just installed and "forgotten". The monitoring software wasn't even installed in most cases. Now of course we are suffering. Each site was installed with just one UPS and most run around 50-60% load, and until I instigated a program of installing the software and getting the UPS's to report failures, the first we knew of problems was when there was a power outage and the UPS immediately failed.

Still, we're getting to grips with them now.

Bernie
ID: 606592
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 606594 - Posted: 20 Jul 2007, 18:39:05 UTC - in response to Message 606592.  

I've not found the monitoring software to be that useful, frankly, which is why I just jerk the power plug and observe.

What I do is switch the transfer switch to make one UPS "primary" and plug in a normal, electric clock. I set the clock for noon, and pull the plug. When the UPS batteries "go dry" the clock stops, and the transfer switch puts the load on the other UPS.

One of my UPSes will do six hours under load.
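
The clock trick can be turned into a number with a few lines. This is a minimal sketch, assuming a 12-hour clock face that was set to 12:00 when the plug was pulled; `runtime_from_clock` is a hypothetical helper, not part of any UPS software.

```python
from datetime import timedelta


def runtime_from_clock(stopped_reading: str) -> timedelta:
    """Convert the time shown on the stopped clock (e.g. "5:47") to a runtime.

    Assumes the clock was set to 12:00 at the start of the test, so a
    run longer than 12 hours would wrap around and be under-reported.
    """
    hours, minutes = (int(part) for part in stopped_reading.split(":"))
    if not (1 <= hours <= 12 and 0 <= minutes < 60):
        raise ValueError(f"not a 12-hour clock reading: {stopped_reading!r}")
    # A reading of 12:xx means less than one hour has elapsed.
    return timedelta(hours=hours % 12, minutes=minutes)


print(runtime_from_clock("5:47"))
```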

... but that's not the normal setup. The normal setup is for the UPS to signal "mains down" and the server(s) then do a graceful shutdown and power off the UPS. That's what the software is for.

For best battery life, you want to get off the UPS before the batteries get hot, and in most "factory configuration" UPSes, they will get hot pretty quick.
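
The "normal setup" described above boils down to a small decision rule. This is only a sketch of that rule: how the "mains down" signal is actually read is vendor-specific and left out, and `Action`, `decide`, and the 120-second grace period are all assumptions made for illustration.

```python
from enum import Enum


class Action(Enum):
    RUN = "keep running"
    SHUT_DOWN = "graceful shutdown, then power off the UPS"


def decide(on_battery: bool, seconds_on_battery: float,
           grace_seconds: float = 120.0) -> Action:
    """Pick an action from the UPS status.

    grace_seconds is an ASSUMED ride-through window for short blips;
    keeping it short gets the load off the batteries before they heat up.
    """
    if on_battery and seconds_on_battery >= grace_seconds:
        return Action.SHUT_DOWN
    return Action.RUN


# Mains has been down for three minutes.
print(decide(on_battery=True, seconds_on_battery=180).value)
```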

-- Ned
ID: 606594
Logan
Volunteer tester
Joined: 26 Jan 07
Posts: 743
Credit: 918,353
RAC: 0
Spain
Message 606624 - Posted: 20 Jul 2007, 19:59:27 UTC
Last modified: 20 Jul 2007, 20:18:01 UTC

Why do you think these things are named 'ups...!'? (Picture the look of panic on some administrator's face when that happens...) Ha, ha, ha... Sorry for the joke...
ID: 606624
Arthur Clarke
Joined: 3 Apr 00
Posts: 1
Credit: 63,209
RAC: 0
United States
Message 606845 - Posted: 21 Jul 2007, 5:02:19 UTC - in response to Message 606152.  


As for the reasons stated earlier for why our results-to-send queue went to zero a couple of days ago, others have since suggested that, due to news of the impending power outage this weekend, many users have been flushing their caches to ensure they have enough work to withstand the predicted downtime. If this is indeed true, this could be seen as a distributed denial-of-service attack. But don't worry - I won't be calling the police.


My preferences are set to maintain a stockpile of about 10 days of work to do. I'd been running version 5.8.16, and for the past week or so it had not been receiving new work units. Restarting it or reinstalling it didn't change the behavior. The results to send queue wasn't zero for all of that time.

It was down to the last two workunits when I noticed that version 5.10.13 now was recommended. I downloaded and installed it. The next time it went to the well, it received a refill of about 340 hours of work (18 workunits).

Did the mix of client versions requesting new work change significantly during the past week?
ID: 606845
Logan
Volunteer tester
Joined: 26 Jan 07
Posts: 743
Credit: 918,353
RAC: 0
Spain
Message 606891 - Posted: 21 Jul 2007, 9:28:44 UTC - in response to Message 606845.  
Last modified: 21 Jul 2007, 9:33:28 UTC


Hi Clarke.

The 5.8.16 BOINC manager version has no cache capability. You can set it to 10 more days, but it's futile... Only 5.10+ can use that.

Logan.

ID: 606891
Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 606903 - Posted: 21 Jul 2007, 9:51:42 UTC - in response to Message 606891.  


Not true. BOINC 5.8.16 will cache up to 10 days of work. It's just that on your General Preferences page you have to put the figure in the first box, the one that says:

Computer is connected to the Internet about every
(Leave blank or 0 if always connected.
BOINC will try to maintain at least this much work.)

Claggy
ID: 606903
Logan
Volunteer tester
Joined: 26 Jan 07
Posts: 743
Credit: 918,353
RAC: 0
Spain
Message 606912 - Posted: 21 Jul 2007, 10:09:29 UTC - in response to Message 606903.  


And the preferences page says
'Maintain enough work for an additional'... n days
'(Requires 5.10+ client.)'

But if you don't have 5.10.7 or 5.10.13 (for example), setting this parameter to 10 days is futile...

Regards Claggy.


Logan.
ID: 606912
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 607120 - Posted: 21 Jul 2007, 16:26:04 UTC - in response to Message 606891.  


This is incorrect.

There are more cache settings in 5.10+ than 5.8.16, but BOINC has been caching work well back into the 4.x versions.
ID: 607120
OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 607204 - Posted: 21 Jul 2007, 19:15:54 UTC

Please keep in mind that with short deadlines and a large cache, BOINC will go into EDF ("panic") mode, thinking it will not be able to finish all the work it just downloaded. When this happens, BOINC will automatically cut off any more downloads until it gets the cache down to a reasonable level.
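
To make the EDF idea concrete, here is a toy model of the switch. It is not BOINC's actual scheduler: it assumes a single CPU, known remaining run times, and a simple rule (keep arrival order unless that order would miss a deadline), and `Task`, `misses_deadline`, and `schedule` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    hours_left: float      # estimated remaining run time
    deadline_hours: float  # hours from now until the result is due


def misses_deadline(tasks: list[Task]) -> bool:
    """True if running the tasks in the given order makes any of them late."""
    elapsed = 0.0
    for task in tasks:
        elapsed += task.hours_left
        if elapsed > task.deadline_hours:
            return True
    return False


def schedule(tasks: list[Task]) -> list[Task]:
    """Keep arrival order unless it would miss a deadline; then run EDF."""
    if misses_deadline(tasks):
        return sorted(tasks, key=lambda task: task.deadline_hours)
    return list(tasks)


queue = [Task("long", 10, 100), Task("short-deadline", 5, 8)]
print([task.name for task in schedule(queue)])
```

In this model a short-deadline task that arrives behind a long one forces the reorder, which is the situation a large cache makes more likely.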
ID: 607204
Dr. C.E.T.I.
Joined: 29 Feb 00
Posts: 16019
Credit: 794,685
RAC: 0
United States
Message 607221 - Posted: 21 Jul 2007, 19:34:26 UTC

i got 271 units automatically (weird) a while ago . . . didn't ask - just got 'em . . . guess that'll do for the outage (HP Laptop dv9060us Intel Dual Core 2). . .
ID: 607221
Logan
Volunteer tester
Joined: 26 Jan 07
Posts: 743
Credit: 918,353
RAC: 0
Spain
Message 607436 - Posted: 22 Jul 2007, 20:32:47 UTC - in response to Message 607120.  


Try using the 5.8.16 version for Windows and then tell me how well your cache works... ha, ha, ha...
Logan.

BOINC FAQ Service (now also available in Spanish)
ID: 607436
Greg Niehues
Joined: 29 Oct 06
Posts: 3
Credit: 576,026
RAC: 0
Message 607469 - Posted: 22 Jul 2007, 21:26:32 UTC - in response to Message 607436.  


I'm using it - and caching with it. Works fine for me. ha, ha, ha.....
ID: 607469
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 607502 - Posted: 22 Jul 2007, 23:05:59 UTC - in response to Message 607436.  


Actually, as a tester I've run most versions, so I know -- and I've spent a lot of time experimenting with how BOINC handles the various parameters.

If you set "cache additional days" to 0, 5.10+ and 5.8.16 work exactly the same way, and the "connect every 'x' days" is the cache setting.

If you tell BOINC "connect every 3 days" it will try to carry enough work so that it does not run out in less than 3 days, and doesn't miss deadlines at 6.

Because the setting is indirect (you aren't saying "cache 3 days" you're setting the interval) it won't do exactly three days.

If you set "connect every 10 days" BOINC will have trouble if you have work units with short deadlines -- and that happens a lot. A 10 day interval will not cache 10 days of work because BOINC knows that if it waits 10 days that work will be late.

But, you're relatively new, and this has been discussed ad nauseam.

Every version of BOINC, going back to the first public release, has had caching. Most versions have worked as designed -- and the arguments have always been over the design, not the implementation.
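
The interaction between the connect interval and deadlines can be shown with a toy model. This is not BOINC's real work-fetch logic; it assumes one CPU, identical tasks, and a made-up rule that a finished result may have to wait a full connect interval before it can be reported. The 14-day deadline and 1-day task length are illustrative values, and `days_cached` is a hypothetical name.

```python
def days_cached(connect_days: float, task_days: float, deadline_days: float) -> float:
    """Days of identical tasks a toy client would queue on one CPU."""
    if task_days <= 0:
        raise ValueError("task_days must be positive")
    queued = 0.0
    while queued < connect_days:
        finish = queued + task_days
        # ASSUMED rule: the result may sit for up to connect_days before
        # the next contact, so it must finish that much ahead of its deadline.
        if finish + connect_days > deadline_days:
            break
        queued = finish
    return queued


# Same tasks and deadline, two different "connect every" settings.
print(days_cached(3, 1, 14), days_cached(10, 1, 14))
```

Under these assumptions the 3-day setting fills its buffer, while the 10-day setting stops well short of 10 days because later tasks could not be returned in time.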
ID: 607502
Uioped1
Volunteer tester
Joined: 17 Sep 03
Posts: 50
Credit: 1,179,926
RAC: 0
United States
Message 607907 - Posted: 23 Jul 2007, 23:06:11 UTC - in response to Message 606352.  

There are intriguing possibilities in flywheel-based UPSes

I think only a couple of companies have brought products to the market for datacenters, but it's a very nice alternative to batteries.
ID: 607907
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 607991 - Posted: 24 Jul 2007, 5:08:17 UTC - in response to Message 607907.  
Last modified: 24 Jul 2007, 5:08:44 UTC

There are intriguing possibilities in flywheel-based UPSes

Converting electrical energy to mechanical energy & back to electrical energy generally isn't as efficient as electrical-chemical-electrical.

Grant
Darwin NT
ID: 607991

 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.