Magic Carpet Ride (Jul 19 2007)
Author | Message |
---|---|
Matt Lebofsky Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
Another day of minor tasks. Spent a chunk of the morning learning "parted", which I guess replaced "fdisk" for partitioning disks in the world of Linux. Worked with Bob to figure out why recent science database dumps are failing and how to install the latest version of Informix (for replica testing). Jeff and I started mapping our updated power requirements for the closet - we have a couple of UPS's with red lights, meaning we have some batteries to replace soon. Sometimes I feel about UPS's like I feel about all forms of insurance (car, house, health, etc.): extra expense and effort up front to set up, regular expense and effort to maintain, and then when push comes to shove they don't save your butt nearly as well as you thought they would. In fact, a lot of the time they make things worse. I've had UPS's just up and die and take systems along with them. Likewise, I've had two different insurance agencies on two separate occasions screw up their own paperwork, thus nullifying my policies without notifying me, wreaking havoc on my life in various unpredictable, unamusing ways. Okay, I'm ranting here. As for the reasons stated earlier for why our results-to-send queue went to zero a couple of days ago, others have since suggested that, due to news of the impending power outage this weekend, many users have been flushing their caches to ensure they have enough work to withstand the predicted downtime. If this is indeed true, it could be seen as a distributed denial-of-service attack. But don't worry - I won't be calling the police. Played a gig last night for a giant Applied Materials party in San Francisco. I like the fact that I get paid about four times the hourly rate performing songs like "Magic Carpet Ride" at these hyper-techie functions that I get for actually managing the back-end network of the world's largest supercomputing project. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
Bernie Vine Joined: 26 May 99 Posts: 9954 Credit: 103,452,613 RAC: 328 |
Ahh, UPS's, my favorite topic recently - I feel the same as you. My company has sites all over the UK; I oversee 20 and recently had 2 UPS failures that actually took the servers down. The diagnostic suggested "internal UPS fault please contact..." Fine when they work, but otherwise a general pain. Thanks for the update. |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
If you have a UPS, you should be able (and ready) to pull the plug at any time, and do so without fear. If you can't, you need to buy new UPSes. My servers draw about 500 watts. I have two 2200 VA UPSes on an automatic transfer switch. As long as one has power, everything runs fine. I also test run-time (I can let one "run flat" and the transfer switch handles it). The problem is that when they aren't tested routinely, you get surprises. -- Ned |
Dena Wiltsie Joined: 19 Apr 01 Posts: 1628 Credit: 24,230,968 RAC: 26 |
The big problem with UPS's is that they use lead-acid batteries. While they can last as long as 6 years, 4 years is pushing it. If uptime is important, put the date the new batteries were installed on the outside of the unit and replace the batteries before they have time to fail. I work with APC, and the only failures I have seen were due to batteries and an incorrectly wired outlet. |
Trueinnerpeace Joined: 21 May 99 Posts: 8 Credit: 184,805 RAC: 0 |
At the risk of overstating the obvious, preventive maintenance, like everything in life, is key. Having been a DEC field service engineer from the late '70s, I can tell you the PMs were a regular routine, and in the intervening years I see nothing has changed other than things having gotten smaller. Oh hum... |
Bernie Vine Joined: 26 May 99 Posts: 9954 Credit: 103,452,613 RAC: 328 |
Whilst I agree, unfortunately my company expanded very rapidly about 4 years ago and cost was an important consideration, so UPS's were just installed and "forgotten". The monitoring software wasn't even installed in most cases. Now of course we are suffering. Each site was installed with just one UPS and most run at around 50-60% load, and until I instigated a program of installing the software and getting the UPS's to report failures, the first we knew of problems was when there was a power outage and the UPS immediately failed. Still, we're getting to grips with them now. Bernie |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
I've not found the monitoring software to be that useful, frankly, which is why I just jerk the power plug and observe. What I do is switch the transfer switch to make one UPS "primary" and plug in a normal electric clock. I set the clock for noon, and pull the plug. When the UPS batteries "go dry" the clock stops, and the transfer switch puts the load on the other UPS. One of my UPSes will do six hours under load. ... but that's not the normal setup. The normal setup is for the UPS to signal "mains down" and the server(s) then do a graceful shutdown and power off the UPS. That's what the software is for. For best battery life, you want to get off the UPS before the batteries get hot, and in most "factory configuration" UPSes, they will get hot pretty quickly. -- Ned |
Logan Joined: 26 Jan 07 Posts: 743 Credit: 918,353 RAC: 0 |
Why do you think these things are named 'ups...!'...? (said with the look of panic on some administrator's face when that happens...) Ha, ha, ha... Sorry for the joke... |
Arthur Clarke Joined: 3 Apr 00 Posts: 1 Credit: 63,209 RAC: 0 |
My preferences are set to maintain a stockpile of about 10 days of work to do. I'd been running version 5.8.16, and for the past week or so it had not been receiving new work units. Restarting it or reinstalling it didn't change the behavior. The results to send queue wasn't zero for all of that time. It was down to the last two workunits when I noticed that version 5.10.13 now was recommended. I downloaded and installed it. The next time it went to the well, it received a refill of about 340 hours of work (18 workunits). Did the mix of client versions requesting new work change significantly during the past week? |
Logan Joined: 26 Jan 07 Posts: 743 Credit: 918,353 RAC: 0 |
Hi Clarke. The 5.8.16 BOINC manager version has no cache capability. You can set it to 10 additional days, but it's futile... Only 5.10+ can use that. Logan. |
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
Not true. BOINC 5.8.16 will cache up to 10 days of work. It's just that on your General Preferences page you have to put the figure in the first box, the one that says: "Computer is connected to the Internet about every (Leave blank or 0 if always connected. BOINC will try to maintain at least this much work.)" Claggy |
Logan Joined: 26 Jan 07 Posts: 743 Credit: 918,353 RAC: 0 |
And the preferences say 'Maintain enough work for an additional'... n days '(Requires 5.10+ client.)' But if you don't have 5.10.7 or 5.10.13 (for example), setting this parameter to 10 days is futile... Regards, Claggy. Logan. |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
This is incorrect. There are more cache settings in 5.10+ than 5.8.16, but BOINC has been caching work well back into the 4.x versions. |
OzzFan Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
Please keep in mind that with short deadlines and a large cache, BOINC will go into EDF ("panic") mode, thinking it will not be able to finish all the work it just downloaded. When this happens, BOINC will automatically cut off any more downloads until it gets the cache down to a reasonable level. |
Dr. C.E.T.I. Joined: 29 Feb 00 Posts: 16019 Credit: 794,685 RAC: 0 |
i got 271 units automatically (weird) a while ago . . . didn't ask - just got 'em . . . guess that'll do for the outage (HP Laptop dv9060us Intel Dual Core 2). . . |
Logan Joined: 26 Jan 07 Posts: 743 Credit: 918,353 RAC: 0 |
Try using the 5.8.16 version for Windows and after that, tell me how well your cache works... ha, ha, ha... Logan. BOINC FAQ Service (Now also available in Spanish) |
Greg Niehues Joined: 29 Oct 06 Posts: 3 Credit: 576,026 RAC: 0 |
I'm using it - and caching with it. Works fine for me. ha, ha, ha..... |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
Actually, as a tester I've run most versions, so I know -- and I've spent a lot of time experimenting with how BOINC handles the various parameters. If you set "cache additional days" to 0, 5.10+ and 5.8.16 work exactly the same way, and the "connect every 'x' days" is the cache setting. If you tell BOINC "connect every 3 days" it will try to carry enough work so that it does not run out in less than 3 days, and doesn't miss deadlines at 6. Because the setting is indirect (you aren't saying "cache 3 days", you're setting the interval) it won't do exactly three days. If you set "connect every 10 days" BOINC will have trouble if you have work units with short deadlines -- and that happens a lot. A 10 day interval will not cache 10 days of work because BOINC knows that if it waits 10 days that work will be late. But you're relatively new, and this has been discussed ad nauseam. Every version of BOINC, going back to the first public release, has had caching. Most versions have worked as designed -- and the arguments have always been over the design, not the implementation. |
Uioped1 Joined: 17 Sep 03 Posts: 50 Credit: 1,179,926 RAC: 0 |
There are intriguing possibilities in flywheel-based UPSes. I think only a couple of companies have brought products to the market for datacenters, but it's a very nice alternative to batteries. |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
Converting electrical energy to mechanical energy & back to electrical energy generally isn't as efficient as electrical-chemical-electrical. Grant Darwin NT |
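
Editor's sketch: the caching argument in the thread (OzzFan's note on EDF "panic" mode and Ned's point that a 10 day connect interval will not cache 10 days of work) can be illustrated with a toy model. This is not BOINC's actual scheduler. It assumes a single CPU, tasks run in earliest-deadline-first order, and a finished result that may wait up to one full connect interval before it is reported. The names `Task`, `first_late_task`, and `fill_cache` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    runtime_days: float   # estimated compute time still needed
    deadline_days: float  # days from now until the result is due


def first_late_task(tasks, connect_interval_days=0.0):
    """Return the first task that would miss its deadline, or None.

    Toy model (assumption, not BOINC's real logic): tasks run one at a
    time in earliest-deadline-first order, and a finished result may sit
    for up to connect_interval_days before it is reported.
    """
    elapsed = 0.0
    for task in sorted(tasks, key=lambda t: t.deadline_days):
        elapsed += task.runtime_days
        if elapsed + connect_interval_days > task.deadline_days:
            return task
    return None


def fill_cache(candidates, connect_interval_days):
    """Greedily accept candidate tasks while no deadline would be missed."""
    accepted = []
    for task in candidates:
        if first_late_task(accepted + [task], connect_interval_days) is None:
            accepted.append(task)
    return accepted


if __name__ == "__main__":
    # Twelve hypothetical one-day work units, each due in 14 days.
    queue = [Task(f"wu{i}", 1.0, 14.0) for i in range(12)]
    for interval in (3.0, 10.0):
        cached = fill_cache(queue, interval)
        print(f"connect every {interval:g} days -> {len(cached)} work units cached")
```

Under these assumptions, the longer connect interval leaves less room before each deadline, so fewer work units can be accepted, which is the effect Ned describes. The real client weighs more factors than this sketch does.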
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.