recent woes |
Message boards : Technical News : recent woes
| Author | Message |
|---|---|
|
Jeff, thanks for the news! | |
| ID: 1039266 · | |
|
In my experience with overheating and a main, intensively used DB server, the hardware will probably be totally dead within a year (first RAM, then disk, and finally the motherboard). At the same time the replica server (same hardware, in the same rack) stayed alive, and alive, ... and made a good spare for the main server :-) | |
| ID: 1039269 · | |
|
I'd like more units but I'm grateful for the few I'm getting. You guys keep up the good work and I'll send another case...heh...Thanks! | |
| ID: 1039284 · | |
|
Thanks for the news Jeff...I'm here for the Science...I'll still be here when needed. | |
| ID: 1039299 · | |
|
Thanks for the update! | |
| ID: 1039301 · | |
I have tasks ready to upload that will expire in 6 days! I expect the admins will try and clear the backlog of uploading & reporting tasks before turning on the validator(s). As long as a given result is found when the parent WU goes up for validation, having been reported late makes no difference. To put it another way, the “deadline” can be understood, for all practical purposes, as the earliest moment that a task will be liable to validation—rather than automatic rejection. It’s even possible for a result to be accepted after missing a validator pass: if the validation is unsuccessful (whether due to errors or other missing results) or was inconclusive, resulting in a ‘resend’, it effectively gets a deadline extension to match the replacement tasks. Anyway, the short version is that I wouldn’t give up on any work until I saw that the corresponding WUs had been validated without it. ____________ | |
| ID: 1039303 · | |
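The deadline reasoning in the post above can be written out as a small toy model. This is only a sketch of the behaviour as described in that post, not BOINC server code: the `Result` class, the day numbers, and both function names are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Result:
    deadline_day: int                    # nominal deadline, as a day number (hypothetical unit)
    reported_day: Optional[int] = None   # day the result was reported; None if never reported


def is_late(result: Result) -> bool:
    """True if the result was reported after its nominal deadline."""
    return result.reported_day is not None and result.reported_day > result.deadline_day


def counts_at_validation(result: Result, validation_day: int) -> bool:
    """True if the result is present when the validator looks at the parent WU.

    Per the post, lateness alone does not matter; only presence at validation does.
    """
    return result.reported_day is not None and result.reported_day <= validation_day


late = Result(deadline_day=6, reported_day=9)
print(is_late(late))                   # True: reported 3 days past the deadline
print(counts_at_validation(late, 10))  # True: validator ran after the late report arrived
print(counts_at_validation(late, 8))   # False: this validator pass happened before the report
```

In this model a result that misses one validator pass is not necessarily lost: if that pass is inconclusive and replacement tasks are sent out, a later pass (a larger `validation_day`) can still pick it up, which matches the "deadline extension" described above.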
|
The server status page is mostly green again, yay! | |
| ID: 1039370 · | |
In my experience with overheating and main intensive DB server, hardware will be probably totally dead in a year (first RAM, then disk and at the end a motherboard), at the same time replica server (same hardware, in the same rack) was alive, and alive, ... and good spare for the main server :-) BS. We had a failure of the AC in our main data center over a year ago (May 2009) while everything was in full use during business hours. The temperature inside the cabinets exceeded 48 degrees (we measure in Celsius) and many servers shut down on overheat protection. When we brought everything up again, we discovered that we had only one casualty. In the months that followed, no other machines failed. If the event had any noticeable impact on the whole server park, it could be a slight increase in drive failures. On the other hand, this could just as well be due to the age of the machines, so there is no hard evidence. Across 300 pieces of hardware there has not been a single RAM or MB failure. ____________ | |
| ID: 1039454 · | |
Don't call 'BS' in this forum. Whatever the source of the hardware problems the project is experiencing, they are real, whether caused by the AC failure or not. I personally have had rigs die from overheating. It does happen. So your personal 'claimed' experience does not mean that current Seti problems might not have been caused in part by the AC failure. It certainly did not enhance the reliability of any of the servers in the closet. ____________ ****** "Ask not, what your kitty can do for you. Ask what you can do for your kitty." As it is kitten, so shall it be done. | |
| ID: 1039456 · | |
|
48C? I *WISH* my laptop would run at 48C... It's currently crunching at 68-70C. | |
| ID: 1039457 · | |
48C? I *WISH* my laptop would run at 48C... It's currently crunching at 68-70C. Laptops are, well, laptops. They have always been constrained by cooling problems. ____________ ****** "Ask not, what your kitty can do for you. Ask what you can do for your kitty." As it is kitten, so shall it be done. | |
| ID: 1039458 · | |
Laptops are, well, laptops. I understand that, but to hear 48C described as an overheat experience... That's near chilly for a computer in my experience. ____________ | |
| ID: 1039462 · | |
48C? I *WISH* my laptop would run at 48C... It's currently crunching at 68-70C. Yeah, but stick your laptop in a cupboard that's at 48C and see what happens. Raise the ambient temp by 20C, and you pretty much raise the component temp by 20C. Hopefully things shut down, or else something breaks down, often in unpredictable ways. The system board or power supply capacitors are a good example: high temps speed up their ageing, and they may not simply die. They can just lose capacitance and make the machine hang at random. It's certainly possible that the heat treatment has prematurely aged a motherboard in one of the servers. Before the cookup it was just within spec, now it's just outside and weird things happen. Ian | |
| ID: 1039466 · | |
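Ian's rule of thumb above (component temperature rises by roughly the same amount as ambient) works out as follows. This is a back-of-the-envelope sketch, not a thermal model: the 25C room temperature is an assumption that nobody in the thread stated, and the function name is made up for illustration.

```python
def estimated_component_temp(component_c: int, ambient_c: int, new_ambient_c: int) -> int:
    """Estimate component temperature after moving to a different ambient temperature.

    Assumes the rise above ambient stays constant, which is the rule of thumb
    from the post, not a measured property of any particular machine.
    """
    rise_above_ambient = component_c - ambient_c
    return new_ambient_c + rise_above_ambient


ASSUMED_ROOM_C = 25   # assumption: typical room temperature, not given in the thread
CABINET_C = 48        # cabinet air temperature reported in the thread

for laptop_c in (68, 70):
    print(laptop_c, "->", estimated_component_temp(laptop_c, ASSUMED_ROOM_C, CABINET_C))
# 68 -> 91
# 70 -> 93
```

Under those assumptions, a laptop that crunches at 68-70C in a normal room would sit at around 91-93C in a 48C cabinet, which is why 48C of cabinet air counts as an overheat event even though 48C would be a cool reading for a component.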
Temperature inside the cabinets exceeded 48 degrees (we are measuring in Celsius) and many servers shut down on the overheat protection. If I understand him correctly, he did not say that the servers were running at 48 degrees, but that it was the "room temperature" in the cabinets. :D ____________ | |
| ID: 1039482 · | |
|
I have to say that for those of us who only have computers at work, where there is a shut down of everything over the weekend, the fact that the weekly outage always falls in the middle of the week means a very reduced window to upload/download. I have 3 big Astropulses that my machine hammered in a great time and finished on Monday evening but they may not report until next Monday. | |
| ID: 1039484 · | |
|
Okay my 2c worth.... Sometimes shutting down is the worst thing you can do to a system that is supposed to be available 24x7. Exactly. BOINC was introduced on the basis of "donating your idle computer time to science projects". Note idle computer time: it was never originally envisaged that power crunchers would want to run it 24/7. However, most projects are actually up 24/7 and try to maintain that, but it has never been an agreed part of the offering. That's appalling That's a bit of a harsh comment there. The guys are doing their level best with minimal funding and old equipment. And they also have their own lives and families to be part of as well. ____________ Damsel Rescuer, Kitty Patron, Raccoon Friend, Uli Fan, Julie Supporter, ES99 Admirer, PETA Member, 1st Childhood | |
| ID: 1039502 · | |
|
.... I have a question. I've just looked at the server status page and in the list is db_purge.x86_64. Now, I've also noticed that during the downtimes when the upload / download servers are offline, unlike the other stat lines, "Workunits waiting for db purging" and "Results waiting for db purging" never seem to zero out. I'm curious as to why this is. Surely it would make sense that if the db_purge server is up, the purge queues would zero out, since no new records are being added to them. | |
| ID: 1039510 · | |
|
This entry on the Seti home page is making me smile: | |
| ID: 1039512 · | |
I'm sure you already know this, and the equipment you're using is very likely much better than what I've used, but overheating problems followed by "hangs" might be caused by bad motherboard capacitors. Check the tops of the caps to see if they're expanded or open and leaking--if they are the motherboard is a goner. Sorry if this is obvious to you folks, but figured I would throw this in since I've run across it in the past. Good luck. Hi Richard, welcome to the forums. I expressed a similar theory in 2 posts in the thread http://setiathome.berkeley.edu/forum_thread.php?id=56160&nowrap=true#959975 in November last year, and again on 1 January this year. I have personally seen this problem in recent years, on a friend's Athlon XP system. It happened first when plugging or unplugging USB devices caused a system hang, with Windows BSOD. A few weeks later the Blue Screens got more frequent, and an examination of the motherboard revealed about 50% of the capacitors had bulging or brown stained tops. While I am sure that server grade hardware should be built to higher quality standards, and better tolerances than ordinary consumer equipment, I think some of the kit that SETI@home uses is pre-production or prototype. Add to that the fact that the ageing of the capacitors may have been accelerated by recent overheating. I hope Matt, Jeff and Eric may find this post helpful, I would certainly put flaky capacitors near the top of my suspect list. I have just done a Google search on "bulging capacitors" that produced lots of results, including many images. Keith. | |
| ID: 1039538 · | |
I have tasks ready to upload that will expire in 6 days! Thanks for that info George ____________ | |
| ID: 1039578 · | |
| Copyright © 2013 University of California |