Heat Wave (Jun 09 2008)
Author | Message |
---|---|
Matt Lebofsky Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
Over the weekend the scheduler ceased operations on its own again. I was able to remotely fix this Saturday morning and recovery was swift. This was the same problem as earlier in the week but this time we had a smoking gun: the CGI output log file was maxed out at 2GB in size (this is running on a 32-bit system). Cleaning out the logs solved the problem. The thing is: We've been letting these logs grow to 2GB in size for months without any issue. So why is this a problem all of a sudden? However strange, I put a log rotation script in place to prevent this from happening again any time soon. Funny side note: I would have gotten the alerts faster but coincidentally the lab-wide mail servers conked out as well Saturday morning. Other than that, nothing much to report the past couple of days. Which brings us to today. Around 12:30 our server closet air conditioning unit died. Within 30 minutes all the servers warmed up over 5 degrees Celsius and I started getting alerts. This may be a significant problem (i.e. we may need more than just a coolant refill). So depending on how fast we can get the maintenance people up here I might have to shut down parts or all of the project to prevent server burnout. Meanwhile, I have the server closet doors open to help cool things down, much to the annoyance of all the projects on this floor (the fan noise is about 20-30 decibels louder with the doors open). The poor people across the hall from the closet are being deafened - my desk is a few doors down. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
Dr. C.E.T.I. Joined: 29 Feb 00 Posts: 16019 Credit: 794,685 RAC: 0 |
. . . oh mi lord - hopefully NOTHIN' else shall go wrong eh > you're doin' a great job Matt - iT is appreciated . . . < goes for all of you @ Berkeley btw - Thanks to each of you BOINC Wiki . . . Science Status Page . . . |
Urs Echternacht Joined: 15 May 99 Posts: 692 Credit: 135,197,781 RAC: 211 |
Give out a round of earplugs to the folks near that open door. The gesture alone will hopefully calm them down. _\|/_ U r s |
Matt Lebofsky Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
Update: The air conditioner people came up from campus and inspected everything - long story short a faulty switch caused the outside fans to turn off. This switch is now temporarily bypassed until they can replace it. Meanwhile it's running, cold air is coming in, the doors are closed, the hall is quiet again, everybody is happy. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
Neil Blaikie Joined: 17 May 99 Posts: 143 Credit: 6,652,341 RAC: 0 |
Good to see the technicians got a temporary fix working for you guys. Here in Montreal we're having problems with heat as well: it has been warm all weekend, with high humidity levels! Looking at the weather before checking for severe thunderstorm warnings, I noticed there is snow forecast for the Cascades for the next few days down to 2000ft. Maybe time to add climate prediction as a backup project, that is messed up for June! Keep up the good work and thanks for the updates Matt. |
Mad Max Joined: 16 Mar 00 Posts: 475 Credit: 213,231,775 RAC: 407 |
What I find humorous about this is the fact of how much it mirrors my own work life. IAS - Where Space Is Golden! |
Steve Dodd Joined: 29 May 99 Posts: 23 Credit: 8,695,373 RAC: 1 |
Aren't coincidences amazing? On the 1st, Lattice was down for cooling problems. Then on Saturday last, Docking was down for cooling, too. Methinks there is some code lurking way down in BOINC that must be triggered somehow to cause this in randomly selected projects :) |
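
For readers curious about the log fix in Matt's first post: 2GB is where a signed 32-bit file offset runs out (2^31 - 1 bytes), so a program built without large-file support generally cannot write past that point. The thread does not show the rotation script that was actually put in place, so the following is only a minimal sketch of size-triggered rotation. The function name `rotate_if_large`, the threshold, and the `.1`, `.2`, ... naming are all assumptions made for illustration.

```python
import os

# Largest offset a signed 32-bit file offset can represent (2 GiB - 1 byte).
MAX_32BIT_FILE_SIZE = 2**31 - 1


def rotate_if_large(path, threshold=MAX_32BIT_FILE_SIZE // 2, keep=3):
    """Rotate `path` to path.1, path.2, ... once it reaches `threshold` bytes.

    Hypothetical helper for illustration only; not the script used on the
    SETI@home servers. Returns True if a rotation happened.
    """
    if not os.path.exists(path) or os.path.getsize(path) < threshold:
        return False

    # Drop the oldest kept copy, then shift the remaining copies up by one.
    oldest = f"{path}.{keep}"
    if os.path.exists(oldest):
        os.remove(oldest)
    for i in range(keep - 1, 0, -1):
        src = f"{path}.{i}"
        if os.path.exists(src):
            os.replace(src, f"{path}.{i + 1}")

    # Move the live log aside and recreate an empty file under the old name.
    os.replace(path, f"{path}.1")
    open(path, "a").close()
    return True
```

Run from cron against the CGI log, something like this would keep the file well under the limit. One caveat: a process that already holds the log open keeps writing to the renamed file, so this approach assumes writers reopen the log by name (as short-lived CGI processes typically would).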
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.