Moribund Monday (Apr 14 2008)
Author | Message |
---|---|
Matt Lebofsky Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
Continuing problems with the workunit storage server... There were more resets over the weekend, ultimately resulting in one that caused the server to think enough drives have failed to call the entire RAID dead. We are confident we can trick the server into thinking otherwise - we actually have some helpful techs logged in doing that as I type. We still want to replace the whole box, which we'll hopefully do today, and then the drives will have to resync again. Chances are we'll be down until tomorrow (Tuesday). While we are down we'll try to catch up on several things: moving servers around the closet, incorporating the new drive enclosure that arrived today, getting more stuff on the new KVM, etc. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
SATAN Joined: 27 Aug 06 Posts: 835 Credit: 2,129,006 RAC: 0 |
Thanks for the update Matt, we know you do as much as you can. |
JPP Joined: 31 May 99 Posts: 18 Credit: 59,436,360 RAC: 47 |
Hi, perhaps you also *may* wish to review the "work unit allocation" algorithm. My PCs are starving! When the servers were still up, I did not have a chance to receive new/fresh units since my PCs were not asking, and then when I started asking, the servers were down... So I wish to mention that this is the first time I can recall, since 1999, that my favourite PC has nothing to work on anymore; a bit weird indeed. Of course I run the latest software load. Perhaps you should allow more workunits to be requested by clients? I'm a bit confused. Cheers, jeanpierr€@jpp |
Sagittarius Joined: 3 Jan 08 Posts: 10 Credit: 90,431 RAC: 0 |
Hi Matt, just wonderin'. If you get it up and running by tomorrow AM, any chance of foregoing or delaying the dreaded maintenance day until Wednesday so we can all load up on WU's? At least we'd all be working and not sitting idle another whole day ;) Cheers |
Fred J. Verster Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0 |
Hi Matt, just wonderin'. If you get it up and running by tomorrow AM, any chance of foregoing or delaying the dreaded maintenance day until Wednesday so we can all load up on WU's? At least we'd all be working and not sitting idle another whole day ;) Maybe a good time to check your hosts as well: defragmenting disks, cleaning the registry, removing never-used programs, virus/spyware scans, getting e-mail, etc. Vacuum cleaning your fans & coolers ;) Mylady says, get rid of the cables ?@#$% |
Matt Lebofsky Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
The Adaptec guys just left - the switchover to the new server looks like a complete success. Plus they coughed up an extra 2GB RAM for the new server while they were here - though that won't show up as a performance boost until the next rev of the OS. So the RAIDs are all resync'ing again now, but we should be good to go by tomorrow morning. I'd like to do the BOINC database reorg/backup on Tuesday like we usually do, but I'll try to get here early and get that out of the way while we're still down. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
gomeyer Joined: 21 May 99 Posts: 488 Credit: 50,370,425 RAC: 0 |
The Adaptec guys just left - the switchover to the new server looks like a complete success. Plus they coughed up an extra 2GB RAM for the new server while they were here - though that won't show up as a performance boost until the next rev of the OS. Good news there. Thank you for the extra effort! |
Dr. C.E.T.I. Joined: 29 Feb 00 Posts: 16019 Credit: 794,685 RAC: 0 |
. . . Thanks to Each of You @ Berkeley for All that You are Doing @ Matt - as usual - Thanks for the Updates - It is Appreciated Sir! BOINC Wiki . . . Science Status Page . . . |
DJStarfox Joined: 23 May 01 Posts: 1066 Credit: 1,226,053 RAC: 2 |
Continuing problems with the workunit storage server... There were more resets over the weekend, ultimately resulting in one that caused the server to think enough drives have failed to call the entire RAID dead. We are confident we can trick the server into thinking otherwise - we actually have some helpful techs logged in doing that as I type. We still want to replace the whole box, which we'll hopefully do today, and then the drives will have to resync again. Chances are we'll be down until tomorrow (Tuesday). Just out of curiosity, is it wise to let clients get more work but without downloading the data files? What happens when the download server comes online and everybody tries to download the missing files (hours or days later)? Would it be better for the scheduler to respond "no work from project" until the download servers are back up? If not, then why not? |
Jesse Viviano Joined: 27 Feb 00 Posts: 100 Credit: 3,949,583 RAC: 0 |
Continuing problems with the workunit storage server... There were more resets over the weekend, ultimately resulting in one that caused the server to think enough drives have failed to call the entire RAID dead. We are confident we can trick the server into thinking otherwise - we actually have some helpful techs logged in doing that as I type. We still want to replace the whole box, which we'll hopefully do today, and then the drives will have to resync again. Chances are we'll be down until tomorrow (Tuesday). While the database cleanup and backup is going on, the download and upload server is normally still running. This allows the clients to download and upload files as needed, but does not allow the uploaded results to be reported until the cleanup and backup completes. Therefore, if we have clients getting assigned work units today, they can be ready to be downloaded tomorrow while the database is down. The administrators once shut down the upload/download server during database cleanups and backups, hoping that the absence of upload/download activity would shorten the downtime. However, the post-downtime crunch was awful. When they left the upload/download server active during the downtime, this only caused a slight slowdown but allowed the post-downtime crunch to finish up much more quickly, because more of the packets going through the router during the post-downtime crunch were scheduler requests, their responses, and downloads instead of uploads, thereby taking a sizable load off the then-overloaded router during crunchtime. |
kittyman Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
Thanx again for the continued updates Matt. Sorry that you have had so many trials as of late.....hope the replacement download server solves that issue at least..... Chin up, my man. Your efforts are not unnoticed or unappreciated. Regards, Mark. "Freedom is just Chaos, with better lighting." Alan Dean Foster |
cholupa3 Joined: 13 Jan 08 Posts: 1 Credit: 556,366 RAC: 0 |
It seems to have been a while since the last post, and I'm still having difficulty getting WUs. I was hoping that someone could post regarding their own situation, or on the success/failure/delay of the necessary upgrades/repairs. I just want to see if others are having any success, or if it's still a problem on my end. I know you guys are working hard so thank you all for allowing us to participate in SETI. -Eric AKA Cholupa |
Keith T. Joined: 23 Aug 99 Posts: 962 Credit: 537,293 RAC: 9 |
It seems to have been a while since the last post, and I'm still having difficulty getting WUs. I was hoping that someone could post regarding their own situation, or on the success/failure/delay of the necessary upgrades/repairs. I just want to see if others are having any success, or if it's still a problem on my end. I know you guys are working hard so thank you all for allowing us to participate in SETI. This page will tell you when the WU's start flowing again. As you can see from the graph there have been no WU's out for more than 24 hours. When the servers come back online, I expect there will be very heavy traffic for several hours, so if you run out of SETI work you may need a backup project at a small resource share. I ran out of SETI work last night (have 2 WU's stuck downloading) but my main PC still has work for 6 other projects. [edit]Other BOINC projects[/edit] Sir Arthur C Clarke 1917-2008 |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
It seems to have been a while since the last post, and I'm still having difficulty getting WUs. I was hoping that someone could post regarding their own situation, or on the success/failure/delay of the necessary upgrades/repairs. I just want to see if others are having any success, or if it's still a problem on my end. I know you guys are working hard so thank you all for allowing us to participate in SETI. Your post was just after 6:00am in Berkeley. Since Matt said the server was fixed, but would need time to sync, I wouldn't expect it to be up until after they get in this morning and have a chance to check everything out.... |
Mentor397 Joined: 16 May 99 Posts: 25 Credit: 6,794,344 RAC: 108 |
I finally got around to checking the computer. I just wanted to say that you guys are doing a fantastic job in spite of enormous difficulties. - Jim |
Daniel Michel Joined: 2 Feb 04 Posts: 14925 Credit: 1,378,607 RAC: 6 |
I hope the DB backup goes well today...And that means No Nasty Surprises for you guys...Good luck! PROUD TO BE TFFE! |
OzzFan Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
I'm surprised to see that our friend, the "Reverend" hasn't been around to complain about this recent outage. He always insisted that it was his job to let the SETI team know when they aren't doing theirs. |
Warden Dios Joined: 28 May 99 Posts: 1 Credit: 124,557 RAC: 0 |
I'm happy to say six work units have downloaded within the last hour or so, and mine is running fine now. I'm wondering if there's an option to take more units for pending processing, since my system gets through them reasonably quickly. -W.D. |
OzzFan Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
I'm happy to say six work units have downloaded within You can always increase your cache via your preferences in your account. |
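
Editor's note on Keith T.'s suggestion above to attach a backup project "at a small resource share": BOINC divides computing time among attached projects in proportion to their resource shares, so a project's long-run fraction is its share divided by the sum of all shares. The sketch below only illustrates that arithmetic. The function name and the share values (100 and 10) are assumptions chosen for the example, not settings anyone in the thread reported.

```python
def resource_fractions(shares):
    """Return each project's long-run fraction of compute time.

    `shares` maps a project name to its resource share. The fraction is
    simply share / sum(all shares); this is an illustration of the
    proportional rule, not BOINC's actual client scheduler code.
    """
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("at least one project needs a positive resource share")
    return {name: share / total for name, share in shares.items()}


# Hypothetical share values, chosen for illustration only.
shares = {"SETI@home": 100, "backup project": 10}
for name, fraction in resource_fractions(shares).items():
    print(f"{name}: {fraction:.1%}")
# prints:
# SETI@home: 90.9%
# backup project: 9.1%
```

With these example numbers the backup project would get roughly a tenth of the machine while both projects have work. As far as I understand the client, the shares only matter among projects that actually have work queued, so during an outage like this one the backup project would be free to use the otherwise idle time.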