Moribund Monday (Apr 14 2008)

Message boards : Technical News : Moribund Monday (Apr 14 2008)
Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 739049 - Posted: 14 Apr 2008, 19:03:42 UTC

Continuing problems with the workunit storage server... There were more resets over the weekend, ultimately resulting in one that caused the server to think enough drives have failed to call the entire RAID dead. We are confident we can trick the server into thinking otherwise - we actually have some helpful techs logged in doing that as I type. We still want to replace the whole box, which we'll hopefully do today, and then the drives will have to resync again. Chances are we'll be down until tomorrow (Tuesday).

So while we are down we'll try to catch up on several things. Moving servers around the closet, incorporating the new drive enclosure that arrived today, getting more stuff on the new KVM, etc.

- Matt

-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 739049
Profile SATAN
Joined: 27 Aug 06
Posts: 835
Credit: 2,129,006
RAC: 0
United Kingdom
Message 739055 - Posted: 14 Apr 2008, 19:10:49 UTC

Thanks for the update Matt, we know you do as much as you can.
ID: 739055
JPP
Joined: 31 May 99
Posts: 18
Credit: 59,436,360
RAC: 47
France
Message 739068 - Posted: 14 Apr 2008, 19:52:52 UTC - in response to Message 739055.  

hi
perhaps you also *may* wish to review the "work unit allocation" algorithm
my PCs are starving! while the servers were still up, I did not have a chance to receive new/fresh units since my PCs were not asking, and then when I started asking, the servers were down...
so I wish to mention that this is the first time I can recall, since 1999, where my favourite PC has nothing left to work on; a bit weird indeed
of course I run the latest sw load / perhaps you should allow more workunits to be requested by clients? I'm a bit confused
cheers
jeanpierr€@jpp
ID: 739068
Sagittarius
Joined: 3 Jan 08
Posts: 10
Credit: 90,431
RAC: 0
Canada
Message 739091 - Posted: 14 Apr 2008, 20:49:18 UTC

Hi Matt, just wonderin'. If you get it up and running by tomorrow AM, any chance of foregoing or delaying the dreaded maintenance day until Wednesday so we can all load up on WU's? At least we'd all be working and not sitting idle another whole day ;)

Cheers
ID: 739091
Profile Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 739106 - Posted: 14 Apr 2008, 21:42:50 UTC - in response to Message 739091.  
Last modified: 14 Apr 2008, 21:47:33 UTC

Hi Matt, just wonderin'. If you get it up and running by tomorrow AM, any chance of foregoing or delaying the dreaded maintenance day until Wednesday so we can all load up on WU's? At least we'd all be working and not sitting idle another whole day ;)

Cheers


Maybe a good time to check your hosts as well: defragmenting disks, cleaning the registry, removing never-used programs, virus/spyware scan, getting e-mail, etc. etc.
Vacuum cleaning your fans & coolers ;)
Mylady says, get rid of the cables ?@#$%
ID: 739106
Profile AndyW Project Donor
Volunteer tester
Joined: 23 Oct 02
Posts: 5862
Credit: 10,957,677
RAC: 18
United Kingdom
Message 739124 - Posted: 14 Apr 2008, 22:23:32 UTC

It's a constant battle isn't it?!

Good luck Matt. Just make sure you have it all working by Thursday so I can feed my new quad. Wouldn't like to see it going hungry ;)
ID: 739124
Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 739127 - Posted: 14 Apr 2008, 22:27:55 UTC

The Adaptec guys just left - the switchover to the new server looks like a complete success. Plus they coughed up an extra 2GB RAM for the new server while they were here - though that won't show up as a performance boost until the next rev of the OS.

So the RAIDs are all resync'ing again now, but we should be good to go by tomorrow morning.

I'd like to do the BOINC database reorg/backup on Tuesday like we usually do, but I'll try to get here early and get that out of the way while we're still down.

- Matt
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 739127
gomeyer
Volunteer tester
Joined: 21 May 99
Posts: 488
Credit: 50,370,425
RAC: 0
United States
Message 739154 - Posted: 14 Apr 2008, 23:26:17 UTC - in response to Message 739127.  

The Adaptec guys just left - the switchover to the new server looks like a complete success. Plus they coughed up an extra 2GB RAM for the new server while they were here - though that won't show up as a performance boost until the next rev of the OS.

So the RAIDs are all resync'ing again now, but we should be good to go by tomorrow morning.

I'd like to do the BOINC database reorg/backup on Tuesday like we usually do, but I'll try to get here early and get that out of the way while we're still down.

- Matt

Good news there. Thank you for the extra effort!
ID: 739154
Profile Dr. C.E.T.I.
Joined: 29 Feb 00
Posts: 16019
Credit: 794,685
RAC: 0
United States
Message 739170 - Posted: 15 Apr 2008, 0:00:48 UTC


. . . Thanks to Each of You @ Berkeley for All that You are Doing

@ Matt - as usual - Thanks for the Updates - It is Appreciated Sir!




BOINC Wiki . . .

Science Status Page . . .
ID: 739170
DJStarfox
Joined: 23 May 01
Posts: 1066
Credit: 1,226,053
RAC: 2
United States
Message 739229 - Posted: 15 Apr 2008, 2:14:40 UTC - in response to Message 739049.  
Last modified: 15 Apr 2008, 2:15:30 UTC

Continuing problems with the workunit storage server... There were more resets over the weekend, ultimately resulting in one that caused the server to think enough drives have failed to call the entire RAID dead. We are confident we can trick the server into thinking otherwise - we actually have some helpful techs logged in doing that as I type. We still want to replace the whole box, which we'll hopefully do today, and then the drives will have to resync again. Chances are we'll be down until tomorrow (Tuesday).

So while we are down we'll try to catch up on several things. Moving servers around the closet, incorporating the new drive enclosure that arrived today, getting more stuff on the new KVM, etc.

- Matt


Just out of curiosity, is it wise to let clients get more work without downloading the data files? What happens when the download server comes online and everybody tries to download the missing files (hours or days later)?

Would it be better for the scheduler to respond "no work from project" until the download servers are back up? If not, then why not?
ID: 739229
Jesse Viviano
Joined: 27 Feb 00
Posts: 100
Credit: 3,949,583
RAC: 0
United States
Message 739240 - Posted: 15 Apr 2008, 3:16:21 UTC - in response to Message 739229.  

Continuing problems with the workunit storage server... There were more resets over the weekend, ultimately resulting in one that caused the server to think enough drives have failed to call the entire RAID dead. We are confident we can trick the server into thinking otherwise - we actually have some helpful techs logged in doing that as I type. We still want to replace the whole box, which we'll hopefully do today, and then the drives will have to resync again. Chances are we'll be down until tomorrow (Tuesday).

So while we are down we'll try to catch up on several things. Moving servers around the closet, incorporating the new drive enclosure that arrived today, getting more stuff on the new KVM, etc.

- Matt


Just out of curiosity, is it wise to let clients get more work without downloading the data files? What happens when the download server comes online and everybody tries to download the missing files (hours or days later)?

Would it be better for the scheduler to respond "no work from project" until the download servers are back up? If not, then why not?

While the database cleanup and backup is going on, the download and upload server is normally still running. This allows the clients to download and upload files as needed, but does not allow the uploaded results to be reported until the cleanup and backup completes. Therefore, if clients are assigned work units today, those work units can be downloaded tomorrow while the database is down.

The administrators once shut down the upload/download server during database cleanups and backups, hoping that the absence of upload/download activity would shorten the downtime. However, the post-downtime crunch was awful. Leaving the upload/download server active during the downtime caused only a slight slowdown, and it let the post-downtime crunch finish much more quickly: more of the packets going through the router after the downtime were scheduler requests, their responses, and downloads instead of uploads, which took a sizable load off the then-overloaded router during crunch time.
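
As a rough illustration of the trade-off described above, here is a toy Python model. It is only a sketch: the function name, the client count, and the per-client transfer numbers are all made up for illustration and are not taken from the SETI@home servers or the BOINC code.

```python
def post_downtime_load(clients, uploads_each, downloads_each,
                       transfers_up_during_downtime):
    """Count the requests hitting the router once the database is back.

    Toy assumption: if the upload/download server stays up during the
    database downtime, finished results are uploaded during the outage,
    so only scheduler requests and new downloads remain afterwards.
    """
    scheduler_requests = clients            # each client reports and asks for work once
    downloads = clients * downloads_each    # new work handed out after the outage
    uploads = 0 if transfers_up_during_downtime else clients * uploads_each
    return {
        "scheduler_requests": scheduler_requests,
        "downloads": downloads,
        "uploads": uploads,
        "total": scheduler_requests + downloads + uploads,
    }


# Hypothetical numbers, chosen only to show the shape of the effect.
server_down = post_downtime_load(1000, 4, 4, transfers_up_during_downtime=False)
server_up = post_downtime_load(1000, 4, 4, transfers_up_during_downtime=True)
print(server_down["total"], server_up["total"])
```

In this toy setup the backlog after the outage is noticeably smaller when transfers were allowed during the downtime, because the uploads have already drained. Real traffic would depend on file sizes and retry behaviour, which the model ignores.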
ID: 739240
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 739259 - Posted: 15 Apr 2008, 4:53:39 UTC

Thanx again for the continued updates Matt. Sorry that you have had so many trials as of late.....hope the replacement download server solves that issue at least.....
Chin up, my man. Your efforts are not unnoticed or unappreciated.

Regards,
Mark.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 739259
cholupa3
Joined: 13 Jan 08
Posts: 1
Credit: 556,366
RAC: 0
Message 739366 - Posted: 15 Apr 2008, 13:31:13 UTC

It seems to have been a while since the last post, and I'm still having difficulty getting WUs. I was hoping that someone could post regarding their own situation, or on the success/failure/delay of the necessary upgrades/repairs. I just want to see if others are having any success, or if it's still a problem on my end. I know you guys are working hard so thank you all for allowing us to participate in SETI.

-Eric AKA Cholupa
ID: 739366
Profile Keith T.
Volunteer tester
Joined: 23 Aug 99
Posts: 962
Credit: 537,293
RAC: 9
United Kingdom
Message 739385 - Posted: 15 Apr 2008, 14:44:55 UTC - in response to Message 739366.  
Last modified: 15 Apr 2008, 14:48:17 UTC

It seems to have been a while since the last post, and I'm still having difficulty getting WUs. I was hoping that someone could post regarding their own situation, or on the success/failure/delay of the necessary upgrades/repairs. I just want to see if others are having any success, or if it's still a problem on my end. I know you guys are working hard so thank you all for allowing us to participate in SETI.

-Eric AKA Cholupa


This page will tell you when the WU's start flowing again.

As you can see from the graph there have been no WU's out for more than 24 hours.

When the servers come back online, I expect there will be very heavy traffic for several hours, so if you run out of SETI work you may need a backup project at a small resource share.

I ran out of SETI work last night (have 2 WU's stuck downloading) but my main PC still has work for 6 other projects.
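
For anyone unsure what a "small resource share" works out to: BOINC splits processing time between attached projects in proportion to their resource shares. A minimal Python sketch of that arithmetic, where the project names and share values are made up for illustration:

```python
def share_fractions(shares):
    """Return each project's target fraction of processing time."""
    total = sum(shares.values())
    return {project: share / total for project, share in shares.items()}


# Hypothetical setup: SETI@home as the main project, one small backup.
fractions = share_fractions({"SETI@home": 100, "backup project": 5})
for project, fraction in fractions.items():
    print(f"{project}: {fraction:.1%}")
```

With these numbers the backup project's target is a little under 5% of processing time. That is a long-term target only, so when SETI has no work to send, the backup project can use the otherwise idle time.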

[edit]Other BOINC projects[/edit]
Sir Arthur C Clarke 1917-2008
ID: 739385
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 739420 - Posted: 15 Apr 2008, 15:39:41 UTC - in response to Message 739366.  

It seems to have been a while since the last post, and I'm still having difficulty getting WUs. I was hoping that someone could post regarding their own situation, or on the success/failure/delay of the necessary upgrades/repairs. I just want to see if others are having any success, or if it's still a problem on my end. I know you guys are working hard so thank you all for allowing us to participate in SETI.

-Eric AKA Cholupa

Your post was just after 6:00am in Berkeley. Since Matt said the server was fixed but would need time to sync, I wouldn't expect it to be up until after they get in this morning and have a chance to check everything out....
ID: 739420
Profile Mentor397
Joined: 16 May 99
Posts: 25
Credit: 6,794,344
RAC: 108
United States
Message 739421 - Posted: 15 Apr 2008, 15:41:23 UTC

I finally got around to checking the computer. I just wanted to say that you guys are doing a fantastic job in spite of enormous difficulties.

- Jim

ID: 739421
Profile Daniel Michel
Volunteer tester
Joined: 2 Feb 04
Posts: 14925
Credit: 1,378,607
RAC: 6
United States
Message 739446 - Posted: 15 Apr 2008, 15:59:29 UTC

I hope the DB backup goes well today...And that means No Nasty Surprises for you guys...Good luck!

PROUD TO BE TFFE!
ID: 739446
OzzFan Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 739504 - Posted: 15 Apr 2008, 21:02:59 UTC

I'm surprised to see that our friend, the "Reverend" hasn't been around to complain about this recent outage. He always insisted that it was his job to let the SETI team know when they aren't doing theirs.
ID: 739504
Warden Dios
Joined: 28 May 99
Posts: 1
Credit: 124,557
RAC: 0
United Kingdom
Message 739543 - Posted: 15 Apr 2008, 22:38:06 UTC - in response to Message 739504.  

I'm happy to say six work units have downloaded within
the last hour or so, and mine is running fine now. I'm
wondering if there's an option to take more units for
pending processing, since my system gets through
them reasonably quickly.

-W.D.
ID: 739543
OzzFan Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 739551 - Posted: 15 Apr 2008, 22:50:09 UTC - in response to Message 739543.  

I'm happy to say six work units have downloaded within
the last hour or so, and mine is running fine now. I'm
wondering if there's an option to take more units for
pending processing, since my system gets through
them reasonably quickly.

-W.D.



You can always increase your cache via your preferences in your account.
ID: 739551


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.