Matt Lebofsky (Volunteer moderator, Project administrator, Project developer, Project scientist)
Joined: 1 Mar 99 Posts: 1379 Credit: 74,079 RAC: 0

|
|
So today was a usual day until mid-afternoon. Eric got a new RAID card (as well as a set of eight 750GB drives) to add to his server ewen, which is strictly a hydrogen survey machine. I helped him pluck the heavy machine from our server racks and place the new drives in trays, etc. The drive trays required unusually small screws, so Eric disappeared for a while hunting around the lab for such things.
Meanwhile, some SETI servers were locking up because ewen was off the network. It's a tangled web of network dependencies around here, as you know. And then upon turning the machine on we had to wait a few hours for the thing to build a 4 terabyte RAID array before we could boot the OS and free the stranglehold it had on random machines.
This didn't affect the public projects - it just made it hard to get any work done. But what followed was worse. I'm gearing up to upgrade isaac (the boinc.berkeley.edu server) and was inspecting its empty drive slots when I noticed that gowron (not the download server, but the download *file* server) was rebooting. I must have accidentally grazed against the touch-sensitive power switch right on gowron's front as I was messing with isaac which is right above it in the rack. Well, dammit.
Normally, this would be no big deal, but when gowron came back up, kryten and penguin (the upload and download servers) weren't given permission to mount it. In short, I uncovered either a bug in gowron's OS or some newly broken configuration, or both. Attempts to set things right required reboots at each step, and one such reboot triggered an entire RAID resync, which normally takes all night (when the project is inactive - several weeks if the project *is* active).
So great. I went home dejected and hating my job. Eventually I checked back in and found the resync of the download partition had actually completed, and even though other lesser-used partitions were far from done, I found a way to somehow trick gowron into letting kryten and penguin mount its partitions, and voila! The project is back up. As I write this missive gowron is still resyncing and people are connecting and getting work just fine.
- Matt
____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
|
|
|
|
|
[snip]
As I write this missive gowron is still resyncing and people are connecting and getting work just fine.
- Matt
Great Job Matt!
WOW! Working with networks, servers, code, and... OS's... I guess ya gotta be an eternal optimist :-)
thanks for all your hard work and long hours !
now get some Sleep :-)
Best Wishes
Byron
|
|
|
|
|
Nice One Matt - Hope You get the Sleep you Deserve and a Great Big CONGRATULATIONS is in Order Sir . . .
|
|
|
|
|
|
As already said ... nice one Matt, and thanks for spending time giving us an update.
Work is flowing in and out from here fine now, as you suggest.
Keep fingers crossed for you that you come back and all is working, and sweetness and light returns!
____________
It's good to be back amongst friends and colleagues
|
|
|
|
|
|
Congrats, Matt...
As one who has just two days ago had his first lesson in Linux, I have only a small appreciation of what you have achieved.
I take my hat off to you.
____________
|
|
|
|
|
|
Perfect, Thank You.
____________
|
|
|
|
|
|
Yeah well done Matt, You deserve to sleep some, Gettin PCs workin is hard enough, But those sound like fun. ;)
____________
BSG Anthem
My Facebook page
|
|
|
|
|
|
Until you've opened a can of worms, you have no idea that there are worms in the can ... ignorance is never a good excuse
Keep up the good work.
____________
Chas - Orme's Tun, Mercia, Albion.
|
|
|
|
|
Thanks for your hard work Matt...
I also get a chuckle out of the names you give the servers. Naming one after a Klingon is a hoot. I usually name my hard drives after cartoon characters.
|
|
|
|
|
I must have accidentally grazed against the touch-sensitive power switch right on gowron's front as I was messing with isaac which is right above it in the rack. Well, dammit.
Good job, Matt!
On a lighter note, now we know who is tripping on the power cords in the server closet. We were always blaming the wrong people: the janitor, the butler, Misfit... Just kiddin' ;)
____________
|
|
|
|
|
|
Hi Matt--
Thanks for all the good work. I have been crunching for almost seven years now and I have seen a lot of volunteers come and go. As you will note from my stats, I do not post that much, basically only when I have a problem and have been unable to work it out myself. I have noticed that the servers on the Beta test site are showing as not running, but there is plenty of work for everyone. Is this a software problem, or is it that you have moved the Beta test to other servers?
I also have another question that I have been trying to find an answer to. If you could email me, I would like to find out who can help me with it. My email is rkinkead at charter dot net; I will explain the question or problem in the reply.
As always keep up the good work.
R. Kinkead
____________
What you do today you will have to live with tonight
|
|
|
|
|
Matt, it takes unusual dedication to keep this project running. If something as simple as a bump of a switch caused a few hours of downtime, SO WHAT! For having the guts to admit it and the fortitude to solve the problem(s),
Thank You.
Jim
____________
|
|
|
|
|
So great. I went home dejected and hating my job.
So that explains the email I got saying it was all my fault. ;)
____________
|
|
|
Blurf (Volunteer tester)
Joined: 2 Sep 06 Posts: 6467 Credit: 5,735,853 RAC: 2,743

|
|
Misfit...
My dear friend....let me educate you on one of the undeniable facts of life.
If my wheelchair gets stuck in the snow, or the earth suddenly spins out of control losing all gravitational force and the entire population croaks in one sudden gasp....
It is all your fault.... :)
Happy Thursday!
____________
|
|
|
|
|
and the entire population croaks in one sudden gasp....
Does that include me?
____________
|
|
|
|
|
|
Well, no, 'cause someone has to stay behind to put their input on the SETI boards...
|
|
|
|
Well, no, 'cause someone has to stay behind to put their input on the SETI boards...
Well folks, you have it here in writing. I get the last word. I shall consider this a legally binding contract.
So let it be written.
So let it be done.
____________
|
|
|
|
|
|
Thank god you're not coming with us! :D
|
|
|
|
Thank god you're not coming with us! :D
Left behind?
____________
|
|
|
|
|
|
Speaking of naming systems... In the early days of the 'net, I was working for an ISP and we were setting up a POP in a small town, complete with a pair of DNS servers. I got a wild hair and named DNS1 itchy.2fords.net and DNS2 scratchy.2fords.net. I remoted into the main DNS host and changed the pointers there, and it was seamless. The owners were greatly amused.
|
|