Unusual suspects (Feb 06 2007)

Message boards : Technical News : Unusual suspects (Feb 06 2007)

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 514477 - Posted: 7 Feb 2007, 7:18:21 UTC

So today was a usual day until mid-afternoon. Eric got a new RAID card (as well as a set of eight 750 GB drives) to add to his server ewen, which is strictly a hydrogen survey machine. I helped him pluck the heavy machine from our server racks and place the new drives in trays, etc. The drive trays required unusually small screws, so Eric disappeared for a while hunting around the lab for such things.

Meanwhile, some SETI servers were locking up because ewen was off the network. It's a tangled web of network dependencies around here, as you know. And then, upon turning the machine on, we had to wait a few hours for it to build a 4-terabyte RAID array before we could boot the OS and free the stranglehold it had on random machines.
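The post doesn't say which RAID level ewen's new array uses. As a rough sanity check on the "4 terabyte" figure, here is a minimal Python sketch of usable capacity for eight identical 750 GB drives under a few common levels. The level names and the decimal-gigabyte assumption are illustrative only, not taken from ewen's actual configuration.

```python
# Hypothetical sketch: usable capacity of an array of identical drives
# under common RAID levels. The post does not say which level ewen uses.

DRIVES = 8
DRIVE_BYTES = 750 * 10**9  # 750 GB as marketed (decimal gigabytes), an assumption


def usable_bytes(level: str, n: int, size: int) -> int:
    """Return usable bytes for n identical drives of `size` bytes each."""
    if level == "raid0":
        return n * size          # striping only, no redundancy
    if level == "raid5":
        return (n - 1) * size    # one drive's worth of parity
    if level == "raid6":
        return (n - 2) * size    # two drives' worth of parity
    if level == "raid10":
        return (n // 2) * size   # mirrored pairs, striped
    raise ValueError(f"unknown RAID level: {level}")


for level in ("raid0", "raid5", "raid6", "raid10"):
    b = usable_bytes(level, DRIVES, DRIVE_BYTES)
    # Show both decimal terabytes and binary tebibytes, since tools differ.
    print(f"{level:6s} {b / 10**12:5.2f} TB  {b / 2**40:5.2f} TiB")
```

Under these assumptions RAID 6 works out to 4.5 TB, or about 4.09 TiB, the closest match to "4 terabytes"; RAID 5 with one hot spare would give the same number. This is only arithmetic, not a statement of how ewen was actually set up.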

This didn't affect the public projects - it just made it hard to get any work done. But the following was worse. So I'm gearing up to upgrade isaac (the boinc.berkeley.edu server) and was inspecting its empty drive slots when I noticed that gowron (not the download server, but the download *file* server) was rebooting. I must have accidentally grazed the touch-sensitive power switch right on gowron's front as I was messing with isaac, which is right above it in the rack. Well, dammit.

Normally this would be no big deal, but when gowron came back up, kryten and penguin (the upload and download servers) weren't given permission to mount it. In short, I uncovered either a bug in gowron's OS or some newly broken configuration, or both. Attempts to set things right required a reboot at each step, and one such reboot triggered an entire RAID resync, which normally takes all night when the project is inactive (several weeks if the project *is* active).
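Why "all night" can stretch to "several weeks": a resync has to work through the whole array, and on a busy server it competes with production I/O, so the rebuild rate is usually throttled. Linux software RAID, for instance, exposes `/proc/sys/dev/raid/speed_limit_min` and `speed_limit_max` for this; the post doesn't say what gowron runs. The sketch below uses a hypothetical 2 TB partition and assumed idle and busy rates purely to show the arithmetic.

```python
# Hypothetical sketch: rough RAID resync duration = bytes to scan / rate.
# The partition size and rates below are illustrative assumptions,
# not measured figures for gowron.


def resync_hours(size_bytes: int, rate_bytes_per_s: float) -> float:
    """Estimated hours to resync size_bytes at a sustained rate."""
    if rate_bytes_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_bytes / rate_bytes_per_s / 3600


SIZE = 2 * 10**12  # hypothetical 2 TB partition

idle = resync_hours(SIZE, 50 * 10**6)  # assumed 50 MB/s with no competing I/O
busy = resync_hours(SIZE, 1 * 10**6)   # assumed 1 MB/s when throttled under load

print(f"idle: {idle:.1f} h, busy: {busy / 24:.1f} days")
```

With these made-up numbers the idle case comes to roughly 11 hours and the throttled case to about 23 days, the same order of magnitude as "all night" versus "several weeks."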

So great. I went home dejected and hating my job. Eventually I checked back in and found that the resync of the download partition had actually completed, and even though other lesser-used partitions were far from done, I found a way to somehow trick gowron into letting kryten and penguin mount its partitions, and voila! The project is back up. As I write this missive gowron is still resyncing and people are connecting and getting work just fine.

- Matt
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 514477
Byron Leigh Hatch @ team Carl Sagan
Volunteer tester
Joined: 5 Jul 99
Posts: 4548
Credit: 35,667,570
RAC: 4
Canada
Message 514484 - Posted: 7 Feb 2007, 7:34:38 UTC - in response to Message 514477.  
Last modified: 7 Feb 2007, 8:21:53 UTC

[snip]

As I write this missive gowron is still resyncing and people are connecting and getting work just fine.

- Matt



Great job, Matt!

WOW! Working with networks, servers, code, and OSes... I guess ya gotta be an eternal optimist :-)

Thanks for all your hard work and long hours!

Now get some sleep :-)

Best Wishes
Byron


ID: 514484
Dr. C.E.T.I.
Joined: 29 Feb 00
Posts: 16019
Credit: 794,685
RAC: 0
United States
Message 514498 - Posted: 7 Feb 2007, 9:08:41 UTC - in response to Message 514477.  

Nice one, Matt! Hope you get the sleep you deserve, and a great big CONGRATULATIONS is in order, sir...

ID: 514498
John Clark
Volunteer tester
Joined: 29 Sep 99
Posts: 16515
Credit: 4,418,829
RAC: 0
United Kingdom
Message 514502 - Posted: 7 Feb 2007, 9:31:52 UTC

As already said ... nice one Matt, and thanks for spending time giving us an update.

Work is flowing in and out from here fine now, as you suggest.

Keeping my fingers crossed that when you come back all is working, and sweetness and light returns!
It's good to be back amongst friends and colleagues



ID: 514502
littlegreenmanfrommars
Volunteer tester
Joined: 28 Jan 06
Posts: 1410
Credit: 934,158
RAC: 0
Australia
Message 514552 - Posted: 7 Feb 2007, 12:40:14 UTC

Congrats, Matt...
As one who had his first lesson in Linux just two days ago, I have only a small appreciation of what you have achieved.
I take my hat off to you.
ID: 514552
Melissa Treu
Joined: 13 Jan 06
Posts: 1
Credit: 50,107
RAC: 0
United States
Message 514583 - Posted: 7 Feb 2007, 14:50:11 UTC

Perfect, thank you.
ID: 514583
zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65709
Credit: 55,293,173
RAC: 49
United States
Message 514587 - Posted: 7 Feb 2007, 15:04:05 UTC

Yeah, well done Matt. You deserve some sleep. Gettin' PCs workin' is hard enough, but those sound like fun. ;)
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 514587
Chas Woodhams
Joined: 28 Jun 99
Posts: 64
Credit: 293,207
RAC: 0
United Kingdom
Message 514588 - Posted: 7 Feb 2007, 15:04:48 UTC

Until you've opened a can of worms, you have no idea that there are worms in the can... ignorance is never a good excuse.

Keep up the good work.
Chas - Orme's Tun, Mercia, Albion.

ID: 514588
Dennis Lathem
Joined: 3 Dec 06
Posts: 27
Credit: 1,126,010
RAC: 0
United States
Message 514599 - Posted: 7 Feb 2007, 15:59:37 UTC - in response to Message 514477.  

Thanks for your hard work Matt...

I also get a chuckle out of the names you give the servers. Naming one after a Klingon is a hoot. I usually name my hard drives after cartoon characters.
ID: 514599
Wander Saito
Volunteer tester
Joined: 7 Jul 03
Posts: 555
Credit: 2,136,061
RAC: 0
Brazil
Message 514629 - Posted: 7 Feb 2007, 17:31:59 UTC - in response to Message 514477.  

I must have accidentally grazed against the touch-sensitive power switch right on gowron's front as I was messing with isaac which is right above it in the rack. Well, dammit.


Good job, Matt!

On a lighter note, now we know who's tripping over the power cords in the server closet. We were always blaming the wrong people: the janitor, the butler, Misfit... Just kiddin' ;)

ID: 514629
Stealth Eagle*
Volunteer tester
Joined: 7 Sep 00
Posts: 5971
Credit: 367,640
RAC: 0
United States
Message 514757 - Posted: 8 Feb 2007, 1:25:04 UTC - in response to Message 514477.  

Hi Matt--
Thanks for all the good work. I have been crunching for almost seven years now and I have seen a lot of volunteers come and go. As you will note from my stats, I do not post that much, basically only when I have a problem I have been unable to work out myself. I have noticed that the servers on the Beta test site are showing as not running, but there is plenty of work for everyone. Is this a software problem, or have you moved the Beta test to other servers?

I also have another question that I have been trying to find an answer to. If you could email me I would like to find out who can help me with it. My email is rkinkead at charter dot net, I will explain the question or problem in the reply.

As always keep up the good work.

R. Kinkead

What you do today you will have to live with tonight
ID: 514757
skywatch@Seti.USA
Joined: 1 Dec 00
Posts: 22
Credit: 5,085,655
RAC: 0
United States
Message 514767 - Posted: 8 Feb 2007, 2:09:33 UTC - in response to Message 514477.  

Matt, it takes unusual dedication to keep this project running. If something as simple as a bump of a switch caused a few hours of downtime, SO WHAT! For having the guts to admit it and the fortitude to solve the problem(s),
Thank You.
Jim
ID: 514767
Misfit
Volunteer tester
Joined: 21 Jun 01
Posts: 21804
Credit: 2,815,091
RAC: 0
United States
Message 514835 - Posted: 8 Feb 2007, 4:54:04 UTC - in response to Message 514477.  

So great. I went home dejected and hating my job.

So that explains the email I got saying it was all my fault. ;)
me@rescam.org
ID: 514835
Blurf
Volunteer tester
Joined: 2 Sep 06
Posts: 8962
Credit: 12,678,685
RAC: 0
United States
Message 515040 - Posted: 8 Feb 2007, 21:21:57 UTC

Misfit...

My dear friend....let me educate you on one of the undeniable facts of life.

If my wheelchair gets stuck in the snow, or the earth suddenly spins out of control losing all gravitational force and the entire population croaks in one sudden gasp....

It is all your fault.... :)

Happy Thursday!


ID: 515040
Misfit
Volunteer tester
Joined: 21 Jun 01
Posts: 21804
Credit: 2,815,091
RAC: 0
United States
Message 515198 - Posted: 9 Feb 2007, 6:30:45 UTC - in response to Message 515040.  

and the entire population croaks in one sudden gasp....

Does that include me?
me@rescam.org
ID: 515198
[SETI.USA]Tank_Master
Volunteer tester
Joined: 1 Jan 01
Posts: 24
Credit: 2,194,285
RAC: 0
United States
Message 515201 - Posted: 9 Feb 2007, 6:33:43 UTC

Well, no, 'cause someone has to stay behind to put their input on the SETI boards...
ID: 515201
Misfit
Volunteer tester
Joined: 21 Jun 01
Posts: 21804
Credit: 2,815,091
RAC: 0
United States
Message 515203 - Posted: 9 Feb 2007, 6:38:33 UTC - in response to Message 515201.  

Well, no, 'cause someone has to stay behind to put their input on the SETI boards...

Well folks, you have it here in writing. I get the last word. I shall consider this a legally binding contract.

So let it be written.
So let it be done.
me@rescam.org
ID: 515203
[SETI.USA]Tank_Master
Volunteer tester
Joined: 1 Jan 01
Posts: 24
Credit: 2,194,285
RAC: 0
United States
Message 515206 - Posted: 9 Feb 2007, 6:47:02 UTC

Thank god you're not coming with us! :D
ID: 515206
Misfit
Volunteer tester
Joined: 21 Jun 01
Posts: 21804
Credit: 2,815,091
RAC: 0
United States
Message 515565 - Posted: 10 Feb 2007, 5:54:32 UTC - in response to Message 515206.  

Thank god you're not coming with us! :D

Left behind?
me@rescam.org
ID: 515565
NightHawke
Joined: 14 May 99
Posts: 16
Credit: 2,494,119
RAC: 0
United States
Message 516004 - Posted: 11 Feb 2007, 1:10:22 UTC

Speaking of naming schemes... In the early days of the 'net, I was working for an ISP and we were setting up a POP in a small town, complete with a pair of DNS servers. I got a wild hair and named DNS1 itchy.2fords.net and DNS2 scratchy.2fords.net. I remoted into the main DNS host and changed the pointers there, and it was seamless. The owners were greatly amused.
ID: 516004


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.