Nirvana for Mice (Jun 12 2007)

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 585989 - Posted: 12 Jun 2007, 22:24:53 UTC

Despite our efforts yesterday, BOINC database problems continue. So Jeff and I definitively decided to upgrade jocelyn as much as we could today to become the new master database again. Just a matter of replacing CPUs and adding memory, no?

Well, no. A lot of machines in our rack, for one reason or another, aren't actually racked up but simply placed flat on the server below them. So sitting on top of jocelyn is its 3510 fibre channel disk array. And sitting on top of that is lando (compute server). And sitting on top of that is a monitor/keyboard/mouse hooked up to a KVM switch. So... we had to move all that stuff out of the way first. Kevin had an IDL process running on lando which we had to wait two hours to complete (if we killed it, he would have lost two weeks of work). Then we safely powered everything off and carefully upgraded the various parts of the system. In short, jocelyn used to have two 844s (1.8 GHz Opteron processors) but now has four 848s (2.2 GHz Opterons). We also bumped up the RAM from 16 GB to 28 GB with memory from various recent donations we couldn't use elsewhere until now.
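
For a rough sense of scale, the before/after figures quoted above can be compared with a few lines of arithmetic. This is only a sketch: it treats each processor as one unit and uses summed clock speed as a crude stand-in for capacity, which is an assumption and says nothing about real database throughput. The `Config` class and its field names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """Hardware figures as quoted in the post (names are illustrative)."""
    cpus: int
    clock_ghz: float
    ram_gb: int

    @property
    def aggregate_ghz(self) -> float:
        # Crude proxy: number of processors times clock speed.
        return self.cpus * self.clock_ghz


before = Config(cpus=2, clock_ghz=1.8, ram_gb=16)  # two Opteron 844s
after = Config(cpus=4, clock_ghz=2.2, ram_gb=28)   # four Opteron 848s

cpu_ratio = after.aggregate_ghz / before.aggregate_ghz
ram_ratio = after.ram_gb / before.ram_gb

print(f"aggregate clock: {before.aggregate_ghz:.1f} -> "
      f"{after.aggregate_ghz:.1f} GHz ({cpu_ratio:.2f}x)")
print(f"RAM: {before.ram_gb} -> {after.ram_gb} GB ({ram_ratio:.2f}x)")
```

On these numbers the summed clock speed goes from 3.6 GHz to 8.8 GHz and the RAM grows by three quarters, so the processor side of the upgrade is proportionally the larger change.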

Hopefully replication will catch up tomorrow and we can swap the relationship of the master/replica databases and that'll generally improve the efficiency of our whole system. Until then...

- Matt

-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 585989
Dr. C.E.T.I.
Joined: 29 Feb 00
Posts: 16019
Credit: 794,685
RAC: 0
United States
Message 586007 - Posted: 12 Jun 2007, 23:08:35 UTC

fingers crossed . . . nice goin' @ Berkeley btw

< Matt - check your PM
BOINC Wiki . . .

Science Status Page . . .
ID: 586007
Redshift
Joined: 3 Apr 99
Posts: 122
Credit: 1,244,536
RAC: 0
United States
Message 586077 - Posted: 13 Jun 2007, 1:41:49 UTC - in response to Message 585989.  
Last modified: 13 Jun 2007, 1:48:51 UTC

Hi Matt. Over the years (10 years now) I've worked on many understaffed, under-budgeted projects where we would often end up "finding ourselves frustrated with a random and pointless problem..." (quote from yesterday's post). I've been there! Here are some (hopefully helpful) thoughts that have come to mind as I've read the posts here for the last several months. I've held back from posting as I was hesitant to sound critical. But my only intention is to help. I apologize if someone has already suggested this.

What has helped on projects where I have worked, and what I find more stable projects have in common, is a process that implements change control. Not just because it can minimize outages. More importantly, in my own words, it lets you go faster by going slower. A project may slow down to do more planning and required process, but in the end it often achieves the desired goals faster and with less frustration. Minimizing outages may not even be important to a particular project, but a change control process also helps minimize the amount of time people spend fighting fires. This in turn helps everyone stay sane and happy.

I certainly don't know the internal details of your project, but my intuition tells me that your project may benefit from the practice/process as well. Even a one person project can benefit from a self-enforced change control process.

If you are like I was at first, then this sounds revolting! That was how I felt long ago. If I see a change that is needed I'm going to make it. Get it approved, post it for others to review? I'm smart enough myself. Verify that it is consistent with plan or requirements? Too much work! No time, no money, not necessary, I thought. But over time I have seen that projects that implement a change control process actually SAVE time and money over those that do not.

Anyway, the above links may be a good start, but there are entire books on the subject.

You could start as simply as taking the seven steps from the Wikipedia article and describing in writing how you currently implement each one (or don't). Then after the next problem occurs, review the steps and see which one let you down and allowed the problem to happen. Then update your process for implementing that step, and in so doing continue to refine your change control process.
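
That review loop could be kept as something as small as the sketch below. It assumes a plain in-memory log; the step names are placeholders ("step 1" through "step 7") because the seven steps are not listed in this post, and the class, method names, and example incidents are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field

# Placeholder names: the post refers to seven steps without listing them.
STEPS = [f"step {i}" for i in range(1, 8)]


@dataclass
class ChangeControlLog:
    # How each step is currently carried out ("" means not done at all).
    current_practice: dict = field(
        default_factory=lambda: {step: "" for step in STEPS})
    # (problem description, step that let it through) pairs.
    incidents: list = field(default_factory=list)

    def describe(self, step: str, practice: str) -> None:
        """Write down how a step is implemented today."""
        if step not in self.current_practice:
            raise KeyError(step)
        self.current_practice[step] = practice

    def record_incident(self, description: str, failed_step: str) -> None:
        """After a problem, note which step allowed it to happen."""
        if failed_step not in self.current_practice:
            raise KeyError(failed_step)
        self.incidents.append((description, failed_step))

    def weakest_steps(self) -> list:
        """Steps ordered by how often they let a problem through."""
        return Counter(step for _, step in self.incidents).most_common()


# Made-up example data, for illustration only.
log = ChangeControlLog()
log.describe("step 4", "tested on a spare machine first")
log.record_incident("example outage A", "step 4")
log.record_incident("example outage B", "step 4")
log.record_incident("example outage C", "step 2")
print(log.weakest_steps())
```

The step that shows up most often in `weakest_steps()` is the one whose written description is worth revising first.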


Cheers, good luck, and keep up the good work.
ID: 586077
edjcox
Joined: 20 May 99
Posts: 96
Credit: 5,878,353
RAC: 0
United States
Message 586148 - Posted: 13 Jun 2007, 5:36:10 UTC

Curious to know why the SETI Science section has not been updated in over a year???

Who is responsible for giving the community some sense of their accomplishments and some insight into the results we are achieving?

Controlling your hardware configuration appears ad hoc. Do you have any type of oversight review board? My 30 years of experience have proven to me time and time again that you often introduce problems into a given system by making changes, especially major ones, without thoughtful planning and careful incremental steps.

I am losing confidence in SETI's technical soundness...

What are the contingency plans in the event Arecibo is placed offline and is no longer a resource for your survey data? What other receivers have you considered?

Sorry if I seem critical. Just would like the best for the project overall...

Thank you


Never engage stupid people at their level, they then have the home court advantage.....
ID: 586148
ponbiki
Joined: 9 Feb 04
Posts: 114
Credit: 115,897
RAC: 0
United States
Message 586151 - Posted: 13 Jun 2007, 6:23:30 UTC - in response to Message 586148.  
Last modified: 13 Jun 2007, 6:24:04 UTC

They're not in a position to operate the machines, find time to draft grants, work on projects that actually have money, AND work on the science portion of the site, which doesn't serve any real function other than edification for people. It doesn't seem fair, and people complain about it, yes, but that's the breaks. Oversight review board? This is a shoestring project whose staff must actively seek community payment and support just to keep going; it has nowhere near enough assets to justify such an adventure.

Contingency planning in case of Arecibo's closure is something far off in the future that will depend solely on the actions of Congress; it can't be dealt with at the local level at Cal. If this were a perfect world, they would have backups for everything. Reality-wise, they're lucky to have spare parts for the servers. I'd rather have them get their stuff up and running and humming as smoothly as possible instead of wasting time and resources searching for a new and viable place to piggy-back expensive equipment. When Arecibo closes, we will have enough data to crunch from ALFA for a few months to a few years, depending on the splitters and the data. Then SETI will wrap up and you can proceed to join other programs (which you can do already, especially since you're losing faith in the technical soundness of the project).

Being critical is always good. Just be a bit more sound in it.


ID: 586151
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 586160 - Posted: 13 Jun 2007, 7:06:41 UTC


(4)  Some things in life can never be fully appreciated nor
        understood unless experienced firsthand. Some things in
        networking can never be fully understood by someone who neither
        builds commercial networking equipment nor runs an operational
        network.


From the 12 Networking Truths.

Grant
Darwin NT
ID: 586160
ML1
Volunteer moderator
Volunteer tester
Joined: 25 Nov 01
Posts: 20147
Credit: 7,508,002
RAC: 20
United Kingdom
Message 586227 - Posted: 13 Jun 2007, 12:17:21 UTC

Another aspect of the various 'changes' and problems of the past few months is that a lot of the cause is beyond Matt's control for the present setup.

The prime problems appear to be too many users swamping old spec servers with an ever increasing volume of data. Throw in also that s@h appears to be the alpha testbed for all of the Boinc developments thrown at it by outside developers and... Throw in a power cut or few and an ISP change or two and... And various hardware failures...

It's a credit to the very few people involved that s@h hangs together as well as it does AND continues to keep a few hundred thousand participants busy.


What changes can be managed I'm sure are well managed, especially for rearranging the server closet.

As to whether the rest could be managed better... I think that could only be done if the number of participants were restricted and the WU rate was restricted, which so far HAS NEVER BEEN DONE.


For an analogy: take a decade-old banger, go onto a German autobahn, and keep the thing screaming flat out. Oh, and you have to refuel with nitro, change a wheel or two, and service the engine whilst still doing 230 kph with the wind ripping your head off!

As to whether you would get further by arranging regular pit stops is a good question. That is already done with the weekly database shutdowns.

Arrange a pit-stop to swap the banger for an F1 car?... Any donations for that?


Happy crunchin',
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 586227
ohiomike
Joined: 14 Mar 04
Posts: 357
Credit: 650,069
RAC: 0
United States
Message 586234 - Posted: 13 Jun 2007, 12:51:40 UTC - in response to Message 586227.  

Amen to that!

Boinc Button Abuser In Training >My Shrubbers<
ID: 586234

©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.