Testing the Plumbing (Nov 12 2009)



Message boards : Technical News : Testing the Plumbing (Nov 12 2009)

Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1389
Credit: 74,079
RAC: 0
United States
Message 946803 - Posted: 13 Nov 2009, 0:01:13 UTC

Turns out the replica recovery was much faster than expected on Tuesday, so I was able to get that online before the day was out. Then we had the day off yesterday, and now today. Let's see. Seems like I've been lost in testing land today. First, we finally decided on a method to fix the corruption in our Astropulse signal table. It's just one row that needs to be deleted, but we can't simply delete it using SQL - we have to dump the entire database fragment (containing 25% of all the ap signals) and reload it without the one bad row. I wrote a program to test the data flowing in and out of this plumbing to make sure all the funny blob columns remain intact during the procedure. Bob also sleuthed out that this particular corruption actually happened months ago, not during this last RAID hiccup. Fine. Second, I'm also working on a suite of more robust tests/etc. for the software radar-blanked results, now that we're getting lots of them.
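A rough sketch of the kind of round-trip blob check described above might look like the following. It is illustrative only: the real signal table lives in a different database system, and the table and column names here (`ap_signal`, `signal_blob`) are invented for the sketch, with SQLite standing in.

```python
# Illustrative only: the real Astropulse database is not SQLite, and
# ap_signal / signal_blob are invented names for this sketch.
import hashlib
import sqlite3

def blob_checksums(conn, table):
    """Map row id -> SHA-256 of the blob column, for before/after comparison."""
    return {
        row_id: hashlib.sha256(blob).hexdigest()
        for row_id, blob in conn.execute(f"SELECT id, signal_blob FROM {table}")
    }

# Build a toy "fragment" with one known-bad row (id 2).
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE ap_signal (id INTEGER PRIMARY KEY, signal_blob BLOB)")
rows = [(1, b"\x00\x01good"), (2, b"corrupt"), (3, b"\xff\xfealso good")]
src.executemany("INSERT INTO ap_signal VALUES (?, ?)", rows)

before = blob_checksums(src, "ap_signal")
bad_id = 2

# "Dump" everything except the bad row, then "reload" into a fresh table.
dump = src.execute(
    "SELECT id, signal_blob FROM ap_signal WHERE id != ?", (bad_id,)
).fetchall()
dst = sqlite3.connect(":memory:")
dst.execute("CREATE TABLE ap_signal (id INTEGER PRIMARY KEY, signal_blob BLOB)")
dst.executemany("INSERT INTO ap_signal VALUES (?, ?)", dump)

after = blob_checksums(dst, "ap_signal")

# Every surviving row's blob must be byte-identical; only the bad row is gone.
expected = {k: v for k, v in before.items() if k != bad_id}
assert after == expected
print("blob round-trip OK:", len(after), "rows intact")
```

The point of hashing before and after is that any mangling of the binary columns during the dump/reload plumbing shows up immediately as a checksum mismatch.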

- Matt
____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

Profile LiliKrist
Volunteer tester
Joined: 12 Aug 09
Posts: 333
Credit: 143,167
RAC: 0
Indonesia
Message 946814 - Posted: 13 Nov 2009, 0:44:08 UTC

Thank you for the news update, Master Matt =)
____________


N = R x fp x ne x fl x fi x fc x L

BMgoau
Joined: 8 Jan 07
Posts: 29
Credit: 1,541,301
RAC: 0
Australia
Message 946815 - Posted: 13 Nov 2009, 0:44:31 UTC - in response to Message 946803.

I don't really comment here very often, but I have been following the technical news for years.

I've noticed over the last two years or so, with increasing regularity, that SETI@home seems to be suffering from "creep" - be it management creep, scope creep or mission creep. Take your pick:

http://en.wikipedia.org/wiki/Creep_%28project_management%29
http://en.wikipedia.org/wiki/Mission_creep
http://en.wikipedia.org/wiki/Scope_creep

It seems like the project's only goal is to keep the data flowing - to perpetuate itself without end. The goal is still clear: find life. But I think the project has become bogged down in the methodology.

You're a legend, Matt; your tireless work and effort are wonderful, and I appreciate you always keeping us up to date. It's not something you have to do, but I'm sure we all very much appreciate it. As you mentioned, you have been working on radar blanking and ntpckr, and achieved some wonderful results.

But I imagine how much more might be achieved if the project (the @home part) were shut down for, say, 12 months, so all these continual bugs and things like radar blanking could be worked out without the overhanging need to keep the data pipeline flowing.

You have suggested this once before, and I think it would be a great idea. The project could be consolidated, redefined and smoothed out into something more effective and manageable, so more time can be spent on analysing the data rather than just ensuring it flows.

Just my 2c :)

John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24516
Credit: 521,097
RAC: 60
United States
Message 946829 - Posted: 13 Nov 2009, 2:34:02 UTC - in response to Message 946815.

I don't really comment here very often, but I have been following the technical news for years.

I've noticed over the last two years or so, with increasing regularity, that SETI@home seems to be suffering from "creep" - be it management creep, scope creep or mission creep. Take your pick:

http://en.wikipedia.org/wiki/Creep_%28project_management%29
http://en.wikipedia.org/wiki/Mission_creep
http://en.wikipedia.org/wiki/Scope_creep

It seems like the project's only goal is to keep the data flowing - to perpetuate itself without end. The goal is still clear: find life. But I think the project has become bogged down in the methodology.

You're a legend, Matt; your tireless work and effort are wonderful, and I appreciate you always keeping us up to date. It's not something you have to do, but I'm sure we all very much appreciate it. As you mentioned, you have been working on radar blanking and ntpckr, and achieved some wonderful results.

But I imagine how much more might be achieved if the project (the @home part) were shut down for, say, 12 months, so all these continual bugs and things like radar blanking could be worked out without the overhanging need to keep the data pipeline flowing.

You have suggested this once before, and I think it would be a great idea. The project could be consolidated, redefined and smoothed out into something more effective and manageable, so more time can be spent on analysing the data rather than just ensuring it flows.

Just my 2c :)

We have had this discussion before.

The bugs that show up are mostly because of the high load on the servers. If you remove the stress by not handing out work, the bugs disappear as well.
____________


BOINC WIKI

1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 946838 - Posted: 13 Nov 2009, 3:13:17 UTC - in response to Message 946815.

Just my 2c :)

As JM7 pointed out, if you stop distributing data, the load goes down and all of the "bugs" you mention vanish and the servers run smoothly at zero load.

More to the point, they would no longer have an environment where they can test the problems caused by such high loading.

But that isn't the problem, and taking a year off won't solve the problem.

The problem is: too many users think that the problems are show-stoppers -- that they're issues that must be fixed right now! or the project is doomed.

IMO, the project doesn't need "time off" to fix problems; what it needs is a bit of better hardware (new stuff that isn't hand-me-down) and a little more staff.

Unfortunately, two cents won't buy that.

I think the idea that this is caused by "creep" is simply incorrect -- the goals are the same, but the budget doesn't allow things to be done as quickly as anyone would like.
____________

PhonAcq
Joined: 14 Apr 01
Posts: 1622
Credit: 22,162,635
RAC: 3,826
United States
Message 946917 - Posted: 13 Nov 2009, 14:07:14 UTC

I think BMgoau isn't entirely wrong, yet some of us do treat this project as though it were something more than it is.

One misconception most of us have is that this project is uniquely about finding ET. That most certainly is not what it is today, although it is how SETI was started. Today, the project has two themes of almost equal weight: finding ET and developing BOINC. It isn't hard to imagine that, serving two masters, it may be difficult to satisfy either, despite the apparent synergies between them.

Reading this thread, it appears that if data distribution were stopped, the servers would hum along nicely doing nothing, and if we go full tilt, which I assume is the case today, then we have intermittent problems. OK, then wouldn't it make sense to cut back a bit to understand the point at which the stress introduces problems, fix them, then increase the distribution rate and fix the next layer of problems that shows up, and so on? If the current budget and technology prevent solving the problem in that manner, is it efficient to run the project at a higher level? Isn't that kind of like watering your yard during a haboob? Are we wasting resources by running at too high a (distribution) level? Today this question is probably rhetorical, of course.
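The "cut back until things run clean, then ramp up until something breaks" idea amounts to a stepped load test. A minimal sketch, with everything invented for illustration - `serve()` is a stand-in for the real scheduler, and the capacity and error threshold are made-up numbers:

```python
# Toy load-ramp: step the request rate up until the observed error rate
# crosses a threshold. All numbers here are invented for illustration.
import random

random.seed(42)

def serve(requests_per_step):
    """Simulated server: failure probability grows once load passes capacity."""
    capacity = 700
    failures = 0
    for _ in range(requests_per_step):
        overload = max(0, requests_per_step - capacity) / capacity
        if random.random() < overload:
            failures += 1
    return failures / requests_per_step  # observed error rate

def find_breaking_point(start=100, step=100, max_error=0.05):
    """Raise the rate step by step; stop at the first rate that misbehaves."""
    rate = start
    while serve(rate) <= max_error:
        rate += step
    return rate

print("first problematic rate:", find_breaking_point())
```

In the real system each step would run for hours against live servers rather than a simulation, but the shape of the procedure - hold a level, measure errors, step up, repeat - is the same as the one proposed above.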

A professional development project would also embrace a road-map methodology, detailing how to get to certain performance goals, rather than a fire-drill one. In the BOINC case, if it were treated as 'professional', qualities like reliability would have measurable metrics used to structure the problem-solving and development activities. Another metric would be the number of active hosts; I for one would like to see a goal of hosting 2x our current number of 300K active hosts within reasonable and published reliability criteria. As a counter-example, the budget should not be used as an excuse, but as another boundary condition; few budgets are boundless, so whining about SETI's is not productive.

Another project-management tool for demonstrating progress and achievement is breaking a complex project into subprojects. SETI seems to be an endless stream of calculation, filling a table that is getting more and more unwieldy. I wish they could package SETI this way, so we can say we are doing more than that; it might also be a way to grow the active user base more rapidly.

The big variable in all this is probably the source of data to analyze. This fact may indeed limit the seti side of the equation, at least, or alternatively be used to justify an alternative source of data to analyze.

Ok, I'm rambling. [/ramblin]

1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 946932 - Posted: 13 Nov 2009, 15:32:01 UTC - in response to Message 946917.

Reading this thread, it appears that if data distribution were stopped, the servers would hum along nicely doing nothing, and if we go full tilt, which I assume is the case today, then we have intermittent problems. OK, then wouldn't it make sense to cut back a bit to understand the point at which the stress introduces problems, fix them, then increase the distribution rate and fix the next layer of problems that shows up, and so on? If the current budget and technology prevent solving the problem in that manner, is it efficient to run the project at a higher level? Isn't that kind of like watering your yard during a haboob? Are we wasting resources by running at too high a (distribution) level? Today this question is probably rhetorical, of course.

A professional development project would also embrace a road-map methodology, detailing how to get to certain performance goals, rather than a fire-drill one. In the BOINC case, if it were treated as 'professional', qualities like reliability would have measurable metrics used to structure the problem-solving and development activities. Another metric would be the number of active hosts; I for one would like to see a goal of hosting 2x our current number of 300K active hosts within reasonable and published reliability criteria. As a counter-example, the budget should not be used as an excuse, but as another boundary condition; few budgets are boundless, so whining about SETI's is not productive.

Reducing the number of hosts may allow work to upload and download more smoothly, but the goal is not to run smoothly, it is to search for signals in recorded data from the telescope.

To do so on a small budget, they are pushing very high loading compared to typical e-commerce standards.

This is all set out in the whitepapers at boinc.berkeley.edu.

I'm sure SETI@Home would like the budget to increase staff, get more bandwidth, buy faster servers, etc.

But you're both missing the most important question: how far can they really push it, and what can they do to better utilize a small, finite resource?

Matt gives us a valuable view into the challenges of running at high sustained levels, and some of us take it as a sign of impending doom instead of charting new territory.
____________

Dr.Argentum
Joined: 24 Nov 99
Posts: 6
Credit: 3,375,390
RAC: 1,644
Canada
Message 946947 - Posted: 13 Nov 2009, 16:31:56 UTC - in response to Message 946815.

I don't really comment here very often, but I have been following the technical news for years.
.
.

But I imagine how much more might be achieved if the project (the @home part) was shut down for say 12 months so all these continual bugs and things like radar blanking could be worked out without the overhanging need to keep the data pipeline flowing.


I'm much the same, but I want to say that I am impressed with the work that Matt & crew do to keep data flowing, including to us in the forum. I am surprised that there have been very few major interruptions. This is a research project in more areas than one. On the other hand, I now work in a government building and we get interrupted about every quarter.

I have often wondered why the staff has not planned to take the project offline for two to three days to overhaul/reset the system. There have been a few times, particularly in the last year, when I was fully expecting this to happen. That such maintenance is made to fit into the weekly outages is also impressive. (I'm a chemical/metallurgical engineer, and plant shut-downs are common yearly occurrences.) But then, the start-up after a three-day shut-down would swamp the servers for a week after...

1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 946986 - Posted: 13 Nov 2009, 18:31:46 UTC - in response to Message 946947.

I have often wondered why the staff has not planned to take the project offline for two to three days to overhaul/reset the system.

Probably because that isn't good science.

To actually diagnose the problem, you want to isolate each variable, and test each one.

You want to make one change at a time.

It's slow, but when you make one change and the logjam lets go, you know exactly which change did it.

If you "shotgun" the fix (change everything all at once) and it fixes it, you run the risk of the problem coming back someday, and you're in the same spot you were in before.

Besides, a major overhaul can probably be divided into smaller tasks, and taken on one at a time without causing a multi-day outage.

____________

Profile Bill Walker
Joined: 4 Sep 99
Posts: 3374
Credit: 2,071,319
RAC: 2,121
Canada
Message 947066 - Posted: 14 Nov 2009, 1:10:15 UTC
Last modified: 14 Nov 2009, 1:13:47 UTC

I think Ned and John have made some good points.

As I understand it, there are two goals here: finding ET, and pushing the limits of distributed computing. Those of us out here in the peanut gallery expect a good data flow, but that is really only a by-product of the two main goals.

The analogy of the annual plant shut down is a good one only if your only goal is to maintain a high average production. A better analogy here might be found in developmental hardware testing: push it, break it, fix it, push harder till it breaks again, repeat as long as the money lasts.

It reminds me of a quote from one of my first bosses in the testing business - "if you don't break something once in a while you are not trying hard enough".

Having said all that, many thanks to Matt and crew for all they do.
____________

1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 947128 - Posted: 14 Nov 2009, 6:23:23 UTC - in response to Message 947066.

It reminds me of a quote from one of my first bosses in the testing business - "if you don't break something once in a while you are not trying hard enough".

You used to work for Red Green?

____________

zour
Joined: 7 Jun 08
Posts: 10
Credit: 369,794
RAC: 0
Germany
Message 947169 - Posted: 14 Nov 2009, 14:04:40 UTC

Without having a clue about the technical aspects of SETI, I have some rather simple questions. My last workunits are from April, and now March, 2007.

They were already processed two years ago, am I right? Why do this again? Just to confirm SETI is still intact?

What about the signals of the last few weeks, since fresh WUs ran out? Were they recorded and will they become available, or has the dish not been working since?

If so, when will the dish work? Any guesses or estimates are welcome.

Sorry for being so impatient!

Profile Bill Walker
Joined: 4 Sep 99
Posts: 3374
Credit: 2,071,319
RAC: 2,121
Canada
Message 947172 - Posted: 14 Nov 2009, 14:19:46 UTC - in response to Message 947169.

Zour, I think this has been covered before on the Number Crunching forum and here. There is no new data for a few months, but Matt and friends have implemented new radar-blanking software that permits crunching old data that either wasn't sent out before, or bombed immediately when it was sent out, because of the radar interference in the data. The radar-blanking software pre-processes the data so we normal people can crunch it in a useful way.
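A toy illustration of what that pre-processing amounts to: samples flagged as radar-contaminated are blanked before the data is split into workunits. The function name and flagging here are placeholders for this sketch; the real blanker is far more involved.

```python
# Toy radar blanking: zero out every sample a mask marks as radar.
# The mask and data are invented for illustration.

def blank_radar(samples, radar_mask):
    """Replace masked (radar-contaminated) samples with zeros."""
    return [0.0 if hit else s for s, hit in zip(samples, radar_mask)]

samples = [0.2, 0.1, 9.8, 9.7, 0.3, 0.2]   # two huge radar spikes
radar_mask = [False, False, True, True, False, False]

clean = blank_radar(samples, radar_mask)
print(clean)  # spikes replaced by zeros, science data untouched
```

The payoff is that workunits cut from the blanked stream no longer trip over the radar bursts that previously made them bomb immediately on volunteers' machines.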

Last I heard, fresh data should start around the end of November.
____________

Profile Bill Walker
Joined: 4 Sep 99
Posts: 3374
Credit: 2,071,319
RAC: 2,121
Canada
Message 947176 - Posted: 14 Nov 2009, 14:23:52 UTC - in response to Message 947128.

It reminds me of a quote from one of my first bosses in the testing business - "if you don't break something once in a while you are not trying hard enough".

You used to work for Red Green?


Now that you mention it, he did look a little like Mr. Green. And I looked a little like Harold at the time.

Red Green actually just recycled a lot of old Canadian sayings, I think all his standards were in use long before the show started. I can only think of one that was truly original, "You're only young once, but you can always be immature".
____________

zoom314
Project donor
Joined: 30 Nov 03
Posts: 46281
Credit: 36,676,765
RAC: 5,190
Message 947274 - Posted: 14 Nov 2009, 20:45:04 UTC - in response to Message 947176.

It reminds me of a quote from one of my first bosses in the testing business - "if you don't break something once in a while you are not trying hard enough".

You used to work for Red Green?


Now that you mention it, he did look a little like Mr. Green. And I looked a little like Harold at the time.

Red Green actually just recycled a lot of old Canadian sayings, I think all his standards were in use long before the show started. I can only think of one that was truly original, "You're only young once, but you can always be immature".

So that's why that show sounds like a friend of mine. And yep, he's Canadian; he works nearby and has an American wife and 2 kids (1 his with her, and 1 that I think they adopted or wanted to adopt).
____________
My Facebook, War Commander, 2015

Profile Bill Walker
Joined: 4 Sep 99
Posts: 3374
Credit: 2,071,319
RAC: 2,121
Canada
Message 947291 - Posted: 14 Nov 2009, 21:47:30 UTC - in response to Message 947274.

Well ya see, we all talk like that, eh?
____________

zoom314
Project donor
Joined: 30 Nov 03
Posts: 46281
Credit: 36,676,765
RAC: 5,190
Message 947320 - Posted: 15 Nov 2009, 0:19:37 UTC - in response to Message 947291.

Well ya see, we all talk like that, eh?

Well, it's not like yer hard to understand, or that yer speaking a foreign tongue. Different would be an Aussie, at least accent-wise, but still understandable beyond the slang terms they use, which I think Canadians and Americans as a whole don't use (their slang, that is). :D
____________
My Facebook, War Commander, 2015

1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 947321 - Posted: 15 Nov 2009, 0:21:25 UTC - in response to Message 947169.

They were already processed two years ago, am I right?

It's best not to assume that recordings are processed in any particular order.

It's more likely now, but at one time work was split from tapes, based on whatever happened to be near the front of the shelf. Most of the time, that was new work, and old work was on tapes at the back of the shelf.

SETI@Home said they would not reprocess work just to keep us "busy" -- there are other worthy projects if they have a long dry spell.

____________

zour
Joined: 7 Jun 08
Posts: 10
Credit: 369,794
RAC: 0
Germany
Message 964446 - Posted: 18 Jan 2010, 21:28:00 UTC - in response to Message 947321.

Can you recommend another project you find worthy of supporting? I have no idea where to start.



Copyright © 2014 University of California