Testing the Plumbing (Nov 12 2009)
Message boards : Technical News : Testing the Plumbing (Nov 12 2009)
| Author | Message |
|---|---|
|
Turns out the replica recovery was much faster than expected on Tuesday, so I was able to get that on line before the day was out. Then we had the day off yesterday, and now today. Let's see. Seems like I've been lost in testing land today. First, we finally decided on a method to fix the corruption in our Astropulse signal table. It's just one row that needs to be deleted, but we can't just delete it using SQL: we have to dump the entire database fragment (containing 25% of all the AP signals) and reload it without the one bad row. I wrote a program to test the data flowing in and out of this plumbing to make sure all the funny blob columns remain intact during the procedure. Bob also sleuthed out that this particular corruption actually happened months ago, not during this last RAID hiccup. Fine. Second, I'm also working on a suite of more robust tests/etc. for the software radar blanked results, now that we're getting lots of them. | |
| ID: 946803 · | |
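For readers wondering what a dump-and-reload integrity check might look like, here is a minimal sketch. It is not Matt's program: it uses SQLite from the Python standard library as a stand-in for the project's real database, and the table name `ap_signal`, the blob column `profile`, and the bad row's key are all hypothetical. The idea is to hash every blob before the dump, reload everything except the bad row, hash again, and confirm that the only difference is the missing row.

```python
import hashlib
import sqlite3


def row_digests(conn, table, key_col, blob_col):
    """Map each row key to a SHA-256 digest of its blob column."""
    cur = conn.execute(f"SELECT {key_col}, {blob_col} FROM {table}")
    return {
        key: hashlib.sha256(blob if blob is not None else b"").hexdigest()
        for key, blob in cur
    }


def dump_and_reload(src, dst, table, key_col, blob_col, bad_key):
    """Copy every row except bad_key from src into a fresh table in dst."""
    dst.execute(
        f"CREATE TABLE {table} ({key_col} INTEGER PRIMARY KEY, {blob_col} BLOB)"
    )
    rows = src.execute(
        f"SELECT {key_col}, {blob_col} FROM {table} WHERE {key_col} != ?",
        (bad_key,),
    )
    dst.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)
    dst.commit()


def verify(before, after, bad_key):
    """True if 'after' equals 'before' minus the one bad row."""
    expected = {k: v for k, v in before.items() if k != bad_key}
    return expected == after


def run_demo():
    # Hypothetical data: five rows with small binary blobs, row 3 is "bad".
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE ap_signal (id INTEGER PRIMARY KEY, profile BLOB)")
    src.executemany(
        "INSERT INTO ap_signal VALUES (?, ?)",
        [(i, bytes([i, 0, 255, i])) for i in range(1, 6)],
    )
    src.commit()

    before = row_digests(src, "ap_signal", "id", "profile")
    dst = sqlite3.connect(":memory:")
    dump_and_reload(src, dst, "ap_signal", "id", "profile", bad_key=3)
    after = row_digests(dst, "ap_signal", "id", "profile")
    return verify(before, after, bad_key=3)
```

Comparing per-row digests rather than row counts is the point of the exercise: a count would still match if a blob were silently truncated or re-encoded on the way through.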
|
Thank you for the update news Master Matt =) | |
| ID: 946814 · | |
|
I don't really comment here very often, but I have been following the technical news for years. | |
| ID: 946815 · | |
I don't really comment here very often, but I have been following the technical news for years. We have had this discussion before. The bugs that show up are mostly because of the high load on the servers. If you remove the stress by not handing out work, the bugs disappear as well. ____________ BOINC WIKI | |
| ID: 946829 · | |
Just my 2c :) As JM7 pointed out, if you stop distributing data, the load goes down and all of the "bugs" you mention vanish and the servers run smoothly at zero load. More to the point, they would no longer have an environment where they can test the problems caused by such high loading. But that isn't the problem, and taking a year off won't solve the problem. The problem is: too many users think that the problems are show-stoppers -- that they're issues that must be fixed right now! or the project is doomed. IMO, the project doesn't need "time off" to fix problems, what they need is a bit of better hardware (new stuff that isn't hand-me-down) and a little more staff. Unfortunately, two cents won't buy that. I think the idea that this is caused by "creep" is simply incorrect -- the goals are the same, but the budget doesn't allow things to be done as quickly as anyone would like. ____________ | |
| ID: 946838 · | |
|
Shutting the project down and doing a restart months later would trash the project's user base and its reputation. | |
| ID: 946875 · | |
|
I think BMgoau isn't entirely wrong, yet some of us do treat this project as though it were something more than it is. | |
| ID: 946917 · | |
Reading this thread, it appears that if data distribution were stopped, the servers would hum along nicely doing nothing and if we go full tilt, which I assume is the case today, then we have intermittent problems. Ok, then wouldn't it make sense to cut back a bit to understand the point at which the stress introduces problems, fix them, and then increase the distribution rate and fix the next layer of problems that show up, and so on? If the current budget and technology prevent solving the problem in that manner, is it efficient to run the project at a higher level? Isn't that kind of like watering your yard during a haboob? Are we wasting resources by running at too high a (distribution) level? Today this question is probably rhetorical, of course. Reducing the number of hosts may allow work to upload and download more smoothly, but the goal is not to run smoothly, it is to search for signals in recorded data from the telescope. To do so on a small budget, they are pushing very high loading compared to the typical E-Commerce standards. This is all set out in the whitepapers at boinc.berkeley.edu. I'm sure SETI@Home would like the budget to increase staff, get more bandwidth, buy faster servers, etc. But you're both missing the most important question: how far can they really push it, and what can they do to better utilize a small, finite resource? Matt gives us a valuable view into the challenges of running at high sustained levels, and some of us take it as a sign of impending doom instead of charting new territory. ____________ | |
| ID: 946932 · | |
I don't really comment here very often, but I have been following the technical news for years. I'm much the same, but I want to say that I am impressed with the work that Matt & crew do to keep data flowing, including to us in the forum. I am surprised that there have been very few major interruptions. This is a research project in more areas than one. On the other hand, I now work in a government building and we get interrupted about every quarter. I have often wondered why the staff has not planned to take the project offline for two to three days to overhaul/reset the system. There have been a few times, particularly in the last year, when I was fully expecting this to happen. That such maintenance is made to fit into the weekly outages is also impressive. (I'm a chemical / metallurgical engineer and plant shut downs are common yearly occurrences.) But then, the start-up after a three day shut-down would swamp the servers for a week after... | |
| ID: 946947 · | |
I have often wondered why the staff has not planned to take the project offline for two to three days to overhaul/reset the system. Probably because that isn't good science. To actually diagnose the problem, you want to isolate each variable, and test each one. You want to make one change at a time. It's slow, but when you make one change, and the logjam lets go, you know what you changed to do that. If you "shotgun" the fix (change everything all at once) and it fixes it, you run the risk of the problem coming back someday, and you're in the same spot you were in before. Besides, a major overhaul can probably be divided into smaller tasks, and taken on one at a time without causing a multi-day outage. ____________ | |
| ID: 946986 · | |
|
I think Ned and John have made some good points. | |
| ID: 947066 · | |
It reminds me of a quote from one of my first bosses in the testing business - "if you don't break something once in awhile you are not trying hard enough". You used to work for Red Green? ____________ | |
| ID: 947128 · | |
|
Without having a clue of the technical aspects of SETI, I have some | |
| ID: 947169 · | |
|
Zour, I think this has been covered before on the Number Crunching forum and here. There is no new data for a few months, but Matt and friends have implemented new radar blanking software, that permits the crunching of old data that either wasn't sent out before or bombed immediately when sent out, because of the radar interference in the data. The radar blanking software pre-processes the data, so us normal people can crunch it in a useful way. | |
| ID: 947172 · | |
It reminds me of a quote from one of my first bosses in the testing business - "if you don't break something once in awhile you are not trying hard enough". Now that you mention it, he did look a little like Mr. Green. And I looked a little like Harold at the time. Red Green actually just recycled a lot of old Canadian sayings, I think all his standards were in use long before the show started. I can only think of one that was truly original, "You're only young once, but you can always be immature". ____________ | |
| ID: 947176 · | |
It reminds me of a quote from one of my first bosses in the testing business - "if you don't break something once in awhile you are not trying hard enough". So that's why that show sounds like a friend of Mine and yep, He's Canadian, He works nearby and has an American wife and 2 kids(1 His with Her and 1 that I think they adopted or wanted to adopt). ____________ BSG Anthem My Facebook page | |
| ID: 947274 · | |
|
Well ya see, we all talk like that, eh? | |
| ID: 947291 · | |
Well ya see, we all talk like that, eh? Well It's not like Yer hard to understand or that Yer speaking a foreign tongue, Different would be an Aussie, At least accent wise, But still understandable beyond the slang terms they use, Which I think Canadians and Americans as a whole don't use(using their slang that is). :D ____________ BSG Anthem My Facebook page | |
| ID: 947320 · | |
They have already been processed two years ago, am I right? It's best not to assume that recordings are processed in any particular order. It's more likely now, but at one time work was split from tapes, based on whatever happened to be near the front of the shelf. Most of the time, that was new work, and old work was on tapes at the back of the shelf. SETI@Home said they would not reprocess work just to keep us "busy" -- there are other worthy projects if they have a long dry-spell. ____________ | |
| ID: 947321 · | |
|
Can you recommend another project you find worthy of supporting also? I have no idea where to start. | |
| ID: 964446 · | |
| Copyright © 2013 University of California |