| Author |
Message |
|
|
snipped.......
As a (mostly) mechanical engineer, it does my heart good to see my electronic colleagues adapting the time-honoured and tested ways of the mech eng.
We electronic engineers have always used the drop or kick test as a first-line tool for faulty equipment. The distinction between being a good, not so good or lousy engineer depends on your skill and experience at delivering the required accurate shock. |
|
|
|
|
Matt.. if I might share.. from my experiences of keeping together antiques that were often poorly "refurbished"..
many problems clear permanently upon "re-seating": unplugging, and plugging back in. Other times, when taking things out, some surprise drops loose (seen or unseen, 50/50).. and things are then magically "fixed". Whether they were dirty connections, a bit of dust, someone's raisinette.. does not really matter as long as they clear. A bad connection invisible to the eye might merely need to be "bumped".. and could be gone forever.
We came up with things such as the "pencil test".. where, while monitoring the signal, we tapped the outside of the case to see if it had any effect. And some of the equipment was old enough to even contain mercury relays, where the mercury would vaporize, re-solidify in obscure pieces, and refuse to work until we "bounced" it (hold edge of component 3-4" above anti-static surface, drop and catch on first bounce, re-insert) to clear.
These are also good reasons why "fault tolerance" is a good(although expensive) principle.
On the reports going back.. all of these were jotted down as "re-seat to clear."
Because if we told the truth, the whole truth, and nothing but the truth... it would have been the Salem Witch trials all over again.
One thing that used to work on CRT terminals, back in the '80s, was to give them a "slap upside the screen". Some terminals would come back to life for a time after the slap. Location (and force) was brand-dependent, and with one of the brands, there were two methods that worked, depending on symptom: the slap, directed at the upper right of the CRT, and lifting the front of the CRT about an inch, and dropping. IBM 3278's were pretty reliable, but when they went, they could (sometimes...) be brought back by slapping the back right corner, and picking up the back about .5 inch, and dropping...
As a (mostly) mechanical engineer, it does my heart good to see my electronic colleagues adapting the time-honoured and tested ways of the mech eng.
And to think that some people think that I'm daft. LOL
Maybe in the future we can just swear at the particular part/item without getting physical but I doubt things will ever get that good. :D
____________
|
|
|
|
|
|
The deciding factor for a technician to be considered a genius or an idiot..
Close the data room door. If they see how you fixed things.. you are an idiot.
If they do not, you are magical!
____________
Janice |
|
|
|
|
|
Agrees with soft^spirit lol.
____________
|
|
|
|
|
Matt.. if I might share.. from my experiences of keeping together antiques that were often poorly "refurbished"..
many problems clear permanently upon "re-seating": unplugging, and plugging back in. Other times, when taking things out, some surprise drops loose (seen or unseen, 50/50).. and things are then magically "fixed". Whether they were dirty connections, a bit of dust, someone's raisinette.. does not really matter as long as they clear. A bad connection invisible to the eye might merely need to be "bumped".. and could be gone forever.
We came up with things such as the "pencil test".. where, while monitoring the signal, we tapped the outside of the case to see if it had any effect. And some of the equipment was old enough to even contain mercury relays, where the mercury would vaporize, re-solidify in obscure pieces, and refuse to work until we "bounced" it (hold edge of component 3-4" above anti-static surface, drop and catch on first bounce, re-insert) to clear.
These are also good reasons why "fault tolerance" is a good(although expensive) principle.
On the reports going back.. all of these were jotted down as "re-seat to clear."
Because if we told the truth, the whole truth, and nothing but the truth... it would have been the Salem Witch trials all over again.
One thing that used to work on CRT terminals, back in the '80s, was to give them a "slap upside the screen". Some terminals would come back to life for a time after the slap. Location (and force) was brand-dependent, and with one of the brands, there were two methods that worked, depending on symptom: the slap, directed at the upper right of the CRT, and lifting the front of the CRT about an inch, and dropping. IBM 3278's were pretty reliable, but when they went, they could (sometimes...) be brought back by slapping the back right corner, and picking up the back about .5 inch, and dropping...
As a (mostly) mechanical engineer, it does my heart good to see my electronic colleagues adapting the time-honoured and tested ways of the mech eng.
And to think that some people think that I'm daft. LOL
Maybe in the future we can just swear at the particular part/item without getting physical but I doubt things will ever get that good. :D
It's not that we were angry at the terminal (this was back in the days of mainframes...); it is because the technique worked! The slap or drop reset contacts inside the terminal (not gold-plated, for some reason...), which we (a colleague and I were the primary terminal fixers in the IT department; it was a secondary job for me, as I was primarily a computer operator...) didn't have the knowledge to disassemble. (We'd have to call an outside contractor for that...)
____________
.
|
|
|
|
|
|
'server run, August 13-16 2010'
Because the above-mentioned thread is closed, I'm writing here.
For about 24 hours now, the Cricket graph has shown maxed-out traffic.
So the mentioned 'WU limit in progress' isn't active.
My BOINC could download the adjusted WU cache.
So far, no server crash.
So why would the limit be needed?
The new donated server can manage the traffic better now?
____________
>Das Deutsche Cafe. The German Cafe.< |
|
|
|
|
snipped.......
As a (mostly) mechanical engineer, it does my heart good to see my electronic colleagues adapting the time-honoured and tested ways of the mech eng.
We electronic engineers have always used the drop or kick test as a first-line tool for faulty equipment. The distinction between being a good, not so good or lousy engineer depends on your skill and experience at delivering the required accurate shock.
LOL! There ain't much that can't be fixed without the judicious application of a "Birmingham Screwdriver" :-)
____________
Damsel Rescuer, Kitty Patron, Raccoon Friend, Uli Fan,
Julie Supporter, ES99 Admirer, PETA Member, 1st Childhood
|
|
|
|
|
|
When I was repairing military radios we called that the two-foot drop test. (They were a lot sturdier than civilian gear, so we had to drop them farther.) That, or they accidentally fell off the test bench! :-)
____________
PROUD MEMBER OF Team Starfire World BOINC |
|
|
|
|
The new donated server can manage the traffic better now?
Are you talking about the server that they still need to write the purchase order for?
____________
|
|
|
|
|
LOL! There ain't much that can't be fixed without the judicious application of a "Birmingham Screwdriver" :-)
There are a number of faults in my company's system that list "percussive maintenance" as the fix :-)
The Terror
P.S. "Planters", "Bedlam" or "Just Right" (a breakfast cereal in Oz that contains the aforementioned fruits, nuts and flakes) all sound good to me :-)
|
|
|
|
|
The new donated server can manage the traffic better now?
Are you talking about the server that they still need to write the purchase order for?
Ohh.. I thought the new donated server is already in the SETI@home lab.
So then, I'm much more curious why we didn't see/don't have a server crash.
;-)
____________
>Das Deutsche Cafe. The German Cafe.< |
|
|
|
|
|
~ 28 1/2 hours of maxed-out traffic.
The Cricket graph is no longer maxed out.
So all the PCs out there have downloaded their adjusted WU cache.
So I'm curious why we had/needed the 'WU limit in progress'.
IIRC, it was to protect against server crashes.
But now, it worked without a limit.
Are the servers now more stable because of the last 3-day outage?
If so, what did the SETI@home crew do that made the servers more stable?
____________
>Das Deutsche Cafe. The German Cafe.< |
|
|
|
|
|
@Sutaru
So I'm curious why we had/needed the 'WU limit in progress'.
IIRC, it was to protect against server crashes.
But now, it worked without a limit.
Are the servers now more stable because of the last 3-day outage?
If so, what did the SETI@home crew do that made the servers more stable?
In the first post of this thread, Matt said
That's all good, but the week in general has been tainted by mork issues in general. It had one of its regular mystery crashes on Tuesday (followed by a long recovery). Then last night, and again this morning, the RAID mirror of two solid state drives (where we keep the innodb logs) started going flakey on us. The partition would just disappear, sending mysql into fits. We were able to quickly recover, but we're abandoning the solid state drives for now. Honestly, they weren't adding all that much to the i/o picture because we were cautious about how we were implementing them. Now I'm glad we were cautious. The upshot of all the above meant that we had to recover the replica as many as four times so far from the weekly backup. What a pain. The latest replica recovery is happening as I type this. All I hope is that all systems are normal and stable by tomorrow.
Maybe that's part of it. If Mork is more stable without the flaky solid-state drives, the whole system is more stable.
I'm also seeing fewer problems with goofy estimated completion times. That affects work fetch and cache filling. Maybe the server-side changes made a couple weeks ago are finally settling in. If so, maybe Jeff will feel comfortable raising the Friday-Saturday download limits next week.
____________
Donald
Infernal Optimist / Submariner, retired |
|
|
|
|
|
Goofy WU Limits fixed???
I got two AP units a few hours ago and they 'think' they will require ~348 hrs when they have been completing in ~60 hrs for some time now. Hmmm???
603 & 608 units seem to be timed about right. Guess it's just wait-and-see...
____________
|
|
|
|
|
Goofy WU Limits fixed???
I got two AP units a few hours ago and they 'think' they will require ~348 hrs when they have been completing in ~60 hrs for some time now. Hmmm???
603 & 608 units seem to be timed about right. Guess it's just wait-and-see...
Limits and estimates are totally unrelated. The limits which Jeff intended to set would have controlled how many "in progress" tasks of each type a host could have.
The difficulty with AP estimates stems from a delay in getting AP validators which interface to the new server-side estimate adjustments. That was fixed during the first week of August, but only AP tasks sent for validation since then can be used in the server average used for that adjustment. Then it takes ten such tasks before the servers will consider the average close enough to use. So you can expect at least the next 8 AP tasks to also be grossly overestimated. If that causes a problem in work fetch, a question in the Number Crunching forum would be the appropriate place for further discussion. Joe |
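To illustrate the estimate behaviour Joe describes, here is a minimal sketch, assuming the adjustment is a plain average over validated runtimes that is only trusted once ten samples exist. This is not BOINC's actual server code; `RuntimeEstimator`, `record_validated` and `estimate` are hypothetical names, and the 348 h / 60 h figures are simply the ones reported earlier in the thread.

```python
from dataclasses import dataclass, field
from typing import List

# From the post: ten validated tasks are needed before the average is used.
MIN_SAMPLES = 10


@dataclass
class RuntimeEstimator:
    """Hypothetical per-application estimator (illustration only)."""

    default_estimate_hours: float  # unadjusted estimate sent with new tasks
    samples: List[float] = field(default_factory=list)

    def record_validated(self, actual_hours: float) -> None:
        """Store the real runtime of a task that passed validation."""
        self.samples.append(actual_hours)

    def estimate(self) -> float:
        """Return the default until enough validated samples exist."""
        if len(self.samples) < MIN_SAMPLES:
            return self.default_estimate_hours
        return sum(self.samples) / len(self.samples)


if __name__ == "__main__":
    est = RuntimeEstimator(default_estimate_hours=348.0)
    for n in range(1, 11):
        est.record_validated(60.0)
        print(n, est.estimate())
```

With only nine validated results the sketch still reports the default estimate; the tenth result switches it over to the observed average, which is why the first several tasks stay overestimated.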
|
|
|
|
|
And also in the SETI Beta N.C. forums, the same 'problem' occurs on ATI GPUs running OpenCL, CAL/BROOK or a mix of them.
But the latest revisions, like 434, are quite fast and drop computing time from 122 to 3 - 6 hours.
But this is a bit 'off topic'.
____________
Knight Who Says Ni N!, OUT numbered................. |
|
|
Claggy Volunteer tester
Joined: 5 Jul 99 Posts: 3365 Credit: 25,951,744 RAC: 1,177

|
|
Any chance we can have Seti Beta brought up, so we can upload/report our completed tasks before the next outage?
Claggy
Edit: and in future bring it up a bit earlier, please? |
|
|
|
|
|
Validate errors!
'Message 1025850'
Please disable the upload server/service, to keep the number of validate errors small.
Will the SETI@home crew then run the famous script again to grant the credit?
____________
>Das Deutsche Cafe. The German Cafe.< |
|
|