Hemiola (Aug 12 2010)


WinterKnight
Volunteer tester
Joined: 18 May 99
Posts: 8777
Credit: 25,913,156
RAC: 18,183
United Kingdom
Message 1024918 - Posted: 14 Aug 2010, 7:12:53 UTC - in response to Message 1024574.

snipped.......
As a (mostly) mechanical engineer, it does my heart good to see my electronic colleagues adopting the time-honoured and tested ways of the mech eng.

We electronic engineers have always used the drop or kick test as a first-line tool for faulty equipment. The distinction between being a good, not so good, or lousy engineer depends on your skill and experience at delivering the required accurate shock.

Wiggo
Joined: 24 Jan 00
Posts: 8559
Credit: 99,184,604
RAC: 51,420
Australia
Message 1024930 - Posted: 14 Aug 2010, 8:07:34 UTC - in response to Message 1024574.

Matt.. if I might share.. from my experiences of keeping together antiques that were often poorly "refurbished"..

Many problems clear permanently upon "re-seating": unplugging and plugging back in. Other times, on taking things out, some surprise drops loose (seen or unseen, 50/50), and things are then magically "fixed". Whether it was dirty connections, a bit of dust, or someone's raisinette does not really matter as long as the fault clears. A bad connection invisible to the eye might merely need to be "bumped", and could be gone forever.

We came up with things such as the "pencil test", where, while monitoring the signal, we tapped the outside of the case to see if it had any effect. And some of the equipment was old enough to even contain mercury relays, where the mercury would vaporize, re-solidify in obscure pieces, and refuse to work until we "bounced" it (hold edge of component 3-4" above anti-static surface, drop and catch on first bounce, re-insert) to clear.

These are also good reasons why "fault tolerance" is a good (although expensive) principle.

On the reports going back.. all of these were jotted down as "re-seat to clear."

Because if we told the truth, the whole truth, and nothing but the truth... it would have been the Salem Witch trials all over again.


One thing that used to work on CRT terminals, back in the '80s, was to give them a "slap upside the screen". Some terminals would come back to life for a time after the slap. Location (and force) was brand-dependent, and with one of the brands there were two methods that worked, depending on the symptom: the slap, directed at the upper right of the CRT, or lifting the front of the CRT about an inch and dropping it. IBM 3278s were pretty reliable, but when they went, they could (sometimes...) be brought back by slapping the back right corner, and picking up the back about 0.5 inch and dropping it...


As a (mostly) mechanical engineer, it does my heart good to see my electronic colleagues adopting the time-honoured and tested ways of the mech eng.

And to think that some people think that I'm daft. LOL

Maybe in the future we can just swear at the particular part/item without getting physical but I doubt things will ever get that good. :D
____________

soft^spirit
Joined: 18 May 99
Posts: 6374
Credit: 28,647,395
RAC: 516
United States
Message 1024969 - Posted: 14 Aug 2010, 12:12:06 UTC

The deciding factor for a technician to be considered a genius or an idiot..
Close the data room door. If they see how you fixed things.. you are an idiot.
If they do not, you are magical!


____________

Janice

ToxicTBag
Joined: 5 Feb 10
Posts: 101
Credit: 57,197,902
RAC: 0
United Kingdom
Message 1025000 - Posted: 14 Aug 2010, 13:47:48 UTC

Agrees with soft^spirit lol.
____________

KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 2003
Credit: 11,187,569
RAC: 13,640
United States
Message 1025037 - Posted: 14 Aug 2010, 15:52:33 UTC - in response to Message 1024930.
Last modified: 14 Aug 2010, 15:53:46 UTC

Matt.. if I might share.. from my experiences of keeping together antiques that were often poorly "refurbished"..

Many problems clear permanently upon "re-seating": unplugging and plugging back in. Other times, on taking things out, some surprise drops loose (seen or unseen, 50/50), and things are then magically "fixed". Whether it was dirty connections, a bit of dust, or someone's raisinette does not really matter as long as the fault clears. A bad connection invisible to the eye might merely need to be "bumped", and could be gone forever.

We came up with things such as the "pencil test", where, while monitoring the signal, we tapped the outside of the case to see if it had any effect. And some of the equipment was old enough to even contain mercury relays, where the mercury would vaporize, re-solidify in obscure pieces, and refuse to work until we "bounced" it (hold edge of component 3-4" above anti-static surface, drop and catch on first bounce, re-insert) to clear.

These are also good reasons why "fault tolerance" is a good (although expensive) principle.

On the reports going back.. all of these were jotted down as "re-seat to clear."

Because if we told the truth, the whole truth, and nothing but the truth... it would have been the Salem Witch trials all over again.


One thing that used to work on CRT terminals, back in the '80s, was to give them a "slap upside the screen". Some terminals would come back to life for a time after the slap. Location (and force) was brand-dependent, and with one of the brands there were two methods that worked, depending on the symptom: the slap, directed at the upper right of the CRT, or lifting the front of the CRT about an inch and dropping it. IBM 3278s were pretty reliable, but when they went, they could (sometimes...) be brought back by slapping the back right corner, and picking up the back about 0.5 inch and dropping it...


As a (mostly) mechanical engineer, it does my heart good to see my electronic colleagues adopting the time-honoured and tested ways of the mech eng.

And to think that some people think that I'm daft. LOL

Maybe in the future we can just swear at the particular part/item without getting physical but I doubt things will ever get that good. :D


It's not that we were angry at the terminal (this was back in the days of mainframes...); it's that the technique worked! The slap or drop reset contacts inside the terminal (not gold-plated, for some reason...), which we didn't have the knowledge to disassemble (we'd have to call an outside contractor for that...). A colleague and I were the primary terminal fixers in the IT department; it was a secondary job for me, as I was primarily a computer operator.
____________
.

[seti.international] Dirk Sadowski
Volunteer tester
Joined: 6 Apr 07
Posts: 7122
Credit: 61,588,999
RAC: 16,384
Germany
Message 1025068 - Posted: 14 Aug 2010, 18:12:27 UTC

'server run, August 13-16 2010'

Because the thread mentioned above is closed, I'm writing here.

For ~24 hours the Cricket graph has shown maxed-out traffic.

So the mentioned 'WU limit in progress' isn't active.

My BOINC client was able to download the adjusted WU cache.

So far, no server crash.

So why would the limit be needed?

Can the newly donated server manage the traffic better now?

____________
BR

SETI@home Needs your Help ... $10 & U get a Star!

Team seti.international

Das Deutsche Cafe. The German Cafe.

Chris S
Project donor
Volunteer tester
Joined: 19 Nov 00
Posts: 32626
Credit: 14,484,981
RAC: 13,295
United Kingdom
Message 1025099 - Posted: 14 Aug 2010, 20:06:31 UTC
Last modified: 14 Aug 2010, 20:21:25 UTC

snipped.......
As a (mostly) mechanical engineer, it does my heart good to see my electronic colleagues adopting the time-honoured and tested ways of the mech eng.


We electronic engineers have always used the drop or kick test as a first-line tool for faulty equipment. The distinction between being a good, not so good, or lousy engineer depends on your skill and experience at delivering the required accurate shock.


LOL! There ain't much that can't be fixed without the judicious application of a "Birmingham Screwdriver" :-)
____________
Damsel Rescuer, Uli Devotee, Julie Supporter, ES99 Admirer,
Raccoon Friend, Anniet fan, Shining Knight in Armour


perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 16,357,677
RAC: 8,780
United States
Message 1025105 - Posted: 14 Aug 2010, 20:23:44 UTC - in response to Message 1025099.

When I was repairing military radios we called that the two-foot drop test. (They were a lot sturdier than civilian gear, so we had to drop them farther.) That or they accidentally fell off the test bench! :-)
____________


PROUD MEMBER OF Team Starfire World BOINC

Gary Charpentier
Project donor
Volunteer tester
Joined: 25 Dec 00
Posts: 13168
Credit: 7,901,806
RAC: 14,177
United States
Message 1025137 - Posted: 14 Aug 2010, 21:53:18 UTC - in response to Message 1025068.
Last modified: 14 Aug 2010, 21:53:32 UTC

Can the newly donated server manage the traffic better now?

Are you talking about the server that they still need to write the purchase order for?
____________

Terror Australis
Volunteer tester
Joined: 14 Feb 04
Posts: 1759
Credit: 206,755,616
RAC: 19,057
Australia
Message 1025243 - Posted: 15 Aug 2010, 7:56:11 UTC - in response to Message 1025099.

LOL! There ain't much that can't be fixed without the judicious application of a "Birmingham Screwdriver" :-)


There are a number of faults in my company's system that list "percussive maintenance" as the fix :-)

The Terror

P.S. "Planters", "Bedlam" or "Just Right" (a breakfast cereal in Oz that contains the aforementioned fruits, nuts and flakes) all sound good to me :-)

[seti.international] Dirk Sadowski
Volunteer tester
Joined: 6 Apr 07
Posts: 7122
Credit: 61,588,999
RAC: 16,384
Germany
Message 1025258 - Posted: 15 Aug 2010, 9:20:39 UTC - in response to Message 1025137.
Last modified: 15 Aug 2010, 9:23:27 UTC

Can the newly donated server manage the traffic better now?

Are you talking about the server that they still need to write the purchase order for?

Ohh.. I thought the newly donated server was already in the SETI@home lab.

So then, I'm much more curious why we didn't see/don't have a server crash.

;-)
____________
BR

SETI@home Needs your Help ... $10 & U get a Star!

Team seti.international

Das Deutsche Cafe. The German Cafe.

[seti.international] Dirk Sadowski
Volunteer tester
Joined: 6 Apr 07
Posts: 7122
Credit: 61,588,999
RAC: 16,384
Germany
Message 1025259 - Posted: 15 Aug 2010, 9:28:00 UTC - in response to Message 1025258.
Last modified: 15 Aug 2010, 9:29:50 UTC

~28 1/2 hours of maxed-out traffic.

The Cricket graph is no longer maxed out.
So all PCs out there have downloaded their adjusted WU cache.

So I'm curious why we had/needed 'WU limit in progress'.
IIRC, it was to protect against server crashes.
But now, it worked without a limit.

Are the servers now more stable because of the last 3-day outage?
If so, what did the SETI@home crew do that made the servers more stable?
____________
BR

SETI@home Needs your Help ... $10 & U get a Star!

Team seti.international

Das Deutsche Cafe. The German Cafe.

Donald L. Johnson
Project donor
Joined: 5 Aug 02
Posts: 6358
Credit: 793,804
RAC: 1,467
United States
Message 1025310 - Posted: 15 Aug 2010, 15:15:11 UTC - in response to Message 1024344.

@Sutaru

So I'm curious why we had/needed 'WU limit in progress'.
IIRC, it was to protect against server crashes.
But now, it worked without a limit.

Are the servers now more stable because of the last 3-day outage?
If so, what did the SETI@home crew do that made the servers more stable?


In the first post of this thread, Matt said

That's all good, but the week in general has been tainted by mork issues in general. It had one of its regular mystery crashes on Tuesday (followed by a long recovery). Then last night, and again this morning, the RAID mirror of two solid state drives (where we keep the innodb logs) started going flakey on us. The partition would just disappear, sending mysql into fits. We were able to quickly recover, but we're abandoning the solid state drives for now. Honestly, they weren't adding all that much to the i/o picture because we were cautious about how we were implementing them. Now I'm glad we were cautious. The upshot of all the above meant that we had to recover the replica as many as four times so far from the weekly backup. What a pain. The latest replica recovery is happening as I type this. All I hope is that all systems are normal and stable by tomorrow.


Maybe that's part of it. If Mork is more stable without the flaky solid-state drives, the whole system is more stable.

I'm also seeing fewer problems with goofy estimated completion times. That affects work fetch and cache filling. Maybe the server-side changes made a couple weeks ago are finally settling in. If so, maybe Jeff will feel comfortable raising the Friday-Saturday download limits next week.



____________
Donald
Infernal Optimist / Submariner, retired

Dave Barstow
Project donor
Joined: 14 May 99
Posts: 76
Credit: 15,064,044
RAC: 0
Philippines
Message 1025339 - Posted: 15 Aug 2010, 17:02:18 UTC - in response to Message 1025310.

Goofy WU Limits fixed???

I got two AP units a few hours ago and they 'think' they will require ~348 hrs when they have been completing in ~60 hrs for some time now. Hmmm???

603 & 608 units seem to be timed about right. Guess it's just wait-and-see...
____________

Josef W. Segur
Project donor
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4346
Credit: 1,123,600
RAC: 733
United States
Message 1025361 - Posted: 15 Aug 2010, 19:03:57 UTC - in response to Message 1025339.

Goofy WU Limits fixed???

I got two AP units a few hours ago and they 'think' they will require ~348 hrs when they have been completing in ~60 hrs for some time now. Hmmm???

603 & 608 units seem to be timed about right. Guess it's just wait-and-see...

Limits and estimates are totally unrelated. The limits which Jeff intended to set would have controlled how many "in progress" tasks of each type a host could have.

The difficulty with AP estimates stems from a delay in getting AP validators which interface to the new server-side estimate adjustments. That was fixed during the first week of August, but only AP tasks sent for validation since then can be used in the server average used for that adjustment. Then it takes ten such tasks before the servers will consider the average close enough to use. So you can expect at least the next 8 AP tasks to also be grossly overestimated. If that causes a problem in work fetch, a question in the Number Crunching forum would be the appropriate place for further discussion.
Joe
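
As a rough illustration of the behaviour Joe describes, here is a minimal Python sketch, not the actual BOINC server code. It assumes the server keeps a list of validated runtimes and only switches from the raw estimate to the running average once ten samples exist. The class name, field names, and the constant name are all hypothetical, and the post does not say whether the average is kept per host or project-wide, so the sketch uses a single counter.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Threshold taken from the post ("it takes ten such tasks");
# the constant name itself is made up for this sketch.
MIN_VALIDATED_SAMPLES = 10


@dataclass
class EstimateAdjuster:
    """Toy model of a server-side runtime-estimate adjustment."""

    raw_estimate_hours: float
    validated_runtimes: list[float] = field(default_factory=list)

    def record_validated(self, actual_hours: float) -> None:
        """Store the real runtime of a task that passed validation."""
        self.validated_runtimes.append(actual_hours)

    def tasks_until_adjusted(self) -> int:
        """How many more validated tasks are needed before the average is used."""
        return max(0, MIN_VALIDATED_SAMPLES - len(self.validated_runtimes))

    def estimate(self) -> float:
        """Return the raw estimate until enough samples exist, then the average."""
        if len(self.validated_runtimes) < MIN_VALIDATED_SAMPLES:
            return self.raw_estimate_hours
        return sum(self.validated_runtimes) / len(self.validated_runtimes)


if __name__ == "__main__":
    # Numbers from Dave's post: ~348 h estimated, ~60 h actual.
    adjuster = EstimateAdjuster(raw_estimate_hours=348.0)
    for _ in range(MIN_VALIDATED_SAMPLES):
        adjuster.record_validated(60.0)
        print(adjuster.tasks_until_adjusted(), adjuster.estimate())
```

With a raw estimate of 348 hours and validated tasks finishing in about 60 hours, `estimate()` keeps returning the raw figure through the ninth recorded task and only switches to the average at the tenth. Note that nothing here touches the "in progress" limits, which, as the post says, are a separate mechanism.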

Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,520
RAC: 119
Netherlands
Message 1025378 - Posted: 15 Aug 2010, 20:35:46 UTC - in response to Message 1025361.

And also in the SETI Beta N.C. forums, the same 'problem' occurs on the ATI GPUs running OpenCL, CAL/BROOK, or a mix of them.
But the latest revisions, like 434, are quite fast and drop computing time from 122 to 3-6 hours.
But this is a bit 'off topic'.

____________

Claggy
Project donor
Volunteer tester
Joined: 5 Jul 99
Posts: 4239
Credit: 34,929,607
RAC: 23,543
United Kingdom
Message 1025664 - Posted: 16 Aug 2010, 19:59:02 UTC
Last modified: 16 Aug 2010, 20:48:54 UTC

Any chance we can have SETI Beta brought up, so we can upload/report our completed tasks before the next outage?

Claggy

Edit: and in future, bring it up a bit earlier, please?

[seti.international] Dirk Sadowski
Volunteer tester
Joined: 6 Apr 07
Posts: 7122
Credit: 61,588,999
RAC: 16,384
Germany
Message 1025856 - Posted: 17 Aug 2010, 13:52:31 UTC
Last modified: 17 Aug 2010, 13:58:58 UTC

Validate errors!

'Message 1025850'

Please disable the UL server/service, to keep the number of validate errors small.

Will the SETI@home crew then run the famous script again to grant the credit?
____________
BR

SETI@home Needs your Help ... $10 & U get a Star!

Team seti.international

Das Deutsche Cafe. The German Cafe.


Copyright © 2014 University of California