Hemiola (Aug 12 2010)

Message boards : Technical News : Hemiola (Aug 12 2010)

W-K 666 Project Donor
Volunteer tester
Joined: 18 May 99
Posts: 18996
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1024918 - Posted: 14 Aug 2010, 7:12:53 UTC - in response to Message 1024574.  

snipped.......
As a (mostly) mechanical engineer, it does my heart good to see my electronic colleagues adopting the time-honoured and tested ways of the mech eng.

We electronic engineers have always used the drop or kick test as a first-line tool for faulty equipment. The distinction between being a good, not-so-good or lousy engineer depends on your skill and experience at delivering the required accurate shock.
ID: 1024918
Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1024930 - Posted: 14 Aug 2010, 8:07:34 UTC - in response to Message 1024574.  

Matt.. if I might share.. from my experiences of keeping together antiques that were often poorly "refurbished"..

Many problems clear permanently upon "re-seating": unplugging and plugging back in. Other times, when taking things out, some surprise drops loose (seen or unseen, 50/50) and they are then magically "fixed". Whether it was dirty connections, a bit of dust, or someone's raisinette does not really matter, as long as they clear. A bad connection invisible to the eye might merely need to be "bumped", and could be gone forever.

We came up with things such as the "pencil test", where, while monitoring the signal, we tapped the outside case to see if it had any effect. Some of the equipment was even old enough to contain mercury relays, where the mercury would vaporize, re-form in odd places, and refuse to work until we "bounced" it (hold the edge of the component 3-4" above an anti-static surface, drop it, catch it on the first bounce, re-insert) to clear it.

These are also good reasons why "fault tolerance" is a good (although expensive) principle.

On the reports going back, all of these were jotted down as "re-seat to clear."

Because if we told the truth, the whole truth, and nothing but the truth... it would have been the Salem Witch trials all over again.


One thing that used to work on CRT terminals, back in the '80s, was to give them a "slap upside the screen". Some terminals would come back to life for a time after the slap. Location (and force) was brand-dependent, and with one of the brands there were two methods that worked, depending on the symptom: the slap, directed at the upper right of the CRT, and lifting the front of the CRT about an inch and dropping it. IBM 3278s were pretty reliable, but when they went, they could (sometimes...) be brought back by slapping the back right corner, and picking up the back about 0.5 inch and dropping it...


As a (mostly) mechanical engineer, it does my heart good to see my electronic colleagues adopting the time-honoured and tested ways of the mech eng.

And to think that some people think that I'm daft. LOL

Maybe in the future we can just swear at the particular part/item without getting physical, but I doubt things will ever get that good. :D
ID: 1024930
soft^spirit
Joined: 18 May 99
Posts: 6497
Credit: 34,134,168
RAC: 0
United States
Message 1024969 - Posted: 14 Aug 2010, 12:12:06 UTC

The deciding factor in whether a technician is considered a genius or an idiot:
Close the data room door. If they see how you fixed things, you are an idiot.
If they do not, you are magical!


Janice
ID: 1024969
ToxicTBag
Joined: 5 Feb 10
Posts: 101
Credit: 57,197,902
RAC: 0
United Kingdom
Message 1025000 - Posted: 14 Aug 2010, 13:47:48 UTC

Agrees with soft^spirit lol.
ID: 1025000
KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 3187
Credit: 57,163,290
RAC: 0
United States
Message 1025037 - Posted: 14 Aug 2010, 15:52:33 UTC - in response to Message 1024930.  
Last modified: 14 Aug 2010, 15:53:46 UTC

It's not that we were angry at the terminal (this was back in the days of mainframes...); it's that the technique worked! The slap or drop reset contacts inside the terminal (not gold-plated, for some reason...), and we didn't have the knowledge to disassemble the terminal ourselves; we'd have had to call an outside contractor for that. (A colleague and I were the primary terminal fixers in the IT department. It was a secondary job for me; I was primarily a computer operator.)

Hello, from Albany, CA!...
ID: 1025037
Sutaru Tsureku
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1025068 - Posted: 14 Aug 2010, 18:12:27 UTC

'server run, August 13-16 2010'

Because the thread mentioned above is closed, I'm writing here.

For about 24 hours, the Cricket graph has shown maxed-out traffic.

So the announced 'WU limit in progress' isn't active.

My BOINC client was able to download its full configured WU cache.

So far, no server crash.

So why would the limit be needed?

Can the newly donated server manage the traffic better now?

ID: 1025068
perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 1025105 - Posted: 14 Aug 2010, 20:23:44 UTC - in response to Message 1025099.  

When I was repairing military radios, we called that the two-foot drop test. (They were a lot sturdier than civilian gear, so we had to drop them farther.) That, or they accidentally fell off the test bench! :-)


PROUD MEMBER OF Team Starfire World BOINC
ID: 1025105
Gary Charpentier Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 25 Dec 00
Posts: 30593
Credit: 53,134,872
RAC: 32
United States
Message 1025137 - Posted: 14 Aug 2010, 21:53:18 UTC - in response to Message 1025068.  
Last modified: 14 Aug 2010, 21:53:32 UTC

Can the newly donated server manage the traffic better now?

Are you talking about the server that they still need to write the purchase order for?
ID: 1025137
Terror Australis
Volunteer tester
Joined: 14 Feb 04
Posts: 1817
Credit: 262,693,308
RAC: 44
Australia
Message 1025243 - Posted: 15 Aug 2010, 7:56:11 UTC - in response to Message 1025099.  

LOL! There ain't much that can't be fixed with the judicious application of a "Birmingham Screwdriver" :-)


There are a number of faults in my company's system that list "percussive maintenance" as the fix :-)

The Terror

P.S. "Planters", "Bedlam" or "Just Right" (a breakfast cereal in Oz that contains the aforementioned fruits, nuts and flakes) all sound good to me :-)
ID: 1025243
Sutaru Tsureku
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1025258 - Posted: 15 Aug 2010, 9:20:39 UTC - in response to Message 1025137.  
Last modified: 15 Aug 2010, 9:23:27 UTC

Can the newly donated server manage the traffic better now?

Are you talking about the server that they still need to write the purchase order for?

Oh, I thought the newly donated server was already in the SETI@home lab.

In that case, I'm even more curious why we haven't had a server crash.

;-)
ID: 1025258
Sutaru Tsureku
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1025259 - Posted: 15 Aug 2010, 9:28:00 UTC - in response to Message 1025258.  
Last modified: 15 Aug 2010, 9:29:50 UTC

About 28 1/2 hours of maxed-out traffic.

The Cricket graph is no longer maxed out,
so all the PCs out there have downloaded their full configured WU caches.

So I'm curious why we had/needed the 'WU limit in progress'.
IIRC, it was meant to prevent server crashes.
But this time, it worked without a limit.

Are the servers more stable now because of the last 3-day outage?
If so, what did the SETI@home crew do that made them more stable?
ID: 1025259
Donald L. Johnson
Joined: 5 Aug 02
Posts: 8240
Credit: 14,654,533
RAC: 20
United States
Message 1025310 - Posted: 15 Aug 2010, 15:15:11 UTC - in response to Message 1024344.  

@Sutaru
So I'm curious why we had/needed the 'WU limit in progress'.
IIRC, it was meant to prevent server crashes.
But this time, it worked without a limit.

Are the servers more stable now because of the last 3-day outage?
If so, what did the SETI@home crew do that made them more stable?


In the first post of this thread, Matt said:

That's all good, but the week in general has been tainted by mork issues in general. It had one of its regular mystery crashes on Tuesday (followed by a long recovery). Then last night, and again this morning, the RAID mirror of two solid state drives (where we keep the innodb logs) started going flakey on us. The partition would just disappear, sending mysql into fits. We were able to quickly recover, but we're abandoning the solid state drives for now. Honestly, they weren't adding all that much to the i/o picture because we were cautious about how we were implementing them. Now I'm glad we were cautious. The upshot of all the above meant that we had to recover the replica as many as four times so far from the weekly backup. What a pain. The latest replica recovery is happening as I type this. All I hope is that all systems are normal and stable by tomorrow.


Maybe that's part of it. If Mork is more stable without the flaky solid-state drives, the whole system is more stable.

I'm also seeing fewer problems with goofy estimated completion times. That affects work fetch and cache filling. Maybe the server-side changes made a couple weeks ago are finally settling in. If so, maybe Jeff will feel comfortable raising the Friday-Saturday download limits next week.



Donald
Infernal Optimist / Submariner, retired
ID: 1025310
Dave Barstow
Joined: 14 May 99
Posts: 76
Credit: 15,064,044
RAC: 0
Philippines
Message 1025339 - Posted: 15 Aug 2010, 17:02:18 UTC - in response to Message 1025310.  

Goofy WU Limits fixed???

I got two AP units a few hours ago and they 'think' they will require ~348 hrs when they have been completing in ~60 hrs for some time now. Hmmm???

603 & 608 units seem to be timed about right. Guess it's just wait-and-see...
ID: 1025339
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1025361 - Posted: 15 Aug 2010, 19:03:57 UTC - in response to Message 1025339.  

Goofy WU Limits fixed???

I got two AP units a few hours ago and they 'think' they will require ~348 hrs when they have been completing in ~60 hrs for some time now. Hmmm???

603 & 608 units seem to be timed about right. Guess it's just wait-and-see...

Limits and estimates are totally unrelated. The limits which Jeff intended to set would have controlled how many "in progress" tasks of each type a host could have.
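
A minimal Python sketch of what a per-host "in progress" limit of that kind could look like. The app names and cap values below are hypothetical, since the thread doesn't say what Jeff intended to set, and this is not the actual BOINC scheduler code.

```python
from collections import Counter

# Hypothetical per-application caps on "in progress" tasks per host.
# Illustrative values only; the real limits were never stated in the thread.
IN_PROGRESS_LIMITS = {"multibeam": 50, "astropulse": 4}


def tasks_to_send(in_progress: list[str], requested: dict[str, int]) -> dict[str, int]:
    """Return how many new tasks of each type a host may receive.

    in_progress: app names of the tasks the host currently has in progress.
    requested:   how many new tasks of each app the host asked for.
    """
    current = Counter(in_progress)  # tasks already held, per app
    grant = {}
    for app, wanted in requested.items():
        limit = IN_PROGRESS_LIMITS.get(app)
        if limit is None:
            grant[app] = wanted  # no cap configured for this app
        else:
            # Never grant more than the remaining headroom, and never a negative count.
            grant[app] = max(0, min(wanted, limit - current[app]))
    return grant
```

For example, a host already holding 3 Astropulse tasks that asks for 5 more would be granted only 1 under these made-up caps, while its multibeam request would be unaffected. Note that nothing here touches runtime estimates, which is the point of the post: the two mechanisms are separate.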

The difficulty with AP estimates stems from a delay in getting AP validators which interface to the new server-side estimate adjustments. That was fixed during the first week of August, but only AP tasks sent for validation since then can be used in the server average used for that adjustment. Then it takes ten such tasks before the servers will consider the average close enough to use. So you can expect at least the next 8 AP tasks to also be grossly overestimated. If that causes a problem in work fetch, a question in the Number Crunching forum would be the appropriate place for further discussion.
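
A rough sketch of the "ten tasks before the average is trusted" rule, assuming the adjustment is a plain mean of actual-to-estimated runtime ratios. The real BOINC server-side averaging is more involved, and the class and method names here are hypothetical.

```python
from dataclasses import dataclass, field

MIN_SAMPLES = 10  # per the post: ten validated tasks before the average is used


@dataclass
class EstimateScaler:
    """Hypothetical per-app scaler for runtime estimates."""
    ratios: list[float] = field(default_factory=list)

    def record_validated(self, estimated_hours: float, actual_hours: float) -> None:
        # Assumption: only tasks that went through the fixed validator get recorded.
        self.ratios.append(actual_hours / estimated_hours)

    def adjusted_estimate(self, raw_estimate_hours: float) -> float:
        if len(self.ratios) < MIN_SAMPLES:
            return raw_estimate_hours  # too few samples: keep the unscaled estimate
        mean_ratio = sum(self.ratios) / len(self.ratios)
        return raw_estimate_hours * mean_ratio
```

With Dave's figures (estimated ~348 hrs, actually ~60 hrs), the first nine recorded tasks leave the estimate at 348 hrs; only the tenth brings it down to about 60 hrs.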
                                                                Joe
ID: 1025361
Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1025378 - Posted: 15 Aug 2010, 20:35:46 UTC - in response to Message 1025361.  

The same 'problem' also occurs in the SETI Beta N.C. (Number Crunching) forums, with ATI GPUs running OpenCL, CAL/Brook, or a mix of them.
But the latest revisions, like 434, are quite fast and cut computing time from 122 hours to 3-6 hours.
But this is a bit off-topic.

ID: 1025378
Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1025664 - Posted: 16 Aug 2010, 19:59:02 UTC
Last modified: 16 Aug 2010, 20:48:54 UTC

Any chance we can have Seti Beta brought up, so we can upload/report our completed tasks before the next outage?

Claggy

Edit: And in future, could you bring it up a bit earlier, please?
ID: 1025664
Sutaru Tsureku
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1025856 - Posted: 17 Aug 2010, 13:52:31 UTC
Last modified: 17 Aug 2010, 13:58:58 UTC

Validate errors!

'Message 1025850'

Please disable the UL server/service to keep the number of validate errors small.

Will the SETI@home crew run the famous script again to grant the credit?
ID: 1025856

©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.