Hemiola (Aug 12 2010)


Message boards : Technical News : Hemiola (Aug 12 2010)

Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1388
Credit: 74,079
RAC: 0
United States
Message 1024344 - Posted: 12 Aug 2010, 20:58:48 UTC

Wrapping up the weekly "extended outage." Jeff's actually out today, but will be back to turn the servers on tomorrow (i.e. Friday, when I'm usually out).

I finally got around to testing a drive on mork (the mysql server) that the RAID card deemed "failed" at some point, but maybe that was a transient problem as it seems fine now. Nevertheless I went through the rigamarole of pulling that drive, putting a new one in, testing it, making it a new hot spare, etc.

That's all good, but the week has been tainted by mork issues in general. It had one of its regular mystery crashes on Tuesday (followed by a long recovery). Then last night, and again this morning, the RAID mirror of two solid state drives (where we keep the innodb logs) started going flaky on us. The partition would just disappear, sending mysql into fits. We were able to recover quickly, but we're abandoning the solid state drives for now. Honestly, they weren't adding all that much to the i/o picture because we were cautious about how we were implementing them. Now I'm glad we were cautious. The upshot of all the above is that we've had to recover the replica from the weekly backup as many as four times so far. What a pain. The latest replica recovery is happening as I type this. All I hope is that all systems are normal and stable by tomorrow.

Everything else is fine. In fact, more than fine, as a set of very generous participants donated $6000 towards a new server that will become the new science database server. THANK YOU!! We're still spec'ing out said server, but will go ahead sooner rather than later now that we don't have to set up a funding drive!

Meanwhile I'm still chipping away at various data analysis projects, and Jeff's been fighting with data synchronization issues that have been creeping in more and more lately. We also had a "design meeting" regarding where to go with the public involvement of candidate selection. I'm finding some plug-n-play visualization utilities online, but pretty much I'm finding (like always) it might just be easier and better if I do it all myself with tools I already know. However, some improvements go beyond that scope, so I'm digging into AJAX, which is good stuff to know, I guess.

- Matt

____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

Profile Ageless
Joined: 9 Jun 99
Posts: 12257
Credit: 2,544,727
RAC: 310
Netherlands
Message 1024346 - Posted: 12 Aug 2010, 21:07:11 UTC - in response to Message 1024344.

Shouldn't you name that new server after the benefactors? Or is MRJHJT too difficult to pronounce in the office? ;-)
____________
Jord

Fighting for the correct use of the apostrophe, together with Weird Al Yankovic

Profile Scarecrow
Joined: 15 Jul 00
Posts: 4382
Credit: 458,880
RAC: 4
United States
Message 1024348 - Posted: 12 Aug 2010, 21:09:07 UTC - in response to Message 1024346.

Ageless wrote:
MRJHJT

Oddly enough, that's the noise a solid state drive makes when it augers in.

DJStarfox
Joined: 23 May 01
Posts: 1040
Credit: 532,447
RAC: 37
United States
Message 1024351 - Posted: 12 Aug 2010, 21:15:53 UTC - in response to Message 1024344.

With $6k, would be nice to squeeze a real RAID card out for the new server.

Profile [seti.international] Dirk Sadowski
Volunteer tester
Joined: 6 Apr 07
Posts: 7016
Credit: 59,109,424
RAC: 20,492
Germany
Message 1024352 - Posted: 12 Aug 2010, 21:19:19 UTC - in response to Message 1024344.

Matt, thanks for the news!

____________
BR



>Das Deutsche Cafe. The German Cafe.<

Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4039
Credit: 32,691,076
RAC: 800
United Kingdom
Message 1024353 - Posted: 12 Aug 2010, 21:20:17 UTC - in response to Message 1024344.

Thanks for the update Matt,

Claggy

Profile Bill Walker
Joined: 4 Sep 99
Posts: 3330
Credit: 1,955,293
RAC: 2,480
Canada
Message 1024354 - Posted: 12 Aug 2010, 21:20:44 UTC

So, if Hocket referred to the way you guys share work at Berkeley, does Hemiola refer to the days between server failures lately? (1 2 3, 1 2 3, 1 2, 1 2, 1 2)

Oh, and thanks for the update.
____________

Profile Gary Charpentier
Volunteer tester
Joined: 25 Dec 00
Posts: 12023
Credit: 6,360,358
RAC: 8,611
United States
Message 1024362 - Posted: 12 Aug 2010, 21:38:06 UTC

Thanks for the update.

____________

Profile soft^spirit
Joined: 18 May 99
Posts: 6374
Credit: 28,628,252
RAC: 1,615
United States
Message 1024369 - Posted: 12 Aug 2010, 22:15:21 UTC

As one of the donations (yet to be delivered.. plans are in progress)..
I would vote for a name like "Planters" Cause its from a bunch of nuts.
____________

Janice

Profile soft^spirit
Joined: 18 May 99
Posts: 6374
Credit: 28,628,252
RAC: 1,615
United States
Message 1024377 - Posted: 12 Aug 2010, 22:45:47 UTC - in response to Message 1024344.

Matt.. if I might share.. from my experiences of keeping together antiques that were often poorly "refurbished"..

many problems clear permanently upon "re-seating" unplugging, and plugging back in. Other times taking things out, some surprise drops loose(seen or unseen 50/50).. and are then magically "fixed". Whether they were dirty connections, a bit of dust, someones raisinette.. does not really matter as long as they clear. a bad connection invisible to the eye might nearly need "bumped".. and could be gone forever.

We came up with things such as "pencil test".. where while monitoring the signal we tapped the outside case and see if it had effects. And some of the equipment was old enough to even contain mercury relays, where the mercury would vaporize, re-solidify in obscure pieces, and refuse to work until we "bounced" (hold edge of component 3-4" above anti-static surface, drop and catch on first bounce, re-insert) to clear.

These are also good reasons why "fault tolerance" is a good(although expensive) principle.

On the reports going back.. all of these were jotted down as "re-seat to clear."

Because if we told the truth, the whole truth, and nothing but the truth... it would have been the Salem Witch trials all over again.
____________

Janice

zoom314
Joined: 30 Nov 03
Posts: 45729
Credit: 36,372,766
RAC: 8,574
Message 1024378 - Posted: 12 Aug 2010, 22:47:29 UTC - in response to Message 1024369.

soft^spirit wrote:
As one of the donations (yet to be delivered.. plans are in progress)..
I would vote for a name like "Planters" Cause its from a bunch of nuts.

Planters sound good to Me too.

@ Matt: Thanks for the update on Morks Odyssey.
____________

John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24049
Credit: 516,641
RAC: 154
United States
Message 1024427 - Posted: 13 Aug 2010, 1:17:46 UTC - in response to Message 1024378.

soft^spirit wrote:
As one of the donations (yet to be delivered.. plans are in progress)..
I would vote for a name like "Planters" Cause its from a bunch of nuts.

zoom314 wrote:
Planters sound good to Me too.

@ Matt: Thanks for the update on Morks Odyssey.

How about Bedlam? As in a house full of fruits, nuts and flakes.
____________


BOINC WIKI

zoom314
Joined: 30 Nov 03
Posts: 45729
Credit: 36,372,766
RAC: 8,574
Message 1024449 - Posted: 13 Aug 2010, 3:03:51 UTC - in response to Message 1024427.

soft^spirit wrote:
As one of the donations (yet to be delivered.. plans are in progress)..
I would vote for a name like "Planters" Cause its from a bunch of nuts.

zoom314 wrote:
Planters sound good to Me too.

@ Matt: Thanks for the update on Morks Odyssey.

John McLeod VII wrote:
How about Bedlam? As in a house full of fruits, nuts and flakes.

I'm sure someone will find a name somewhere.
____________

Profile Jack Zhang
Volunteer tester
Joined: 2 Jul 06
Posts: 206
Credit: 6,026,699
RAC: 1,114
Canada
Message 1024492 - Posted: 13 Aug 2010, 8:58:57 UTC

I hear SSD talk in this news post...

Avoid Kingston and consumer OCZ products when it comes to SSDs. Intel is only good if it's SLC memory and if there was an SSD move made, that SSD must have a supercapacitor to handle Server IOs per second. Pretty much the only choice when it comes to Server SSDs is the Sandforce SF-1500 controller chips with supercapacitor.
____________
What if Fiction was Fact and Fact was Fiction and vice versa?

Profile Helli
Volunteer tester
Joined: 15 Dec 99
Posts: 697
Credit: 83,708,811
RAC: 67,235
Germany
Message 1024514 - Posted: 13 Aug 2010, 12:34:53 UTC - in response to Message 1024346.

Ageless wrote:
Shouldn't you name that new server after the benefactors? Or is MRJHJT too difficult to pronounce in the office? ;-)


Well, I don't believe that we would find a word that represents the six Sponsors.

But I would love to see a Sticker on the Server with something written on it like "Mainly sponsored by Mark, Richard, Josef, Helli, John and T.A." ;-)
A Picture in the SETI@home Photo Album would also be fine, so we can say years later: "Hey, look, 1/6 of this Rig was sponsored by me". :-)

Only my 2c. :-)

Helli
____________

Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4195
Credit: 1,028,567
RAC: 285
United States
Message 1024557 - Posted: 13 Aug 2010, 14:42:37 UTC - in response to Message 1024369.

soft^spirit wrote:
As one of the donations (yet to be delivered.. plans are in progress)..
I would vote for a name like "Planters" Cause its from a bunch of nuts.

I never doubted your pledge for August 28, and believe the project should be considering $7000 as donated to the cause.

Stretching the allusion to peanuts a bit further, perhaps Carver would be an interesting name possibility.
Joe

Profile soft^spirit
Joined: 18 May 99
Posts: 6374
Credit: 28,628,252
RAC: 1,615
United States
Message 1024563 - Posted: 13 Aug 2010, 15:02:37 UTC - in response to Message 1024557.

Honestly, until a couple of posts ago, it never occurred to me that it was not considered part of the 6K. There was a 1K donation after the "goal reached" announcement.. ahh well.

In any case.. add it however they want. "Hardware" is the only stipulation to it.
____________

Janice

Profile KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 1892
Credit: 9,070,816
RAC: 9,078
United States
Message 1024565 - Posted: 13 Aug 2010, 15:04:24 UTC - in response to Message 1024377.

soft^spirit wrote:
Matt.. if I might share.. from my experiences of keeping together antiques that were often poorly "refurbished"..

many problems clear permanently upon "re-seating" unplugging, and plugging back in. Other times taking things out, some surprise drops loose(seen or unseen 50/50).. and are then magically "fixed". Whether they were dirty connections, a bit of dust, someones raisinette.. does not really matter as long as they clear. a bad connection invisible to the eye might nearly need "bumped".. and could be gone forever.

We came up with things such as "pencil test".. where while monitoring the signal we tapped the outside case and see if it had effects. And some of the equipment was old enough to even contain mercury relays, where the mercury would vaporize, re-solidify in obscure pieces, and refuse to work until we "bounced" (hold edge of component 3-4" above anti-static surface, drop and catch on first bounce, re-insert) to clear.

These are also good reasons why "fault tolerance" is a good(although expensive) principle.

On the reports going back.. all of these were jotted down as "re-seat to clear."

Because if we told the truth, the whole truth, and nothing but the truth... it would have been the Salem Witch trials all over again.


One thing that used to work on CRT terminals, back in the '80s, was to give them a "slap upside the screen". Some terminals would come back to life for a time after the slap. Location (and force) was brand-dependent, and with one of the brands, there were two methods that worked, depending on symptom: the slap, directed at the upper right of the CRT, and lifting the front of the CRT about an inch, and dropping. IBM 3278's were pretty reliable, but when they went, they could (sometimes...) be brought back by slapping the back right corner, and picking up the back about .5 inch, and dropping...

____________
.

Profile Bill Walker
Joined: 4 Sep 99
Posts: 3330
Credit: 1,955,293
RAC: 2,480
Canada
Message 1024574 - Posted: 13 Aug 2010, 15:24:57 UTC - in response to Message 1024565.

soft^spirit wrote:
Matt.. if I might share.. from my experiences of keeping together antiques that were often poorly "refurbished"..

many problems clear permanently upon "re-seating" unplugging, and plugging back in. Other times taking things out, some surprise drops loose(seen or unseen 50/50).. and are then magically "fixed". Whether they were dirty connections, a bit of dust, someones raisinette.. does not really matter as long as they clear. a bad connection invisible to the eye might nearly need "bumped".. and could be gone forever.

We came up with things such as "pencil test".. where while monitoring the signal we tapped the outside case and see if it had effects. And some of the equipment was old enough to even contain mercury relays, where the mercury would vaporize, re-solidify in obscure pieces, and refuse to work until we "bounced" (hold edge of component 3-4" above anti-static surface, drop and catch on first bounce, re-insert) to clear.

These are also good reasons why "fault tolerance" is a good(although expensive) principle.

On the reports going back.. all of these were jotted down as "re-seat to clear."

Because if we told the truth, the whole truth, and nothing but the truth... it would have been the Salem Witch trials all over again.


KWSN THE Holy Hand Grenade! wrote:
One thing that used to work on CRT terminals, back in the '80s, was to give them a "slap upside the screen". Some terminals would come back to life for a time after the slap. Location (and force) was brand-dependent, and with one of the brands, there were two methods that worked, depending on symptom: the slap, directed at the upper right of the CRT, and lifting the front of the CRT about an inch, and dropping. IBM 3278's were pretty reliable, but when they went, they could (sometimes...) be brought back by slapping the back right corner, and picking up the back about .5 inch, and dropping...


As a (mostly) mechanical engineer, it does my heart good to see my electronic colleagues adopting the time-honoured and tested ways of the mech eng.
____________

Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 643
Credit: 5,323,756
RAC: 6,163
New Zealand
Message 1024911 - Posted: 14 Aug 2010, 5:33:56 UTC
Last modified: 14 Aug 2010, 5:34:46 UTC

What great news re the $6k donation.

From Staycation (Jul 01 2010):
"Data wise, we were able to get back to merging our various spike tables together full bore"

How far through merging the spike tables are you now?

The BOINC replica database shows as Running on the left-hand side of the Server Status page, yet beside "Replica seconds behind master" it says Offline. Is it still recovering after its various crashes throughout the week?

Thanks so much for the update
____________

Live in NZ y not join Smile City?


Copyright © 2014 University of California