Hocket (Aug 05 2010)



Message boards : Technical News : Hocket (Aug 05 2010)

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1389
Credit: 74,079
RAC: 0
United States
Message 1022821 - Posted: 5 Aug 2010, 21:28:30 UTC

Another catchup post. I'm still trying to page in everything I missed in July - it doesn't help that shortly after the last post I got a nasty summer cold. I'm back in business now.

We had another mysql database server crash over the weekend, which Jeff handled remotely without much ado. The upload server also had its directly attached storage array freak out again. This is becoming a common event, and it leaves the software RAID in some funky state (which has always been reversible thus far).

Other than that, the servers are still chugging along. As for the grand server shuffle, progress has been made and a definite plan is in motion. Basically marvin is becoming bambi (the Astropulse database), bambi is becoming bruno (the upload/BOINC admin server), and bruno is being turned off. Meanwhile some new machine (which we'll acquire somehow) will become thumper (the science database), thumper will become ptolemy (the internal file server), and ptolemy will be shut off. Getting bruno and ptolemy out of the picture means two of the three servers prone to random crashes/hardware issues will no longer be on line. The third such server is mork, which is the only server remotely close to handling the mysql database load, so there are no options for fixing that anytime soon. We have our hands full anyway fixing what we've got.
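The shuffle is easier to follow as two chains of renames, where each machine takes over the name and role of the next one down the line. The sketch below only models the plan as stated in this post: the hostnames and roles come from the text, "new-machine" is a hypothetical placeholder for the hardware not yet acquired, and `plan_shuffle` is an illustrative helper, not anything the project actually runs.

```python
# Roles named in the post, keyed by the hostname that currently serves them.
ROLES = {
    "bambi": "Astropulse database",
    "bruno": "upload/BOINC admin server",
    "thumper": "science database",
    "ptolemy": "internal file server",
}

# Planned renames: physical machine -> name (and role) it will take over.
# "new-machine" is a hypothetical placeholder for hardware not yet acquired.
RENAMES = {
    "marvin": "bambi",
    "bambi": "bruno",
    "new-machine": "thumper",
    "thumper": "ptolemy",
}

# Machines the post describes as prone to random crashes/hardware issues.
CRASH_PRONE = {"bruno", "ptolemy", "mork"}


def plan_shuffle(renames):
    """Return (final name -> old physical machine, machines switched off)."""
    final = {new: old for old, new in renames.items()}
    # A machine whose name is handed to another box, but which is not itself
    # renamed to anything, has no job left and gets turned off.
    retired = {name for name in renames.values() if name not in renames}
    return final, retired


final, retired = plan_shuffle(RENAMES)
for name, machine in sorted(final.items()):
    print(f"{name:8} ({ROLES[name]}) runs on old {machine}")
print("switched off:", sorted(retired))
print("crash-prone still on line:", sorted(CRASH_PRONE - retired))
```

Run as-is, it reports bruno and ptolemy as switched off and leaves mork as the only crash-prone machine still in service, which matches the reasoning in the paragraph above.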

I also (finally) got a test suite working for all my birdie tests (i.e. putting a fake signal or "birdie" in the raw data, blanking it, splitting it, then running clients on it to see if the birdie still appears). This took me a while as I had to remember all the various bits and pieces of this puzzle, some of which I hadn't touched for months. Now it's all in one big script, which is nice. Oh yeah, I also parallelized the software blanking pre-processing, so new data can get on line twice as fast as before (if resources are available).
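For readers unfamiliar with birdie tests, here is a toy version of the loop described above: inject a synthetic tone into noise, blank the data, split it into chunks, and check whether each chunk still shows the tone. It is a minimal sketch under loud assumptions. Pure-Python Gaussian noise stands in for the real raw-data format, the threshold-based `blank_segment` is a simplified stand-in for the real blanking, a single-bin DFT power test stands in for "running clients", and every name and parameter here is hypothetical rather than taken from the SETI@home code. The blanking step runs segments on a worker pool to mirror the idea (not the implementation) of parallelized pre-processing.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

# Toy parameters; none of these are real SETI@home values.
SAMPLE_RATE = 1024   # samples per second
BIRDIE_FREQ = 100    # Hz, frequency of the injected fake signal
WU_SIZE = 1024       # samples per hypothetical "workunit"
BLANK_SEG = 256      # samples per segment handed to each blanking worker


def make_noise(n, seed):
    """Gaussian noise standing in for raw telescope data."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def inject_birdie(data, freq, amplitude):
    """Add a sine wave (the fake "birdie") to the data."""
    return [x + amplitude * math.sin(2 * math.pi * freq * i / SAMPLE_RATE)
            for i, x in enumerate(data)]


def split(data, size):
    return [data[i:i + size] for i in range(0, len(data), size)]


def blank_segment(seg, threshold=4.0):
    """Toy blanking: zero any sample louder than threshold (stand-in for real RFI blanking)."""
    return [0.0 if abs(x) > threshold else x for x in seg]


def blank_parallel(data, workers=4):
    """Blank segment by segment on a worker pool, then stitch the data back together.
    Threads keep the sketch simple; in CPython a ProcessPoolExecutor would be
    needed for an actual CPU-bound speedup."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        blanked = pool.map(blank_segment, split(data, BLANK_SEG))
        return [x for seg in blanked for x in seg]


def bin_power(chunk, k):
    """Power in DFT bin k, computed directly for just that one bin."""
    n = len(chunk)
    re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(chunk))
    im = sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(chunk))
    return (re * re + im * im) / n


def birdie_found(chunk, freq, ratio=20.0):
    """Stand-in for "running a client": is the birdie bin far above nearby noise bins?"""
    k = round(freq * len(chunk) / SAMPLE_RATE)
    ref = [bin_power(chunk, j) for j in range(k + 20, k + 60, 4)]
    return bin_power(chunk, k) > ratio * sum(ref) / len(ref)


def run_birdie_test(amplitude, n_wus=4, seed=1):
    """Inject, blank, split, then check every workunit for the birdie."""
    data = inject_birdie(make_noise(WU_SIZE * n_wus, seed), BIRDIE_FREQ, amplitude)
    data[300:332] = [20.0] * 32  # fake interference burst for blanking to remove
    workunits = split(blank_parallel(data), WU_SIZE)
    return [birdie_found(wu, BIRDIE_FREQ) for wu in workunits]


print(run_birdie_test(amplitude=1.0))  # birdie injected
print(run_birdie_test(amplitude=0.0))  # control run without a birdie
```

The control run with `amplitude=0.0` matters as much as the injected one: a test harness like this is only convincing if it finds the birdie when it is there and stays quiet when it is not.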

Jeff's going to put some newly compiled Astropulse back-end services on line tomorrow. Hopefully that's all good, or else we'll likely run out of work over the weekend (which happened last weekend, but was mostly hidden by the mysql database server crash).

It's summertime, so people are in and out of the lab a lot, but enough of us will be in one room at the same time next week that more meaningful plans/management discussions will take place regarding NTPCkr and other scientific analysis stuff.

- Matt

____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

Chris S
Project donor
Volunteer tester
Joined: 19 Nov 00
Posts: 32104
Credit: 13,791,812
RAC: 25,118
United Kingdom
Message 1022831 - Posted: 5 Aug 2010, 22:28:26 UTC

Great stuff Matt, thanks for the update !

____________
Damsel Rescuer, Uli Devotee, Julie Supporter, ES99 Admirer,
Raccoon Friend, Anniet fan, didn't take pot advice!


Claggy
Project donor
Volunteer tester
Joined: 5 Jul 99
Posts: 4141
Credit: 33,619,126
RAC: 27,739
United Kingdom
Message 1022834 - Posted: 5 Aug 2010, 22:32:49 UTC - in response to Message 1022821.

Keep up the good work Matt, Jeff and others, and thanks for the update.

Claggy

B-Man
Volunteer tester
Joined: 11 Feb 01
Posts: 253
Credit: 147,366
RAC: 0
United States
Message 1022885 - Posted: 6 Aug 2010, 2:16:49 UTC

Thank you for the update. Keep up the great work. Things seem to be going, if not smoothly.
____________

[seti.international] Dirk Sadowski
Project donor
Volunteer tester
Joined: 6 Apr 07
Posts: 7101
Credit: 60,933,483
RAC: 17,242
Germany
Message 1022892 - Posted: 6 Aug 2010, 2:44:10 UTC
Last modified: 6 Aug 2010, 2:47:55 UTC

Thanks for the update!

What about the validate errors that began about 2 hours before the current outage? Will you run the famous script to grant the credits?
____________
BR

SETI@home Needs your Help ... $10 & U get a Star!

Team seti.international

Das Deutsche Cafe. The German Cafe.

Ageless
Joined: 9 Jun 99
Posts: 12327
Credit: 2,632,842
RAC: 1,201
Netherlands
Message 1022928 - Posted: 6 Aug 2010, 5:48:48 UTC - in response to Message 1022821.
Last modified: 6 Aug 2010, 5:50:04 UTC

it doesn't help that shortly after the last post I got a nasty summer cold.

Ah, so that's where I caught it from. Thanks for that. 2 weeks in and still battling it, but at least I got my voice back. :-)

Thanks for the update, yet for us non-native-English-tonguers, what's Hocket? (I know Steve Hackett)
____________
Jord

Fighting for the correct use of the apostrophe, together with Weird Al Yankovic

LiliKrist
Volunteer tester
Joined: 12 Aug 09
Posts: 333
Credit: 143,167
RAC: 0
Indonesia
Message 1022990 - Posted: 6 Aug 2010, 9:13:48 UTC

Thanks for the update Master Matt =)
____________


N = R x fp x ne x fl x fi x fc x L

ront
Joined: 25 Aug 01
Posts: 77
Credit: 386,336
RAC: 0
United States
Message 1022994 - Posted: 6 Aug 2010, 9:31:28 UTC

Hi,

Thanks for the information Matt.

Hope your cold is getting better.

Be Blessed & Be A Blessing,


ront
____________

ToxicTBag
Joined: 5 Feb 10
Posts: 101
Credit: 57,197,902
RAC: 0
United Kingdom
Message 1023002 - Posted: 6 Aug 2010, 10:01:32 UTC

Updates are much appreciated Matt, curse those summer colds!!
____________

Gary Charpentier
Project donor
Volunteer tester
Joined: 25 Dec 00
Posts: 12743
Credit: 7,283,703
RAC: 17,953
United States
Message 1023086 - Posted: 6 Aug 2010, 15:17:15 UTC - in response to Message 1022821.

It's summertime, so people are in and out of the lab a lot, but enough of us will be in one room at the same time next week that more meaningful plans/management discussions will take place regarding NTPCkr and other scientific analysis stuff.

RAC chasers, be afraid, be very afraid. Last time that happened we got three-day breaks! :)

Thanks for the update Matt. Much appreciated.

____________

Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,895,239
RAC: 3,937
Netherlands
Message 1023090 - Posted: 6 Aug 2010, 15:48:05 UTC - in response to Message 1023086.
Last modified: 6 Aug 2010, 15:48:39 UTC


it doesn't help that shortly after the last post I got a nasty summer cold.



Ah, so that's where I caught it from. Thanks for that. 2 weeks in and still battling it, but at least I got my voice back. :-)


You're not the only one, still got no 'voice' and fever too. :(
Must be a mutated human/computer virus, LOL ;-)

Anyway, glad you 'survived' it all and glad to hear from you.
And of course, thanks for your update on the project.
____________

S@NL - Vipertje - D. van Es
Joined: 19 Oct 02
Posts: 34
Credit: 24,464,916
RAC: 12,902
Netherlands
Message 1023211 - Posted: 6 Aug 2010, 21:09:37 UTC

Thanks for the update Matt,

Only one thing I don't understand. There is a lot of talk about not having enough resources, and yet you are retiring two servers (again). Why not use them for the services they can handle, even if it's only one or two? I really don't understand it. Why not use the older servers for one or two services each, like an MB splitter and an AP splitter? When you put so much stuff and so many services on one server, you depend on that server always working. If you spread them over more servers, you have more of a failsafe: if one goes down, the whole project doesn't go offline!!!

And having read your tech posts for 2 years now, I know that you have a lot of old servers in your basement down at Berkeley!!!

Don't get me wrong, but I find it mind-boggling every time I read that something has gone wrong.

So now the question: Why don't you split the services more on the old servers???
____________
I do what I can and I can what I do! :P

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5868
Credit: 60,599,549
RAC: 47,496
Australia
Message 1023217 - Posted: 6 Aug 2010, 21:22:02 UTC - in response to Message 1023211.
Last modified: 6 Aug 2010, 21:23:05 UTC

So now the question: Why don't you split the services more on the old servers???

Because they are unreliable & keep failing.
And then people get all upset when they can't upload or download or report or all three until the servers have restarted & the databases have been checked & repaired if necessary.
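The disagreement over spreading services across the old servers can be made concrete with a standard availability estimate, under the simplifying assumptions that machines fail independently and that every service is needed for the project to run. If each server is up a fraction a of the time, putting n required services on n separate machines gives a combined availability of a^n, so adding more crash-prone boxes lowers it. Only genuinely redundant copies of the same service (where any one copy is enough) raise it, to 1 - (1 - a)^k for k copies. The availability figures below are made-up illustrative numbers, not measurements of the SETI@home servers.

```python
def series_availability(a, n):
    """All n machines must be up (each hosts a different required service)."""
    return a ** n


def redundant_availability(a, k):
    """Any one of k interchangeable copies of a service is enough."""
    return 1 - (1 - a) ** k


# Illustrative numbers only: a reliable box vs. an old, crash-prone one.
reliable, flaky = 0.99, 0.90

print(f"1 reliable server:            {series_availability(reliable, 1):.3f}")
print(f"services split over 3 flaky:  {series_availability(flaky, 3):.3f}")
print(f"2 redundant flaky copies:     {redundant_availability(flaky, 2):.3f}")
```

This prints 0.990, 0.729 and 0.990. Splitting distinct services over several unreliable machines makes an outage somewhere more likely, while duplication helps only for services that can actually run as interchangeable copies, which a single master database cannot do out of the box.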
____________
Grant
Darwin NT.

Bill Walker
Joined: 4 Sep 99
Posts: 3403
Credit: 2,150,760
RAC: 2,144
Canada
Message 1023230 - Posted: 6 Aug 2010, 22:28:16 UTC - in response to Message 1023217.

So now the question: Why don't you split the services more on the old servers???

Because they are unreliable & keep failing.
And then people get all upset when they can't upload or download or report or all three until the servers have restarted & the databases have been checked & repaired if necessary.


And when they do fail, the scientists at S@H become the most educated IT department in the world, and spend too much time on all the things mentioned above when they should be looking for ET.
____________

S@NL - Vipertje - D. van Es
Joined: 19 Oct 02
Posts: 34
Credit: 24,464,916
RAC: 12,902
Netherlands
Message 1023410 - Posted: 7 Aug 2010, 10:23:26 UTC - in response to Message 1023230.

But why is it unreliable? Are the servers the problem, or the people who install them, or the services they want to run on the servers???

I have almost never seen an unreliable server where the problem was the hardware; in most cases the problem was blamed on the software or OS...
____________
I do what I can and I can what I do! :P

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5868
Credit: 60,599,549
RAC: 47,496
Australia
Message 1023418 - Posted: 7 Aug 2010, 11:05:13 UTC - in response to Message 1023410.

But why is it unreliable? Are the servers the problem,

The servers are the main problem. Several are pre-production units.
Once they've got servers that can be depended on, then they can spend more time working on the software.
____________
Grant
Darwin NT.

Chris S
Project donor
Volunteer tester
Joined: 19 Nov 00
Posts: 32104
Credit: 13,791,812
RAC: 25,118
United Kingdom
Message 1023436 - Posted: 7 Aug 2010, 13:18:16 UTC

And when they do fail, the scientists at S@H become the most educated IT department in the world, and spend too much time on all the things mentioned above when they should be looking for ET.


One of the reasons for the new 3-day downtime. I did wonder at one time whether they were pushing the Informix database to its limits, but it appears not; there are larger ones out there.

____________
Damsel Rescuer, Uli Devotee, Julie Supporter, ES99 Admirer,
Raccoon Friend, Anniet fan, didn't take pot advice!


zoom314
Project donor
Joined: 30 Nov 03
Posts: 46519
Credit: 36,862,448
RAC: 5,008
United States
Message 1023530 - Posted: 7 Aug 2010, 20:02:07 UTC - in response to Message 1023418.

But why is it unreliable? Are the servers the problem,

The servers are the main problem. Several are pre-production units.
Once they've got servers that can be depended on, then they can spend more time working on the software.

Donated pre-production servers at that. Hopefully there is enough for 1 or 2 good production blade servers of the type SETI needs to get. Last I heard, $7,000 was raised thanks to 1 loud mouth and 6 others, maybe SETI's equivalent of "the Magnificent Seven"... I have one old pre-production CPU running my current setup, which is awaiting its retirement from crunching, but the next CPU has to wait until supporting parts are acquired and outfitted before they're gone. So I wait, patiently, as I have lots to get done before the computer purchases can begin so that I can be done with this old hardware.
____________
My Facebook, War Commander, 2015


Copyright © 2014 University of California