Sforzando (Sep 23 2010)


Message boards : Technical News : Sforzando (Sep 23 2010)

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1389
Credit: 74,079
RAC: 0
United States
Message 1035273 - Posted: 23 Sep 2010, 18:10:36 UTC

Sorry about the extended two-day website brown-out just now. The mysql database server crashed during the "re-org," so that had to be restarted, then it crashed *again*. We didn't get a successful backup out of the thing until last night. That's a little bit annoying, and a little bit worrisome.

Let's see.. it's been a while since I put forth a litany of server issues. Except for the a/c debacle last week everything has been more or less status quo, but this week there was extra shuffling. Allow me to elaborate:

There have been some interesting unexpected consequences due to these extended weekly outages. For example, the number of results hanging out in the mysql database has pretty much doubled (growing slowly but consistently over the past two months), which is causing minor indigestion: the database backups and re-orgs take much longer, and workunits and results are hanging out on disk much longer (and filling up their respective disks).

Also, some power users are trying to return hundreds, perhaps thousands, of results in a single scheduler request. This was an issue because these requests were failing due to an apache request-size-limit bottleneck, and then the scheduler itself would barf on them. Well, the thing is, up until this week the scheduler had been running on anakin - one of the last few 32-bit machines in our closet. A new scheduler was built and tested to work on 64-bit systems. Long story short, this week we moved the scheduler onto bane, which was an under-utilized 64-bit machine just handling one half of the workunit downloads, and moved bane's downloads onto anakin. This was done via IP address swapping, so no worries about DNS rollout. We'll try this out either today, or when we open the floodgates tomorrow.

By the way, we're looking into the "ghost" issue. That might explain the aforementioned "result indigestion" or at least part of it.
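
To make the oversized-request problem concrete, here is a rough sketch of how a pile of finished results could be split into several smaller reports so that no single request exceeds a size cap. This is only an illustration, not how the BOINC client or our scheduler actually behave: the function name `batch_results`, the constant `MAX_REQUEST_BYTES`, and the cap value itself are all made up for the example.

```python
from typing import Iterable, Iterator, List

# Hypothetical cap for illustration only; the real Apache limit is not given here.
MAX_REQUEST_BYTES = 1_000_000


def batch_results(result_sizes: Iterable[int],
                  max_bytes: int = MAX_REQUEST_BYTES) -> Iterator[List[int]]:
    """Group result payload sizes (in bytes) into batches of at most max_bytes each."""
    batch: List[int] = []
    total = 0
    for size in result_sizes:
        # A single result larger than the cap can never fit in any request.
        if size > max_bytes:
            raise ValueError(f"result of {size} bytes exceeds cap of {max_bytes}")
        # Close the current batch before it would overflow the cap.
        if batch and total + size > max_bytes:
            yield batch
            batch, total = [], 0
        batch.append(size)
        total += size
    # Emit whatever is left over as the final batch.
    if batch:
        yield batch
```

With a cap of 1000 bytes, three 400-byte results would go out as one request carrying two results and a second request carrying the third, instead of one 1200-byte request that gets rejected.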

Also the boinc.berkeley.edu server has been suffering from OS rot, getting hit by several simultaneous web spiders, and just plain getting outdated and outgrown. It has served us well, but we finally bit the bullet and moved all that functionality to a newer, faster, better system and so far so good.

Fairly soon I'm going to blow away the current filesystems on bambi now that marvin is the trusted Astropulse database server. This should be quick, though I expect some snags (we had trouble before on this system having the BIOS recognize the 3ware RAID volumes as bootable drives). Once that's done we'll start moving all of bruno's functionality to bambi, and finally retire bruno (another flailing, troublesome 32-bit machine).

We're still trying to nail down the exact specs of the new science database server - Jeff has been doing some additional research regarding CPU upgrades - but that'll get purchased really really soon I swear.

- Matt

____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

Bill Walker
Joined: 4 Sep 99
Posts: 3374
Credit: 2,077,086
RAC: 2,163
Canada
Message 1035275 - Posted: 23 Sep 2010, 18:15:30 UTC

Thanks for the news Matt. Sounds like good things on the way.
____________

arkayn
Project donor
Volunteer tester
Joined: 14 May 99
Posts: 3641
Credit: 48,594,311
RAC: 6,548
United States
Message 1035280 - Posted: 23 Sep 2010, 18:22:50 UTC

Looks like you will have to update the server status page again as well.

Thanks for the update Matt.
____________

perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 15,525,792
RAC: 11,443
United States
Message 1035282 - Posted: 23 Sep 2010, 18:26:54 UTC - in response to Message 1035273.

May all the transitions be smooth and painless and may Oscar be the bestest server ever!! Hope we got you all you need to cover Oscar, I'm not sure Kittyman could live through another fund drive! :-)
____________


PROUD MEMBER OF Team Starfire World BOINC

Claggy
Project donor
Volunteer tester
Joined: 5 Jul 99
Posts: 4089
Credit: 33,021,313
RAC: 6,777
United Kingdom
Message 1035295 - Posted: 23 Sep 2010, 19:06:16 UTC - in response to Message 1035273.
Last modified: 23 Sep 2010, 19:07:55 UTC

Thanks for the update Matt,

Can you, along with Eric, Jeff and David, work out how to get an Astropulse_v505 switch implemented, along with changing the scheduler messages to not mention Astropulse_v5?
Eric posted that he hadn't realised the Astropulse_v505 switch hadn't been implemented in this post last September. Thanks.

Claggy

[seti.international] Dirk Sadowski
Project donor
Volunteer tester
Joined: 6 Apr 07
Posts: 7075
Credit: 60,301,031
RAC: 15,164
Germany
Message 1035309 - Posted: 23 Sep 2010, 19:27:35 UTC - in response to Message 1035273.

Matt, thanks for the news!

____________
BR

SETI@home Needs your Help ... $10 & U get a Star!

Team seti.international

Das Deutsche Cafe. The German Cafe.

ToxicTBag
Joined: 5 Feb 10
Posts: 101
Credit: 57,197,902
RAC: 0
United Kingdom
Message 1035375 - Posted: 23 Sep 2010, 21:58:45 UTC

Thanks for the update Matt, much appreciated!
____________

Gary Charpentier
Project donor
Volunteer tester
Joined: 25 Dec 00
Posts: 12510
Credit: 6,810,733
RAC: 5,575
United States
Message 1035382 - Posted: 23 Sep 2010, 22:07:10 UTC

Thanks for the update.

____________

rebest
Project donor
Volunteer tester
Joined: 16 Apr 00
Posts: 1296
Credit: 32,525,269
RAC: 11,835
United States
Message 1035442 - Posted: 24 Sep 2010, 2:56:28 UTC

Many thanks!
____________

Join the PACK!

Pascal Meeuws
Joined: 25 Nov 09
Posts: 5
Credit: 1,380,836
RAC: 0
Netherlands
Message 1035541 - Posted: 24 Sep 2010, 11:39:29 UTC

Thanks Matt
____________
It's 100% certain. There is no intelligent life in this universe.

BakCompat
Joined: 30 Jun 00
Posts: 7
Credit: 4,752,516
RAC: 548
United States
Message 1035680 - Posted: 24 Sep 2010, 17:35:03 UTC

That's great news! I know I'm looking forward to seeing the upgraded hardware smooth things out in the system. Good work there.

bloodrain
Volunteer tester
Joined: 8 Dec 08
Posts: 95
Credit: 7,905,463
RAC: 3,508
Antarctica
Message 1035773 - Posted: 24 Sep 2010, 21:05:43 UTC - in response to Message 1035680.

thanks for the update.
____________

[seti.international] Dirk Sadowski
Project donor
Volunteer tester
Joined: 6 Apr 07
Posts: 7075
Credit: 60,301,031
RAC: 15,164
Germany
Message 1035774 - Posted: 24 Sep 2010, 21:06:23 UTC

SETI@home crew, it looks like something is wrong with the scheduler.

Number crunching : 'Let the games begin 9-24'

____________
BR

SETI@home Needs your Help ... $10 & U get a Star!

Team seti.international

Das Deutsche Cafe. The German Cafe.

zoom314
Project donor
Joined: 30 Nov 03
Posts: 46301
Credit: 36,687,196
RAC: 5,189
Message 1035812 - Posted: 24 Sep 2010, 22:38:16 UTC

The hard disk drive is full again; can't upload or report.
____________
My Facebook, War Commander, 2015

Daniel R. Pratt
Joined: 10 Apr 00
Posts: 1
Credit: 2,338,938
RAC: 1,611
United States
Message 1035898 - Posted: 25 Sep 2010, 3:06:23 UTC

Matt:

Just curious-- as if you don't have enough to do putting out server fires-- how is it possible to have successfully downloaded a "master file" when the status returns indicate the servers are down? Are these files stored elsewhere in the system, or is my success just due to sporadic server operation?

Have you ever considered separate servers for dishing out unprocessed packets versus those for receiving processed packets? At least when the receiving servers failed, we'd still be able to retrieve new packets. At the moment, I am at a CPU standstill.

Keep up the good work, though.

WATERHOLE

____________
Team Waterhole Administrator

WinterKnight
Volunteer tester
Joined: 18 May 99
Posts: 8644
Credit: 24,095,950
RAC: 21,586
United Kingdom
Message 1035913 - Posted: 25 Sep 2010, 4:07:47 UTC - in response to Message 1035898.

Matt:

Just curious-- as if you don't have enough to do putting out server fires-- how is it possible to have successfully downloaded a "master file" when the status returns indicate the servers are down? Are these files stored elsewhere in the system, or is my success just due to sporadic server operation?

Have you ever considered separate servers for dishing out unprocessed packets versus those for receiving processed packets? At least when the receiving servers failed, we'd still be able to retrieve new packets. At the moment, I am at a CPU standstill.

Keep up the good work, though.

WATERHOLE

The "master file" is on the web server, so while these web pages can be accessed so can the "master file".

Donald L. Johnson
Project donor
Joined: 5 Aug 02
Posts: 6186
Credit: 697,633
RAC: 1,208
United States
Message 1035921 - Posted: 25 Sep 2010, 4:42:55 UTC - in response to Message 1035898.

Have you ever considered separate servers for dishing out unprocessed packets versus those for receiving processed packets? At least when the receiving servers failed, we'd still be able to retrieve new packets.

Uploading of completed Tasks and downloading of new unprocessed Tasks ARE done by separate servers. The problem is in the Scheduling Server and Scheduling Process, which receive Reports of completed Tasks and control the assignment of new work.

This past week they swapped server functions, and somehow something got bollixed so that the Scheduler is not responding to requests. Until that gets sorted out, we can upload completed Tasks, but cannot Report them, or get new Tasks.

You can read more about the server swaps in the 1st message of this thread, and more about the problems everyone is having, in threads in the Number Crunching section.

At the moment, I am at a CPU standstill.

Keep up the good work, though.

WATERHOLE

You are in good company. I run only CPU tasks on two old Mac G4s, and both are dry, waiting to report and get new Tasks. Many of us are dry or about to be. If you are not running another BOINC project as backup, this weekend might be a good time to shut down and do some seasonal cleaning of your rigs.

Around Seti@Home, patience is not just a virtue, it is a requirement.

____________
Donald
Infernal Optimist / Submariner, retired

Dave
Joined: 20 Aug 00
Posts: 27
Credit: 743,853
RAC: 306
United Kingdom
Message 1036200 - Posted: 25 Sep 2010, 17:43:00 UTC

As far as I am aware the people who look after Seti@Home are volunteers. I am amazed at what they manage to do in coping with and solving what must be highly stressful situations when there are problems. They all deserve some good recognition.
____________

Donald L. Johnson
Project donor
Joined: 5 Aug 02
Posts: 6186
Credit: 697,633
RAC: 1,208
United States
Message 1036648 - Posted: 28 Sep 2010, 18:59:40 UTC - in response to Message 1036200.
Last modified: 28 Sep 2010, 19:00:27 UTC

As far as I am aware the people who look after Seti@Home are volunteers. I am amazed at what they manage to do in coping with and solving what must be highly stressful situations when there are problems. They all deserve some good recognition.


No, all of the key people (Eric, Jeff, Matt, even Dr. Anderson) are employees of the UC Space Sciences Lab, and divide their time between S@H and other projects. They do spend a lot of their time keeping S@H running, including coming in on weekends and holidays to deal with casualties.

They ARE much appreciated by most of us, and that needs to be said more often.
____________
Donald
Infernal Optimist / Submariner, retired


Copyright © 2014 University of California