Sforzando (Sep 23 2010)

Message boards : Technical News : Sforzando (Sep 23 2010)

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 1035273 - Posted: 23 Sep 2010, 18:10:36 UTC

Sorry about the extended two-day website brown-out just now. The MySQL database server crashed during the "re-org," so it had to be restarted, and then it crashed *again*. We didn't get a successful backup out of the thing until last night. That's a little bit annoying, and a little bit worrisome.

Let's see.. it's been a while since I put forth a litany of server issues. Except for the a/c debacle last week everything has been more or less status quo, but this week there was extra shuffling. Allow me to elaborate:

There have been some interesting unexpected consequences of these extended weekly outages. For example, the number of results hanging out in the MySQL database has pretty much doubled (growing slowly but consistently over the past two months), which is causing minor indigestion: the database backups and re-orgs take much longer, and workunits and results are hanging out on disk much longer (and filling up their respective disks). Also, some power users are trying to return hundreds, perhaps thousands, of results in a single scheduler request. This was an issue because these requests were failing due to an Apache request-size-limit bottleneck, and then the scheduler itself would barf on it.

Well, the thing is, up until this week the scheduler had been running on anakin - one of the last few 32-bit machines in our closet. A new scheduler was built and tested to work on 64-bit systems. Long story short, this week we moved the scheduler onto bane, which was an under-utilized 64-bit machine just handling one half of the workunit downloads, and moved bane's downloads onto anakin. This was done via IP address swapping, so no worries about DNS rollout. We'll try this out either today, or when we open the floodgates tomorrow.

By the way, we're looking into the "ghost" issue. That might explain the aforementioned "result indigestion," or at least part of it.
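
To illustrate the request-size problem described above, here is a minimal sketch. It assumes a hypothetical client that knows the serialized size of each result and a hypothetical per-request byte limit. This is not BOINC client or scheduler code; `batch_results`, `result_sizes`, and `max_request_bytes` are made-up names. The idea is simply to split a large backlog into several smaller requests so that no single one exceeds the limit.

```python
from typing import Iterator, List


def batch_results(result_sizes: List[int], max_request_bytes: int) -> Iterator[List[int]]:
    """Yield lists of result indices whose combined size fits in one request.

    result_sizes: serialized size in bytes of each pending result (hypothetical).
    max_request_bytes: assumed upper bound on one request body (hypothetical).
    """
    batch: List[int] = []
    batch_bytes = 0
    for index, size in enumerate(result_sizes):
        if size > max_request_bytes:
            # A single oversized result can never be sent under this limit.
            raise ValueError(f"result {index} alone exceeds the request limit")
        # Start a new request when adding this result would exceed the limit.
        if batch and batch_bytes + size > max_request_bytes:
            yield batch
            batch, batch_bytes = [], 0
        batch.append(index)
        batch_bytes += size
    if batch:
        # Flush whatever is left as the final request.
        yield batch
```

For example, five 400-byte results against a 1000-byte limit would go out as three requests: two results, two results, then one.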

Also the boinc.berkeley.edu server has been suffering from OS rot, getting hit by several simultaneous web spiders, and just plain getting outdated and outgrown. It has served us well, but we finally bit the bullet and moved all that functionality to a newer, faster, better system and so far so good.

Fairly soon I'm going to blow away the current filesystems on bambi now that marvin is the trusted Astropulse database server. This should be quick, though I expect some snags (we had trouble before on this system having the BIOS recognize the 3ware RAID volumes as bootable drives). Once that's done we'll start moving all of bruno's functionality to bambi, and finally retire bruno (another flailing, troublesome 32-bit machine).

We're still trying to nail down the exact specs of the new science database server - Jeff has been doing some additional research regarding CPU upgrades - but that'll get purchased really really soon I swear.

- Matt

-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 1035273
Bill Walker
Joined: 4 Sep 99
Posts: 3868
Credit: 2,697,267
RAC: 0
Canada
Message 1035275 - Posted: 23 Sep 2010, 18:15:30 UTC

Thanks for the news Matt. Sounds like good things on the way.

ID: 1035275
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51477
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1035277 - Posted: 23 Sep 2010, 18:19:39 UTC

Thank you for the updated info, Matt.
As is probably usual, more going on behind the scenes than some may realize.
Hope all the shuffling around of hardware proves to be the answer to some problems.

Anxiously awaiting new news on Oscar.

Meow meow.
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1035277
arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 1035280 - Posted: 23 Sep 2010, 18:22:50 UTC

Looks like you will have to update the server status page again as well.

Thanks for the update Matt.

ID: 1035280
perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 1035282 - Posted: 23 Sep 2010, 18:26:54 UTC - in response to Message 1035273.  

May all the transitions be smooth and painless and may Oscar be the bestest server ever!! Hope we got you all you need to cover Oscar, I'm not sure Kittyman could live through another fund drive! :-)


PROUD MEMBER OF Team Starfire World BOINC
ID: 1035282
Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1035295 - Posted: 23 Sep 2010, 19:06:16 UTC - in response to Message 1035273.  
Last modified: 23 Sep 2010, 19:07:55 UTC

Thanks for the update Matt,

Can you, along with Eric, Jeff and David, work out how to get an Astropulse_v505 switch implemented, along with changing the scheduler messages to not mention Astropulse_v5?
Eric posted that he hadn't realised the Astropulse_v505 switch hadn't been implemented in this post last September, thanks.

Claggy
ID: 1035295
Dirk Sadowski
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1035309 - Posted: 23 Sep 2010, 19:27:35 UTC - in response to Message 1035273.  

Matt, thanks for the news!

ID: 1035309
ToxicTBag
Joined: 5 Feb 10
Posts: 101
Credit: 57,197,902
RAC: 0
United Kingdom
Message 1035375 - Posted: 23 Sep 2010, 21:58:45 UTC

Thanks for the update Matt, much appreciated!
ID: 1035375
Gary Charpentier (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 25 Dec 00
Posts: 30975
Credit: 53,134,872
RAC: 32
United States
Message 1035382 - Posted: 23 Sep 2010, 22:07:10 UTC

Thanks for the update.

ID: 1035382
rebest (Project Donor)
Volunteer tester
Joined: 16 Apr 00
Posts: 1296
Credit: 45,357,093
RAC: 0
United States
Message 1035442 - Posted: 24 Sep 2010, 2:56:28 UTC

Many thanks!

Join the PACK!
ID: 1035442
Pascal Meeuws
Joined: 25 Nov 09
Posts: 5
Credit: 1,380,836
RAC: 0
Netherlands
Message 1035541 - Posted: 24 Sep 2010, 11:39:29 UTC

Thanks Matt
It's 100% certain. There is no intelligent life in this universe.
ID: 1035541
BakCompat
Joined: 30 Jun 00
Posts: 7
Credit: 5,017,546
RAC: 0
United States
Message 1035680 - Posted: 24 Sep 2010, 17:35:03 UTC

That's great news! I know I'm looking forward to seeing the upgraded hardware smooth things out in the system. Good work there.
ID: 1035680
bloodrain
Volunteer tester
Joined: 8 Dec 08
Posts: 231
Credit: 28,112,547
RAC: 1
Antarctica
Message 1035773 - Posted: 24 Sep 2010, 21:05:43 UTC - in response to Message 1035680.  

thanks for the update.
ID: 1035773
Dirk Sadowski
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1035774 - Posted: 24 Sep 2010, 21:06:23 UTC

SETI@home crew, it looks like something is wrong with the scheduler..

Number crunching : 'Let the games begin 9-24'

ID: 1035774
zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 66284
Credit: 55,293,173
RAC: 49
United States
Message 1035812 - Posted: 24 Sep 2010, 22:38:16 UTC

The hard disk drive is full again; can't upload or report.
Savoir-Faire is everywhere!
The T1 Trust, T1 Class 4-4-4-4 #5550, America's First HST

ID: 1035812
Daniel R. Pratt
Joined: 10 Apr 00
Posts: 1
Credit: 9,978,700
RAC: 63
United States
Message 1035898 - Posted: 25 Sep 2010, 3:06:23 UTC

Matt:

Just curious-- as if you don't have enough to do putting out server fires-- how is it possible to have successfully downloaded a "master file" when the status returns indicate the servers are down? Are these files stored elsewhere in the system, or is my success just due to sporadic server operation?

Have you ever considered separate servers for dishing out unprocessed packets versus those for receiving processed packets? At least when the receiving servers failed, we'd still be able to retrieve new packets. At the moment, I am at a CPU standstill.

Keep up the good work, though.

WATERHOLE

Team Waterhole Administrator
ID: 1035898
W-K 666 (Project Donor)
Volunteer tester
Joined: 18 May 99
Posts: 19362
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1035913 - Posted: 25 Sep 2010, 4:07:47 UTC - in response to Message 1035898.  

Matt:

Just curious-- as if you don't have enough to do putting out server fires-- how is it possible to have successfully downloaded a "master file" when the status returns indicate the servers are down? Are these files stored elsewhere in the system, or is my success just due to sporadic server operation?

Have you ever considered separate servers for dishing out unprocessed packets versus those for receiving processed packets? At least when the receiving servers failed, we'd still be able to retrieve new packets. At the moment, I am at a CPU standstill.

Keep up the good work, though.

WATERHOLE

The "master file" is on the web server, so as long as these web pages can be accessed, so can the "master file".
ID: 1035913
Donald L. Johnson
Joined: 5 Aug 02
Posts: 8240
Credit: 14,654,533
RAC: 20
United States
Message 1035921 - Posted: 25 Sep 2010, 4:42:55 UTC - in response to Message 1035898.  

Have you ever considered separate servers for dishing out unprocessed packets versus those for receiving processed packets? At least when the receiving servers failed, we'd still be able to retrieve new packets.

Uploading of completed Tasks and downloading of new unprocessed Tasks ARE done by separate servers. The problem is in the Scheduling Server and Scheduling Process, which receive Reports of completed Tasks and control the assignment of new work.

This past week they swapped server functions, and somehow something got bollixed so that the Scheduler is not responding to requests. Until that gets sorted out, we can upload completed Tasks, but cannot Report them, or get new Tasks.

You can read more about the server swaps in the 1st message of this thread, and more about the problems everyone is having, in threads in the Number Crunching section.

At the moment, I am at a CPU standstill.

Keep up the good work, though.

WATERHOLE

You are in good company. I run only CPU tasks on two old Mac G4s, and both are dry, waiting to report and get new Tasks. Many of us are dry or about to be. If you are not running another BOINC project as backup, this weekend might be a good time to shut down and do some seasonal cleaning of your rigs.

Around Seti@Home, patience is not just a virtue, it is a requirement.

Donald
Infernal Optimist / Submariner, retired
ID: 1035921
Dave
Joined: 20 Aug 00
Posts: 30
Credit: 1,868,638
RAC: 4
United Kingdom
Message 1036200 - Posted: 25 Sep 2010, 17:43:00 UTC

As far as I am aware the people who look after Seti@Home are volunteers. I am amazed at what they manage to do in coping with and solving what must be highly stressful situations when there are problems. They all deserve some good recognition.
ID: 1036200
Donald L. Johnson
Joined: 5 Aug 02
Posts: 8240
Credit: 14,654,533
RAC: 20
United States
Message 1036648 - Posted: 28 Sep 2010, 18:59:40 UTC - in response to Message 1036200.  
Last modified: 28 Sep 2010, 19:00:27 UTC

As far as I am aware the people who look after Seti@Home are volunteers. I am amazed at what they manage to do in coping with and solving what must be highly stressful situations when there are problems. They all deserve some good recognition.


No, all of the key people (Eric, Jeff, Matt, even Dr. Anderson) are employees of the UC Space Sciences Lab, and divide their time between S@H and other projects. They do spend a lot of their time keeping S@H running, including coming in on weekends and holidays to deal with casualties.

They ARE much appreciated by most of us, and that needs to be said more often.
Donald
Infernal Optimist / Submariner, retired
ID: 1036648
 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.