Sforzando (Sep 23 2010)
Author | Message |
---|---|
Matt Lebofsky Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
Sorry about the extended two-day website brown-out just now. The MySQL database server crashed during the "re-org," so it had to be restarted, and then it crashed *again*. We didn't get a successful backup out of the thing until last night. That's a little bit annoying, and a little bit worrisome. Let's see... it's been a while since I put forth a litany of server issues. Except for the A/C debacle last week, everything has been more or less status quo, but this week there was extra shuffling. Allow me to elaborate: These extended weekly outages have had some interesting, unexpected consequences. For example, the number of results hanging out in the MySQL database has pretty much doubled (growing slowly but consistently over the past two months), which is causing minor indigestion: the database backups and re-orgs take much longer, and workunits and results stay on disk much longer (filling up their respective disks). Also, some power users are trying to return hundreds, perhaps thousands, of results in a single scheduler request. That was an issue because these requests were failing on an Apache request-size limit, and then the scheduler itself would barf on them. Well, the thing is, up until this week the scheduler had been running on anakin, one of the last few 32-bit machines in our closet. A new scheduler was built and tested to work on 64-bit systems. Long story short, this week we moved the scheduler onto bane, an under-utilized 64-bit machine that was only handling one half of the workunit downloads, and moved bane's downloads onto anakin. This was done via IP address swapping, so no worries about DNS rollout. We'll try this out either today or when we open the floodgates tomorrow. By the way, we're looking into the "ghost" issue. That might explain the aforementioned "result indigestion," or at least part of it.
Also, the boinc.berkeley.edu server has been suffering from OS rot, getting hit by several simultaneous web spiders, and just plain getting outdated and outgrown. It has served us well, but we finally bit the bullet and moved all that functionality to a newer, faster, better system, and so far so good. Fairly soon I'm going to blow away the current filesystems on bambi now that marvin is the trusted Astropulse database server. This should be quick, though I expect some snags (we had trouble before on this system getting the BIOS to recognize the 3ware RAID volumes as bootable drives). Once that's done we'll start moving all of bruno's functionality to bambi, and finally retire bruno (another flailing, troublesome 32-bit machine). We're still trying to nail down the exact specs of the new science database server (Jeff has been doing some additional research regarding CPU upgrades), but that'll get purchased really, really soon, I swear. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
Bill Walker Joined: 4 Sep 99 Posts: 3868 Credit: 2,697,267 RAC: 0 |
Thanks for the news Matt. Sounds like good things on the way. |
kittyman Joined: 9 Jul 00 Posts: 51477 Credit: 1,018,363,574 RAC: 1,004 |
Thank you for the updated info, Matt. As is probably usual, more going on behind the scenes than some may realize. Hope all the shuffling around of hardware proves to be the answer to some problems. Anxiously awaiting new news on Oscar. Meow meow. "Time is simply the mechanism that keeps everything from happening all at once." |
arkayn Joined: 14 May 99 Posts: 4438 Credit: 55,006,323 RAC: 0 |
|
perryjay Joined: 20 Aug 02 Posts: 3377 Credit: 20,676,751 RAC: 0 |
May all the transitions be smooth and painless and may Oscar be the bestest server ever!! Hope we got you all you need to cover Oscar, I'm not sure Kittyman could live through another fund drive! :-) PROUD MEMBER OF Team Starfire World BOINC |
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
Thanks for the update, Matt. Can you, along with Eric, Jeff and David, work out how to get an Astropulse_v505 switch implemented, and change the scheduler messages so they no longer mention Astropulse_v5? Eric posted in this post last September that he hadn't realised the Astropulse_v505 switch hadn't been implemented. Thanks, Claggy |
Dirk Sadowski Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
Matt, thanks for the news! |
ToxicTBag Joined: 5 Feb 10 Posts: 101 Credit: 57,197,902 RAC: 0 |
Thanks for the update, Matt, much appreciated! |
Gary Charpentier Joined: 25 Dec 00 Posts: 30927 Credit: 53,134,872 RAC: 32 |
Thanks for the update. |
rebest Joined: 16 Apr 00 Posts: 1296 Credit: 45,357,093 RAC: 0 |
|
Pascal Meeuws Joined: 25 Nov 09 Posts: 5 Credit: 1,380,836 RAC: 0 |
Thanks, Matt. It's 100% certain. There is no intelligent life in this universe. |
BakCompat Joined: 30 Jun 00 Posts: 7 Credit: 5,017,546 RAC: 0 |
That's great news! I'm looking forward to seeing the upgraded hardware smooth things out in the system. Good work there. |
bloodrain Joined: 8 Dec 08 Posts: 231 Credit: 28,112,547 RAC: 1 |
thanks for the update. |
Dirk Sadowski Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
SETI@home crew, it looks like something is wrong with the scheduler. See Number Crunching: 'Let the games begin 9-24' |
zoom3+1=4 Joined: 30 Nov 03 Posts: 66203 Credit: 55,293,173 RAC: 49 |
The hard disk drive is full again; I can't upload or report. Savoir-Faire is everywhere! The T1 Trust, T1 Class 4-4-4-4 #5550, America's First HST |
Daniel R. Pratt Joined: 10 Apr 00 Posts: 1 Credit: 9,978,700 RAC: 63 |
Matt: Just curious (as if you don't have enough to do putting out server fires): how is it possible to have successfully downloaded a "master file" when the status returns indicate the servers are down? Are these files stored elsewhere in the system, or is my success just due to sporadic server operation? Have you ever considered separate servers for dishing out unprocessed packets versus those for receiving processed packets? At least when the receiving servers failed, we'd still be able to retrieve new packets. At the moment, I am at a CPU standstill. Keep up the good work, though. WATERHOLE Team Waterhole Administrator |
W-K 666 Joined: 18 May 99 Posts: 19310 Credit: 40,757,560 RAC: 67 |
The "master file" is on the web server, so as long as these web pages can be accessed, so can the "master file". |
Donald L. Johnson Joined: 5 Aug 02 Posts: 8240 Credit: 14,654,533 RAC: 20 |
"Have you ever considered separate servers for dishing out unprocessed packets versus those for receiving processed packets? At least when the receiving servers failed, we'd still be able to retrieve new packets." Uploading of completed Tasks and downloading of new unprocessed Tasks ARE done by separate servers. The problem is in the Scheduling Server and Scheduling Process, which receive Reports of completed Tasks and control the assignment of new work. This past week they swapped server functions, and somehow something got bollixed so that the Scheduler is not responding to requests. Until that gets sorted out, we can upload completed Tasks, but cannot Report them or get new Tasks. You can read more about the server swaps in the 1st message of this thread, and more about the problems everyone is having in threads in the Number Crunching section. "At the moment, I am at a CPU standstill." You are in good company. I run only CPU tasks on two old Mac G4s, and both are dry, waiting to report and get new Tasks. Many of us are dry or about to be. If you are not running another BOINC project as backup, this weekend might be a good time to shut down and do some seasonal cleaning of your rigs. Around Seti@Home, patience is not just a virtue, it is a requirement. Donald Infernal Optimist / Submariner, retired |
Dave Joined: 20 Aug 00 Posts: 30 Credit: 1,868,638 RAC: 4 |
As far as I am aware, the people who look after Seti@Home are volunteers. I am amazed at what they manage to do in coping with and solving what must be highly stressful situations when there are problems. They all deserve some good recognition. |
Donald L. Johnson Joined: 5 Aug 02 Posts: 8240 Credit: 14,654,533 RAC: 20 |
"As far as I am aware, the people who look after Seti@Home are volunteers. I am amazed at what they manage to do in coping with and solving what must be highly stressful situations when there are problems. They all deserve some good recognition." No, all of the key people (Eric, Jeff, Matt, even Dr. Anderson) are employees of the UC Space Sciences Lab, and they divide their time between S@H and other projects. They do spend a lot of their time keeping S@H running, including coming in on weekends and holidays to deal with casualties. They ARE much appreciated by most of us, and that needs to be said more often. Donald Infernal Optimist / Submariner, retired |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.