Sforzando (Sep 23 2010)

Author	Message
Matt Lebofsky Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0	Message 1035273 - Posted: 23 Sep 2010, 18:10:36 UTC Sorry about the extended two-day website brown-out just now. The mysql database server crashed during the "re-org," so that had to be restarted, then it crashed again. We didn't get a successful backup out of the thing until last night. That's a little bit annoying, and a little bit worrisome. Let's see.. it's been a while since I put forth a litany of server issues. Except for the a/c debacle last week everything has been more or less status quo, but this week there was extra shuffling. Allow me to elaborate: There have been some interesting unexpected consequences due to these extended weekly outages. For example, the amount of results hanging out in the mysql database has pretty much doubled (growing slowly but consistently over the past two months), which is causing minor indigestion: the database backups and re-orgs take much longer, and workunits and results are hanging out on disk much longer (and filling up their respective disks). But also some power users are trying to return hundreds, perhaps thousands, of results in a single scheduler request. This last thing was an issue because these requests were failing due to an apache request-limit-size bottleneck, and then the scheduler itself would barf on it. Well, the thing is, up until this week the scheduler had been running on anakin - one of the last few 32-bit machines in our closet. A new scheduler was built and tested to work on 64-bit systems. Long story short, this week we moved the scheduler onto bane, which was an under-utilized 64-bit machine just handling one half of the workunit downloads. And moved bane's downloads onto anakin. This was done via ip address swapping, so no worries about DNS rollout. We'll try this out either today, or when we open the floodgates tomorrow. 
By the way, we're looking into the "ghost" issue. That might explain the aforementioned "result indigestion" or at least part of it. Also the boinc.berkeley.edu server has been suffering from OS rot, getting hit by several simultaneous web spiders, and just plain getting outdated and outgrown. It has served us well, but we finally bit the bullet and moved all that functionality to a newer, faster, better system and so far so good. Fairly soon I'm going to blow away the current filesystems on bambi now that marvin is the trusted Astropulse database server. This should be quick, though I expect some snags (we had trouble before on this system having the BIOS recognize the 3ware RAID volumes as bootable drives). Once that's done we'll start moving all of bruno's functionality to bambi, and finally retire bruno (another flailing, troublesome 32-bit machine). We're still trying to nail down the exact specs of the new science database server - Jeff has been doing some additional research regarding CPU upgrades - but that'll get purchased really really soon I swear. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude ID: 1035273 ·

Bill Walker Send message Joined: 4 Sep 99 Posts: 3868 Credit: 2,697,267 RAC: 0	Message 1035275 - Posted: 23 Sep 2010, 18:15:30 UTC Thanks for the news Matt. Sounds like good things on the way. ID: 1035275 ·

kittyman Volunteer tester Send message Joined: 9 Jul 00 Posts: 51470 Credit: 1,018,363,574 RAC: 1,004	Message 1035277 - Posted: 23 Sep 2010, 18:19:39 UTC Thank you for the updated info, Matt. As is probably usual, more going on behind the scenes than some may realize. Hope all the shuffling around of hardware proves to be the answer to some problems. Anxiously awaiting new news on Oscar. Meow meow. "Time is simply the mechanism that keeps everything from happening all at once." ID: 1035277 ·

arkayn Volunteer tester Send message Joined: 14 May 99 Posts: 4438 Credit: 55,006,323 RAC: 0	Message 1035280 - Posted: 23 Sep 2010, 18:22:50 UTC Looks like you will have to update the server status page again as well. Thanks for the update Matt. ID: 1035280 ·

perryjay Volunteer tester Send message Joined: 20 Aug 02 Posts: 3377 Credit: 20,676,751 RAC: 0	Message 1035282 - Posted: 23 Sep 2010, 18:26:54 UTC - in response to Message 1035273. May all the transitions be smooth and painless and may Oscar be the bestest server ever!! Hope we got you all you need to cover Oscar, I'm not sure Kittyman could live through another fund drive! :-) PROUD MEMBER OF Team Starfire World BOINC ID: 1035282 ·

Claggy Volunteer tester Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4	Message 1035295 - Posted: 23 Sep 2010, 19:06:16 UTC - in response to Message 1035273. Last modified: 23 Sep 2010, 19:07:55 UTC Thanks for the update Matt. Can you, along with Eric, Jeff and David, work out how to get an Astropulse_v505 switch implemented, along with changing the scheduler messages to not mention Astropulse_v5? Eric posted that he hadn't realised the Astropulse_v505 switch hadn't been implemented in this post last September. Thanks. Claggy ID: 1035295 ·

Sutaru Tsureku Volunteer tester Send message Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5	Message 1035309 - Posted: 23 Sep 2010, 19:27:35 UTC - in response to Message 1035273. Matt, thanks for the news! ID: 1035309 ·

ToxicTBag Send message Joined: 5 Feb 10 Posts: 101 Credit: 57,197,902 RAC: 0	Message 1035375 - Posted: 23 Sep 2010, 21:58:45 UTC Thanks for the update Matt, much appreciated! ID: 1035375 ·

Gary Charpentier Volunteer tester Send message Joined: 25 Dec 00 Posts: 30812 Credit: 53,134,872 RAC: 32	Message 1035382 - Posted: 23 Sep 2010, 22:07:10 UTC Thanks for the update. ID: 1035382 ·

rebest Volunteer tester Send message Joined: 16 Apr 00 Posts: 1296 Credit: 45,357,093 RAC: 0	Message 1035442 - Posted: 24 Sep 2010, 2:56:28 UTC Many thanks! Join the PACK! ID: 1035442 ·

Pascal Meeuws Send message Joined: 25 Nov 09 Posts: 5 Credit: 1,380,836 RAC: 0	Message 1035541 - Posted: 24 Sep 2010, 11:39:29 UTC Thanks Matt. It's 100% certain. There is no intelligent life in this universe. ID: 1035541 ·

BakCompat Send message Joined: 30 Jun 00 Posts: 7 Credit: 5,017,546 RAC: 0	Message 1035680 - Posted: 24 Sep 2010, 17:35:03 UTC That's great news! I know I'm looking forward to seeing the upgraded hardware smooth things out in the system. Good work there. ID: 1035680 ·

bloodrain Volunteer tester Send message Joined: 8 Dec 08 Posts: 231 Credit: 28,112,547 RAC: 1	Message 1035773 - Posted: 24 Sep 2010, 21:05:43 UTC - in response to Message 1035680. thanks for the update. ID: 1035773 ·

Sutaru Tsureku Volunteer tester Send message Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5	Message 1035774 - Posted: 24 Sep 2010, 21:06:23 UTC SETI@home crew, it looks like something is wrong with the scheduler. Number crunching : 'Let the games begin 9-24' ID: 1035774 ·

zoom3+1=4 Volunteer tester Send message Joined: 30 Nov 03 Posts: 66001 Credit: 55,293,173 RAC: 49	Message 1035812 - Posted: 24 Sep 2010, 22:38:16 UTC The Hard Disk Drive is full again, Can't upload or report. The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's ID: 1035812 ·

Daniel R. Pratt Send message Joined: 10 Apr 00 Posts: 1 Credit: 9,978,700 RAC: 63	Message 1035898 - Posted: 25 Sep 2010, 3:06:23 UTC Matt: Just curious-- as if you don't have enough to do putting out server fires-- how is it possible to have successfully downloaded a "master file" when the status returns indicate the servers are down? Are these files stored elsewhere in the system, or is my success just due to sporadic server operation? Have you ever considered separate servers for dishing out unprocessed packets versus those for receiving processed packets? At least when the receiving servers failed, we'd still be able to retrieve new packets. At the moment, I am at a CPU standstill. Keep up the good work, though. WATERHOLE Team Waterhole Administrator ID: 1035898 ·

W-K 666 Volunteer tester Send message Joined: 18 May 99 Posts: 19227 Credit: 40,757,560 RAC: 67	Message 1035913 - Posted: 25 Sep 2010, 4:07:47 UTC - in response to Message 1035898. Matt: Just curious-- as if you don't have enough to do putting out server fires-- how is it possible to have successfully downloaded a "master file" when the status returns indicate the servers are down? Are these files stored elsewhere in the system, or is my success just due to sporadic server operation? Have you ever considered separate servers for dishing out unprocessed packets versus those for receiving processed packets? At least when the receiving servers failed, we'd still be able to retrieve new packets. At the moment, I am at a CPU standstill. Keep up the good work, though. WATERHOLE The "master file" is on the web server, so as long as these web pages can be accessed, so can the "master file". ID: 1035913 ·

Donald L. Johnson Send message Joined: 5 Aug 02 Posts: 8240 Credit: 14,654,533 RAC: 20	Message 1035921 - Posted: 25 Sep 2010, 4:42:55 UTC - in response to Message 1035898. Have you ever considered separate servers for dishing out unprocessed packets versus those for receiving processed packets? At least when the receiving servers failed, we'd still be able to retrieve new packets. Uploading of completed Tasks and downloading of new unprocessed Tasks ARE done by separate servers. The problem is in the Scheduling Server and Scheduling Process, which receive Reports of completed Tasks and control the assignment of new work. This past week they swapped server functions, and somehow something got bollixed so that the Scheduler is not responding to requests. Until that gets sorted out, we can upload completed Tasks, but cannot Report them, or get new Tasks. You can read more about the server swaps in the 1st message of this thread, and more about the problems everyone is having, in threads in the Number Crunching section. At the moment, I am at a CPU standstill. Keep up the good work, though. WATERHOLE You are in good company. I run only CPU tasks on two old Mac G4s, and both are dry, waiting to report and get new Tasks. Many of us are dry or about to be. If you are not running another BOINC project as backup, this weekend might be a good time to shut down and do some seasonal cleaning of your rigs. Around Seti@Home, patience is not just a virtue, it is a requirement. Donald Infernal Optimist / Submariner, retired ID: 1035921 ·

Dave Send message Joined: 20 Aug 00 Posts: 30 Credit: 1,868,638 RAC: 4	Message 1036200 - Posted: 25 Sep 2010, 17:43:00 UTC As far as I am aware the people who look after Seti@Home are volunteers. I am amazed at what they manage to do in coping with and solving what must be highly stressful situations when there are problems. They all deserve some good recognition. ID: 1036200 ·

Donald L. Johnson Send message Joined: 5 Aug 02 Posts: 8240 Credit: 14,654,533 RAC: 20	Message 1036648 - Posted: 28 Sep 2010, 18:59:40 UTC - in response to Message 1036200. Last modified: 28 Sep 2010, 19:00:27 UTC As far as I am aware the people who look after Seti@Home are volunteers. I am amazed at what they manage to do in coping with and solving what must be highly stressful situations when there are problems. They all deserve some good recognition. No, all of the key people (Eric, Jeff, Matt, even Dr. Anderson) are employees of the UC Space Sciences Lab, and divide their time between S@H and other projects. They do spend a lot of their time keeping S@H running, including coming in on weekends and holidays to deal with casualties. They ARE much appreciated by most of us, and that needs to be said more often. Donald Infernal Optimist / Submariner, retired ID: 1036648 ·

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.