Whoopsie (Aug 07 2007)

Message boards : Technical News : Whoopsie (Aug 07 2007)
Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 616003 - Posted: 7 Aug 2007, 20:10:57 UTC

Well well well.. Our BOINC database server (the non-science server) decided to reboot itself yesterday afternoon, bringing mysql down with it in a rather unceremonious fashion. The sudden crash is still a mystery, but upon restart the mysql engine, as usual, did a good job cleaning up on its own. However this process is a bit slow and didn't complete until our (current) short staff was all at home. At this point it became clear our two scheduling servers (bruno and ptolemy) were hung up due to all this chaos and needed to be rebooted as well. While ptolemy came up cleanly, bruno did not and remained down all evening.

This morning I gave bruno a kick and it came up just fine. We then went through the usual Tuesday database compression/backup. Luckily we have a replica database, which was all caught up so it contained the last few updates that were lost on the master database. So I dropped and recreated the master using the more up-to-date replica before starting the projects back up again.

However, things are still operating at a crawl (to put it mildly). This may be due to missing indexes (that weren't on the replica so they didn't get recreated on the master). Expect some turbulence over the next 24 hours as we recover from this minor mishap.
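Matt's explanation is that indexes present on the old master were never on the replica, so they disappeared when the master was rebuilt from it. A minimal sketch of how such a gap could be spotted, assuming each server's index list had been exported beforehand (for example from MySQL's `information_schema.STATISTICS` view). The table and index names below are hypothetical, not the real SETI@home/BOINC schema.

```python
# Hypothetical sketch: report indexes present on one server but absent on another.
# Assumes each server's indexes were exported beforehand, e.g. with
#   SELECT TABLE_NAME, INDEX_NAME FROM information_schema.STATISTICS
#   WHERE TABLE_SCHEMA = '<database>';
# That view returns one row per indexed column, so multi-column indexes repeat;
# converting to a set collapses those repeats.
from typing import Iterable, Set, Tuple

IndexKey = Tuple[str, str]  # (table name, index name)


def missing_indexes(reference: Iterable[IndexKey], candidate: Iterable[IndexKey]) -> Set[IndexKey]:
    """Return the (table, index) pairs in `reference` that `candidate` lacks."""
    return set(reference) - set(candidate)


# Hypothetical example data: the old master carried an extra index
# that the replica never had.
old_master = [
    ("result", "PRIMARY"),
    ("result", "idx_example_state"),
    ("result", "idx_example_state"),  # second column of the same index
    ("workunit", "PRIMARY"),
]
rebuilt_from_replica = [("result", "PRIMARY"), ("workunit", "PRIMARY")]

for table, index in sorted(missing_indexes(old_master, rebuilt_from_replica)):
    print(f"missing on rebuilt master: {table}.{index}")
```

Run as written, this prints `missing on rebuilt master: result.idx_example_state`. The check only helps if the old master's index list (or schema) was saved before the drop; once the master is recreated from the replica, that information is gone.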

Needless to say, the new client release is postponed for the day, which is just as well, as tomorrow will be the first time in weeks that Jeff, Eric, and I will be in the same room at the same time.

- Matt

-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
KB7RZF
Volunteer tester
Joined: 15 Aug 99
Posts: 9549
Credit: 3,308,926
RAC: 2
United States
Message 616087 - Posted: 7 Aug 2007, 22:44:29 UTC

Hey Matt,

Very nice job with getting things back up and running. And sorry to hear about multi-beam being put off a few days, but it will be worth it in the end. Thank you for the news again.

Jeremy
wheelieslug
Volunteer tester
Joined: 8 Jul 03
Posts: 38
Credit: 3,688,407
RAC: 0
United Kingdom
Message 616101 - Posted: 7 Aug 2007, 22:53:59 UTC

Aah, there you are.

Nah worries. The impossible you can do at once, perfection takes a little longer :)

Thanks for taking the time to post - I've heard there's these little things called 'coffee' & 'rest'.....

Regards.
Garry Webb
Joined: 25 Aug 99
Posts: 40
Credit: 13,561,408
RAC: 0
United States
Message 616121 - Posted: 7 Aug 2007, 23:13:56 UTC - in response to Message 616003.  
Last modified: 7 Aug 2007, 23:15:15 UTC

It didn't crawl very long, about 15 minutes. Thanks to all for your efforts.
computerguy09
Volunteer tester
Joined: 3 Aug 99
Posts: 80
Credit: 9,570,364
RAC: 3
United States
Message 616173 - Posted: 8 Aug 2007, 0:52:31 UTC

I'm still seeing WU DL problems as of right now. So some of the pipes didn't get totally unclogged. Uploads and reporting of results seem to go OK...
Mark

MoonFire
Joined: 6 May 99
Posts: 9
Credit: 1,725,064
RAC: 0
United States
Message 616187 - Posted: 8 Aug 2007, 1:35:11 UTC
Last modified: 8 Aug 2007, 1:36:44 UTC

I run a computer repair business and am all too familiar with the aggravations you described; want to borrow the steel-toed boot I keep in my trunk to help the process? Ahhh... "Murphy's Law(s): 24-7!"

= Zap me a new version after you've slept. ;)

>MoonFire TC:205216.25
zoom3+1=4
Volunteer tester
Send message
Joined: 30 Nov 03
Posts: 65746
Credit: 55,293,173
RAC: 49
United States
Message 616245 - Posted: 8 Aug 2007, 6:07:37 UTC

Or better yet, a dog leg? ;)


http://www.stormtaylor.com/

A dog with an artificial foot, and that's not all: it's not strapped on either. It could eventually be done for humans too. Imagine this: if you had to have your foot or such amputated, the replacement wouldn't have to be strapped on anymore; it would be a part of your body, just as if you were born with it. :D I'd heard of this on CNN briefly a few days back and just decided to look for it.

http://www.google.com/search?hl=en&client=firefox-a&rls=org.mozilla%3Aen-US%3Aofficial&q=bonding+metal+to+dog+skin&btnG=Search
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
Jesse Viviano
Joined: 27 Feb 00
Posts: 100
Credit: 3,949,583
RAC: 0
United States
Message 616497 - Posted: 8 Aug 2007, 21:04:07 UTC

I don't think that we are done recovering from the crash. I noticed that one of the assimilators has failed, which has allowed a backlog of work units to assimilate to grow to over ten thousand work units. Will this backlog delay the new client?
KWSN - MajorKong
Volunteer tester
Joined: 5 Jan 00
Posts: 2892
Credit: 1,499,890
RAC: 0
United States
Message 616504 - Posted: 8 Aug 2007, 21:15:36 UTC
Last modified: 8 Aug 2007, 21:18:15 UTC

Looks like the new client is installed for x86 windows and linux. I'm crunching with it now... See here http://setiathome.berkeley.edu/apps.php.


Can't wait for the Chicken to get released.
https://youtu.be/iY57ErBkFFE

#Texit

Don't blame me, I voted for Johnson(L) in 2016.

Truth is dangerous... especially when it challenges those in power.
Dr. C.E.T.I.
Joined: 29 Feb 00
Posts: 16019
Credit: 794,685
RAC: 0
United States
Message 616569 - Posted: 8 Aug 2007, 22:32:47 UTC - in response to Message 616497.  

Database/file status
State                                   Value        As of (age of data)
Results ready to send                   0            11m
Current result creation rate            2.27/sec     0m
Results in progress                     1,262,581    11m
Workunits waiting for validation        8            11m
Workunits waiting for assimilation      12,842       11m
Workunits waiting for deletion          8            11m
Results waiting for deletion            5            11m
Transitioner backlog (hours)            0            50m
DJStarfox
Posts: 1066
Credit: 1,226,053
RAC: 2
United States
Message 616686 - Posted: 9 Aug 2007, 0:25:08 UTC - in response to Message 616504.  

What about linux x86_64? Should I continue to run my optimized application? Will the new work units crash if BOINC downloads them? I'm running KWSN-R2.2B-64bit-SSE2-generic.
Crunch3r
Volunteer tester
Joined: 15 Apr 99
Posts: 1546
Credit: 3,438,823
RAC: 0
Germany
Message 616704 - Posted: 9 Aug 2007, 0:43:37 UTC - in response to Message 616686.  

It's safe to continue using the optimized application and BOINC will not crash.


Join BOINC United now!
DJStarfox
Joined: 23 May 01
Posts: 1066
Credit: 1,226,053
RAC: 2
United States
Message 616711 - Posted: 9 Aug 2007, 0:48:47 UTC - in response to Message 616704.  

Excellent! Thank you. I get 2x faster speeds with KWSN-R2.2B-64bit-SSE2-generic. :)
Dannis
Joined: 29 Jan 06
Posts: 24
Credit: 11,065,028
RAC: 7
United States
Message 617094 - Posted: 9 Aug 2007, 17:37:30 UTC
Last modified: 9 Aug 2007, 17:39:56 UTC

First, let me say that I am thankful for all the hard work the staff has done. I know that it is difficult to work with pieced-together equipment and a limited staff. I also understand capital costs. I was an electrical engineer and retired as Global Technical Manager for a top-100 global company.

I don't wish to complain, but over the last four months I have spent more time without SETI work than with it. I prefer to work on SETI. I have not experienced these problems with the other projects.

Is there anything that we as users can do to help get you good equipment? Would email to the right people grease the wheels? Would a capital donation drive, used only for site computer upgrades, help? Maybe the top managers of SETI don't understand how many dollars of computer time users donate every day, just in terms of energy costs. I have calculated that I donate as much as 25 to 30 dollars a month in power to the project. I am only one small-scale user; just imagine the total monthly donation.

I fear that we will lose many of our users to other projects out of frustration. I already know of at least two.

Let us know what you think can be done. Also, thanks again for your personal super efforts.
Clyde C. Phillips, III
Joined: 2 Aug 00
Posts: 1851
Credit: 5,955,047
RAC: 0
United States
Message 617101 - Posted: 9 Aug 2007, 17:59:39 UTC

Yes, they probably know. There have been discussions on extra electrical energy usage. Each machine probably uses about 200 watts of power, and I'm sure the managers are aware of the total energy used because they know the number of active users, the number of computers per user, and the number of hours each machine devotes to Seti/Boinc each day. But there's no complaint. Everything is voluntary. Doing Seti helps keep my spirits up even if it costs me something like eight kilowatt-hours per day for my two PD950s.
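Clyde's two figures can be cross-checked with simple arithmetic. A minimal sketch, assuming the machines crunch 24 hours a day (an assumption; the post doesn't say): at 200 W each, two machines would use 9.6 kWh per day, so his figure of about 8 kWh/day for two PD950s implies an average draw closer to 167 W per machine.

```python
def daily_kwh(watts_per_machine: float, machines: int, hours_per_day: float = 24.0) -> float:
    """Energy used per day, in kWh, by `machines` computers each drawing `watts_per_machine`."""
    return watts_per_machine * machines * hours_per_day / 1000.0


def average_watts(kwh_per_day: float, machines: int, hours_per_day: float = 24.0) -> float:
    """Average draw per machine, in watts, implied by a daily kWh total."""
    return kwh_per_day * 1000.0 / (machines * hours_per_day)


# 24 h/day of crunching is an assumption, not a figure from the thread.
print(daily_kwh(200, 2))            # 9.6
print(round(average_watts(8, 2)))   # 167
```

Either way, the estimates agree to within about 20%, which is as close as a rough "about 200 watts" figure can be expected to get.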
Dannis
Joined: 29 Jan 06
Posts: 24
Credit: 11,065,028
RAC: 7
United States
Message 617317 - Posted: 10 Aug 2007, 0:38:52 UTC - in response to Message 617101.  

I know that it is voluntary. My purpose in bringing it up is that with all the downtime, we may lose members. I also wanted to point out that we could have some sort of membership donation drive to raise capital for new equipment. Since we are willing to spend the money on energy costs, would we not also be willing to donate a few dollars each for new server equipment?
RandyC
Joined: 20 Oct 99
Posts: 714
Credit: 1,704,345
RAC: 0
United States
Message 617494 - Posted: 10 Aug 2007, 11:42:57 UTC - in response to Message 617317.  

See this thread and this one as well.
Clyde C. Phillips, III
Joined: 2 Aug 00
Posts: 1851
Credit: 5,955,047
RAC: 0
United States
Message 617560 - Posted: 10 Aug 2007, 16:42:11 UTC
Last modified: 10 Aug 2007, 16:45:23 UTC

SetiAdmin is always having those goddamn server breakdowns which cause all the downtime. I've suffered as well, crunching Einstein instead and getting a lower RAC. A dollar contributed to Seti will probably go a lot further than that dollar put toward that Core 2 Quad.

It looks like (I guess) the new Seticruncher for Multibeam data has automatically installed itself on many machines. Why? Because I saw about a dozen consecutive short-duration units on one of my results pages (for one of my PD950s with Boinc 5.4.11) that were granted only about 85 percent of what I claimed. I've never seen this before. I guess one can expect lower RACs from now on, probably to level the playing field with other Boinc projects.
RandyC
Joined: 20 Oct 99
Posts: 714
Credit: 1,704,345
RAC: 0
United States
Message 617612 - Posted: 10 Aug 2007, 19:19:22 UTC - in response to Message 617560.  

Yes, every time the stock client becomes more efficient, they lower the credit. Good for the science/bad for the crd/hr.
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 617632 - Posted: 10 Aug 2007, 20:21:36 UTC - in response to Message 617612.  

In this case, the new app is about 25% faster and they've so far lowered the credit per WU by only 15%. The net result can be expected to be about 10% higher credit/time for Line feed WUs. The Multibeam WUs may drop it down to about the same credit/time as the old app on Line feed, but it will take some extended time to average out.
Joe
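Joe's "about 10%" can be checked directly. "25% faster" can be read two ways: 25% more work units per hour (throughput × 1.25), or each work unit taking 25% less time (throughput × 1/0.75). A minimal sketch of both readings, using only the 25% and 15% figures from his post:

```python
def relative_credit_rate(speedup: float, credit_factor: float) -> float:
    """Credit per hour of the new app relative to the old one.

    speedup: work units per hour, new / old.
    credit_factor: credit granted per work unit, new / old (0.85 = 15% less).
    """
    return speedup * credit_factor


# Reading 1: "25% faster" = 25% more work units per hour.
print(round(relative_credit_rate(1.25, 0.85), 4))      # 1.0625
# Reading 2: "25% faster" = each work unit takes 25% less time.
print(round(relative_credit_rate(1 / 0.75, 0.85), 4))  # 1.1333
```

The two readings give roughly 6% and 13% more credit per hour, bracketing his estimate of about 10%.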
 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.