Retreat! (Jun 24 2009)


Message boards : Technical News : Retreat! (Jun 24 2009)

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1391
Credit: 74,079
RAC: 10
United States
Message 910837 - Posted: 24 Jun 2009, 19:56:44 UTC

Despite efforts to reduce the outage time yesterday, the database was bloated enough (for various reasons) to take all day compressing/backing up. The replica wasn't even close to being done by the time I left the lab, and still wasn't done before I went to bed last night. That meant all queries had to be aimed at the master, including all the read-only stuff that usually hits the replica - stats collection scripts, result state count scripts, the daily credit multiplier calculation (which is rather expensive), and lots of annoying web scraping queries.
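
The read routing described above can be sketched roughly as follows. This is an illustration only, not the project's actual code: the names `ReplicaStatus` and `choose_backend` and the 300-second lag threshold are assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReplicaStatus:
    """Hypothetical snapshot of the replica's health."""
    available: bool
    lag_seconds: Optional[float] = None  # None means the lag is unknown


def choose_backend(read_only: bool, replica: ReplicaStatus,
                   max_lag_seconds: float = 300.0) -> str:
    """Return "replica" or "master" for a query.

    Writes always go to the master. Reads go to the replica only when it
    is available and its lag is known and within max_lag_seconds.
    """
    if not read_only:
        return "master"
    if not replica.available or replica.lag_seconds is None:
        return "master"
    if replica.lag_seconds > max_lag_seconds:
        return "master"
    return "replica"
```

With the replica still rebuilding (`available=False`), every read-only job falls through to the master, which is the kind of extra load described in this post.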

All those excess things pretty much killed us throughout the evening. The replica was finally available in the morning, albeit fairly far behind the master. Nevertheless I was able to start cleaning up the mess. However, two other problems were revealed.

First, going to one download server wasn't a good thing. It seems impossible to me that apache can't handle all the downloads on one system - especially given the abundance of free resources. It drops connections regardless of how much network/httpd.conf tweaking I do. So we fell back to using two download servers, and that immediately solved everything. Of course, we've been offline for 24 hours, so there's gonna be lots of traffic for a while making it hard to upload/download anything.

Second, there was minor corruption in the MyISAM tables in the mysql database. Not sure what caused that but given the database was clogged all night all bets are off. The most notable effect of this was some weird behavior in the forums. Some simple "repair table" commands found the problems and claim to have fixed them.
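
For readers unfamiliar with the "repair table" step, here is a minimal sketch that only builds the MySQL `CHECK TABLE` and `REPAIR TABLE` statements as text. The helper name `myisam_repair_statements` and the table names in the usage line are hypothetical; the post does not say which tables were affected or how the commands were run.

```python
import re

# Conservative whitelist for table names, since identifiers cannot be
# passed to the server as bound query parameters.
_IDENTIFIER = re.compile(r"[A-Za-z0-9_]+")


def myisam_repair_statements(tables):
    """Build CHECK TABLE / REPAIR TABLE statements for the given tables.

    This only generates SQL text; running it against a server is left
    to the caller.
    """
    statements = []
    for name in tables:
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe table name: {name!r}")
        statements.append(f"CHECK TABLE `{name}`;")
        statements.append(f"REPAIR TABLE `{name}`;")
    return statements


# Hypothetical table names, for illustration only.
for statement in myisam_repair_statements(["post", "thread"]):
    print(statement)
```

Checking before repairing is a design choice in this sketch: the check output shows which tables actually report problems before anything is rewritten.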

Anyway.. it's clear we still have much work to do cleaning up our current mysql situation. Sigh.

In better news, looks like me and Jeff are going to the OSCON 2009 in San Jose in July - the O'Reilly open source convention. Maybe we'll get some hot tips about improving the linux/apache/mysql/php performance around here. Tim O'Reilly himself helped hook us up with free passes (he's been nice to us over the years).

- Matt

____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

Infomage
Joined: 28 Dec 00
Posts: 1
Credit: 503,964
RAC: 0
United Kingdom
Message 910838 - Posted: 24 Jun 2009, 19:59:56 UTC - in response to Message 910837.

Glad to know that things will be back up and running soon. :)
____________

aplayer
Joined: 26 Apr 00
Posts: 13
Credit: 12,618,297
RAC: 0
United States
Message 910994 - Posted: 25 Jun 2009, 0:53:14 UTC - in response to Message 910837.

thank you.... things may start improving soon. good to hear....

David
Joined: 2 Jun 08
Posts: 3
Credit: 268,609
RAC: 0
United States
Message 911055 - Posted: 25 Jun 2009, 3:02:42 UTC

Ok, the down times are getting old. I preferred to have only one project running and it appears this is not the one to maximize my computers when they are not in use. Guess it is time to find a different and more productive project, it has been a fun run.

Gary Charpentier
Project donor
Volunteer tester
Joined: 25 Dec 00
Posts: 13185
Credit: 7,941,520
RAC: 15,309
United States
Message 911098 - Posted: 25 Jun 2009, 6:26:41 UTC - in response to Message 911055.

Ok, the down times are getting old. I preferred to have only one project running and it appears this is not the one to maximize my computers when they are not in use. Guess it is time to find a different and more productive project, it has been a fun run.

No SINGLE project can do that. I've got 10 attached and all of them have had extended down times. It is the nature of the beast.

____________

Jack Zhang
Volunteer tester
Joined: 2 Jul 06
Posts: 206
Credit: 6,142,449
RAC: 154
Canada
Message 911123 - Posted: 25 Jun 2009, 7:56:34 UTC

Downloads have been on the fritz for quite a while, even after the re-introduction of 2 download servers. Its symptoms are similar to the problem when uploads are maxed out.
____________
What if Fiction was Fact and Fact was Fiction and vice versa?

ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 645
Credit: 147,694,993
RAC: 47,495
United Kingdom
Message 911133 - Posted: 25 Jun 2009, 8:57:25 UTC - in response to Message 911123.

Downloads have been on the fritz for quite a while, even after the re-introduction of 2 download servers. Its symptoms are similar to the problem when uploads are maxed out.


Have you tried stopping BOINC and flushing your DNS cache? I seemed to have
a stale address in several of my machines this morning and enormous download
backlogs; flushing the cache cured the problem.
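
A rough way to check for a stale address like this is sketched below. It is an assumption-laden illustration, not part of BOINC: `resolve_ipv4` and `is_stale` are hypothetical helper names, and the hostname and addresses in the usage lines are placeholders.

```python
import socket


def resolve_ipv4(hostname):
    """Return the set of IPv4 addresses the system resolver gives now."""
    infos = socket.getaddrinfo(hostname, 80, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr), and
    # sockaddr is (address, port) for IPv4.
    return {info[4][0] for info in infos}


def is_stale(cached_address, hostname, resolver=resolve_ipv4):
    """True if cached_address is no longer among the host's addresses."""
    return cached_address not in resolver(hostname)


# Placeholder values; a fake resolver stands in for the real lookup here.
fake_resolver = lambda host: {"192.0.2.20", "192.0.2.21"}
print(is_stale("192.0.2.10", "downloads.example.invalid", fake_resolver))
```

Note that `socket.getaddrinfo` goes through the operating system's resolver, so with the default resolver this comparison is only meaningful once the OS cache has been flushed or has expired.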
____________

Administrator
Joined: 2 Oct 06
Posts: 2
Credit: 203,504
RAC: 0
Australia
Message 911169 - Posted: 25 Jun 2009, 11:41:02 UTC

Is this downtime a common theme with Seti@home now?

After being inactive for a few years I decided to join back up after I received an email from the Seti team. My client has been running for 48 hours now, half of which it has been down.

Then I come here and see from the threads listed that there have been numerous periods of downtime due to glitches.

Have I made a mistake by rejoining?

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5946
Credit: 62,418,322
RAC: 39,120
Australia
Message 911176 - Posted: 25 Jun 2009, 11:59:20 UTC - in response to Message 911169.

Is this downtime a common theme with Seti@home now?

No more than before.
Like most things you tend to get it in groups.

Have I made a mistake by rejoining?

Can't see how.
____________
Grant
Darwin NT.

Ageless
Joined: 9 Jun 99
Posts: 12471
Credit: 2,692,593
RAC: 1,252
Netherlands
Message 911179 - Posted: 25 Jun 2009, 12:07:30 UTC

Before someone says it was all better in Classic times (I'm just waiting for it... and am heading you off. ;-)), remember that they had these down times in those days as well, there was just less communication about it.

All the work out there was then just recycled over and over and over and over - up to 50 times over - at least if you were connected to one of the servers doing all that recycling. Otherwise your Seti program wasn't doing anything either.
____________
Jord

Fighting for the correct use of the apostrophe, together with Weird Al Yankovic

tullio
Project donor
Joined: 9 Apr 04
Posts: 3869
Credit: 396,465
RAC: 196
Italy
Message 911201 - Posted: 25 Jun 2009, 12:54:50 UTC - in response to Message 911192.

Yes, and many associate SETI@home with the SETI Institute. I had to explain to an Italian newspaper that it is not so, but I saw that Scientific American made the same mistake in an article promoting Docking@home.
Tullio
____________

David @ TPS
Joined: 30 Sep 04
Posts: 70
Credit: 11,323,275
RAC: 0
United States
Message 911220 - Posted: 25 Jun 2009, 13:46:30 UTC - in response to Message 911192.

It's all about timing. Rejoining in the middle of an outage would be unnerving to say the least. Give it some time to heal, and all will be well until the next hiccup.

Seti is my primary project, but I have others in reserve if it drops off for a while.

As was said above S*** Happens, AND USUALLY AT THE MOST INAPPROPRIATE TIME. My 10 day caches have weathered every outage I have encountered so far, and letting other projects run helps too. Sure my RAC has dropped a few thousand, but it will be back! (lost a quad as well!)

(no, I am not the poster above with the same username)

Give Matt and the guys some time to sort it all out.

____________

Administrator
Joined: 2 Oct 06
Posts: 2
Credit: 203,504
RAC: 0
Australia
Message 911234 - Posted: 25 Jun 2009, 14:16:07 UTC - in response to Message 911192.

BMgoau: Um yes I did donate, the day I rejoined actually, 48 hours ago. The sum I donated is none of your concern.

Secondly, only returning 48 hours ago, it's a bit hard to go through all the technical news posts.

Thirdly, you are right, I have no idea of the "perfect storm" right now. Would you care to elaborate? No I thought so, it would take too long one would assume.

I acknowledge Seti@home is run on grants and donations, that's why I donated when I rejoined, like I used to periodically when I used to run it years ago before this BOINC!

I'm not putting down, or having a go at the team at Seti, they do a terrific job. I was only asking or really inquiring about these downtimes, as I don't recall there being as many.

Now please step off of your soapbox.

PhonAcq
Joined: 14 Apr 01
Posts: 1624
Credit: 22,607,707
RAC: 4,278
United States
Message 911245 - Posted: 25 Jun 2009, 14:55:51 UTC

Perhaps the question to ask is why the glitch rate remains (apparently) high after all this time (years). It appears to those of us who know nothing that things are tweaked frequently and almost as frequently the first tweak begets a second tweak or an untweak, and so on. Yet after I don't know how many years the data we've processed remains un-analyzed. It is very frustrating.

My remedy has been to connect only in the wee hours of each Berkeley evening and keep a 10 day cache, so as to minimize the server chaos impacting my hosts' productivity. I also try to downshift my attitude before reading these boards so that I don't emotionally redline. After all, it is a 'hobby'. And, I think I'm going to turn off some of my old beasts for a while or forever; unlike the original premise of s@h, they are probably just adding more thermodynamic entropy to the universe than they can justify with seti "science".

lostrego
Joined: 18 Mar 03
Posts: 1
Credit: 718,070
RAC: 711
Spain
Message 911249 - Posted: 25 Jun 2009, 15:21:54 UTC

Just a simple hint:

Restarting the BOINC client seems to temporarily solve the problem, at least for me.

I'm crunching now again with my old piece o junk ;-)

Stick
Project donor
Volunteer tester
Joined: 26 Feb 00
Posts: 94
Credit: 1,744,488
RAC: 660
United States
Message 911256 - Posted: 25 Jun 2009, 15:43:37 UTC - in response to Message 911249.

Just a simple hint:

Restarting the BOINC client seems to temporarily solve the problem, at least for me.

I'm crunching now again with my old piece o junk ;-)


Me too. All my stuck transfers cleared immediately. Thanks!
____________

Claggy
Project donor
Volunteer tester
Joined: 5 Jul 99
Posts: 4244
Credit: 34,961,087
RAC: 22,305
United Kingdom
Message 911258 - Posted: 25 Jun 2009, 15:51:55 UTC - in response to Message 911256.

Restarted Boinc last night and all my downloads started then.

Claggy

PeterD
Joined: 2 Jan 01
Posts: 3
Credit: 18,283
RAC: 0
Canada
Message 911260 - Posted: 25 Jun 2009, 15:56:20 UTC

Cleared my browser cache and restarted BOINC and everything is downloading properly now.

Thx for the info
____________

Stephen Motuel
Joined: 18 May 99
Posts: 1
Credit: 150,082
RAC: 83
Germany
Message 911316 - Posted: 25 Jun 2009, 17:41:05 UTC

Well I'll be a monkey's uncle! I've been waiting for BOINC to download 3 new jobs for the past 3 days with some silly messages like wrong size, server down and so on... now I know! Just exit BOINC, restart BOINC and HELLO!!!! Everything is alright! There seems to be nothing really wrong with seti (breathe easier guys), it seems the problem lies with BOINC! I may be wrong here but what the hell...
____________

Bounce
Joined: 3 Apr 99
Posts: 66
Credit: 5,604,569
RAC: 0
United States
Message 911325 - Posted: 25 Jun 2009, 18:14:08 UTC - in response to Message 910837.

>O'Reilly open source convention

WOW! Seti/OpenSource are sponsoring an O'Reilly race car? Kewl! Does it run really, really fast but require a complete engine replacement every lap?
____________


Copyright © 2014 University of California