Retreat! (Jun 24 2009)

Message boards : Technical News : Retreat! (Jun 24 2009)

Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 910837 - Posted: 24 Jun 2009, 19:56:44 UTC

Despite efforts to reduce the outage time yesterday, the database was bloated enough (for various reasons) to take all day compressing/backing up. The replica wasn't even close to being done by the time I left the lab, and still wasn't done before I went to bed last night. That meant all queries had to be aimed at the master, including all the read-only stuff that usually hits the replica - stats collection scripts, result state count scripts, the daily credit multiplier calculation (which is rather expensive), and lots of annoying web scraping queries.
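
For anyone not familiar with a master/replica setup, the routing described above can be sketched roughly as follows. This is only an illustration and not the project's actual code: `DbServer`, `pick_server`, and the 300-second `max_lag` threshold are all made-up names and values.

```python
from dataclasses import dataclass


@dataclass
class DbServer:
    name: str
    available: bool
    seconds_behind: float = 0.0  # replication lag; only meaningful for a replica


def pick_server(read_only: bool, master: DbServer, replica: DbServer,
                max_lag: float = 300.0) -> DbServer:
    """Send read-only queries to the replica when it is usable, else to the master."""
    if read_only and replica.available and replica.seconds_behind <= max_lag:
        return replica
    # Writes always go to the master; reads fall back to it when the
    # replica is offline or too far behind (max_lag is an assumed cutoff).
    return master


master = DbServer("master", available=True)
rebuilding = DbServer("replica", available=False)
healthy = DbServer("replica", available=True, seconds_behind=12.0)

print(pick_server(True, master, rebuilding).name)  # master
print(pick_server(True, master, healthy).name)     # replica
print(pick_server(False, master, healthy).name)    # master
```

With the replica unavailable, every read-only job (stats, counts, credit calculations) lands on the master on top of the normal write load, which is the situation described above.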

All those excess things pretty much killed us throughout the evening. The replica was finally available in the morning, albeit fairly far behind the master. Nevertheless I was able to start cleaning up the mess. However, two other problems were revealed.

First, going to one download server wasn't a good thing. It seems impossible to me that apache can't handle all the downloads on one system - especially given the abundance of free resources. It drops connections regardless of how much network/httpd.conf tweaking I do. So we fell back to using two download servers, and that immediately solved everything. Of course, we've been offline for 24 hours, so there's gonna be lots of traffic for a while making it hard to upload/download anything.

Second, there was minor corruption in the MyISAM tables in the mysql database. Not sure what caused that, but given the database was clogged all night, all bets are off. The most notable effect of this was some weird behavior in the forums. Some simple "repair table" commands found the problems and claimed to have fixed them.
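
As a rough sketch of what such a check-and-repair pass can look like, the helper below only builds the SQL text. `CHECK TABLE` and `REPAIR TABLE` are standard MySQL statements; the helper name and the table names `post` and `thread` are hypothetical and are not taken from the actual forum schema.

```python
def maintenance_statements(tables, repair=False):
    """Build CHECK TABLE (default) or REPAIR TABLE statements for the given tables."""
    verb = "REPAIR" if repair else "CHECK"
    statements = []
    for table in tables:
        # Quote the identifier with backticks, doubling any embedded backtick.
        quoted = "`" + table.replace("`", "``") + "`"
        statements.append(f"{verb} TABLE {quoted};")
    return statements


# Hypothetical table names, for illustration only.
forum_tables = ["post", "thread"]

for sql in maintenance_statements(forum_tables):
    print(sql)
for sql in maintenance_statements(forum_tables, repair=True):
    print(sql)
```

Running the check statements first and repairing only the tables that report errors keeps the repair work, which locks the table, to a minimum.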

Anyway.. it's clear we still have much work to do cleaning up our current mysql situation. Sigh.

In better news, looks like Jeff and I are going to OSCON 2009 in San Jose in July - the O'Reilly open source convention. Maybe we'll get some hot tips about improving the linux/apache/mysql/php performance around here. Tim O'Reilly himself helped hook us up with free passes (he's been nice to us over the years).

- Matt

-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 910837 · Report as offensive
Infomage

Joined: 28 Dec 00
Posts: 1
Credit: 1,216,140
RAC: 0
United Kingdom
Message 910838 - Posted: 24 Jun 2009, 19:59:56 UTC - in response to Message 910837.  

Glad to know that things will be back up and running soon. :)
ID: 910838 · Report as offensive
aplayer

Joined: 26 Apr 00
Posts: 13
Credit: 15,217,341
RAC: 0
United States
Message 910994 - Posted: 25 Jun 2009, 0:53:14 UTC - in response to Message 910837.  

thank you.... things may start improving soon. good to hear....
ID: 910994 · Report as offensive
David

Joined: 2 Jun 08
Posts: 3
Credit: 268,609
RAC: 0
United States
Message 911055 - Posted: 25 Jun 2009, 3:02:42 UTC

Ok, the down times are getting old. I preferred to have only one project running and it appears this is not the one to maximize my computers when they are not in use. Guess it is time to find a different and more productive project, it has been a fun run.
ID: 911055 · Report as offensive
Profile Gary Charpentier (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 25 Dec 00
Posts: 30608
Credit: 53,134,872
RAC: 32
United States
Message 911098 - Posted: 25 Jun 2009, 6:26:41 UTC - in response to Message 911055.  

Ok, the down times are getting old. I preferred to have only one project running and it appears this is not the one to maximize my computers when they are not in use. Guess it is time to find a different and more productive project, it has been a fun run.

No SINGLE project can do that. I've got 10 attached and all of them have had extended down times. It is the nature of the beast.

ID: 911098 · Report as offensive
Profile Jack Zhang
Volunteer tester
Joined: 2 Jul 06
Posts: 206
Credit: 6,142,449
RAC: 0
Canada
Message 911123 - Posted: 25 Jun 2009, 7:56:34 UTC

Downloads have been on the fritz for quite a while, even after the re-introduction of 2 download servers. Its symptoms are similar to the problem when uploads are maxed out.
What if Fiction was Fact and Fact was Fiction and vice versa?
ID: 911123 · Report as offensive
Profile ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 783
Credit: 348,560,338
RAC: 223
United Kingdom
Message 911133 - Posted: 25 Jun 2009, 8:57:25 UTC - in response to Message 911123.  

Downloads have been on the fritz for quite a while, even after the re-introduction of 2 download servers. Its symptoms are similar to the problem when uploads are maxed out.


Have you tried stopping BOINC and flushing your DNS cache? I seemed to have
a stale address in several of my machines this morning and enormous download
backlogs; flushing the cache cured the problem.
ID: 911133 · Report as offensive
Administrator

Joined: 2 Oct 06
Posts: 2
Credit: 203,504
RAC: 0
Australia
Message 911169 - Posted: 25 Jun 2009, 11:41:02 UTC

Is this downtime a common theme with Seti@home now?

After being inactive for a few years I decided to join back up after I received an email from the Seti team. My client has been running for 48 hours now, half of which it has been down.

Then I come here and see from the threads listed that there have been numerous periods of downtime due to glitches.

Have I made a mistake by rejoining?
ID: 911169 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 911176 - Posted: 25 Jun 2009, 11:59:20 UTC - in response to Message 911169.  

Is this downtime a common theme with Seti@home now?

No more than before.
Like most things you tend to get it in groups.

Have I made a mistake by rejoining?

Can't see how.
Grant
Darwin NT
ID: 911176 · Report as offensive
Profile Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 911179 - Posted: 25 Jun 2009, 12:07:30 UTC

Before someone says it was all better in Classic times (I'm just waiting for it... and am heading you off. ;-)), remember that they had these down times in those days as well, there was just less communication about it.

All the work out there was then just recycled over and over and over and over - up to 50 times over - at least if you were connected to one of the servers doing all that recycling. Otherwise your Seti program wasn't doing anything either.
ID: 911179 · Report as offensive
Profile tullio
Volunteer tester

Joined: 9 Apr 04
Posts: 8797
Credit: 2,930,782
RAC: 1
Italy
Message 911201 - Posted: 25 Jun 2009, 12:54:50 UTC - in response to Message 911192.  

Yes, and many associate SETI@home with the SETI Institute. I had to explain to an Italian newspaper that it is not so, but I saw that Scientific American made the same mistake in an article promoting Docking@home.
Tullio
ID: 911201 · Report as offensive
Profile David @ TPS

Joined: 30 Sep 04
Posts: 70
Credit: 11,323,275
RAC: 0
United States
Message 911220 - Posted: 25 Jun 2009, 13:46:30 UTC - in response to Message 911192.  

It's all about timing. Rejoining in the middle of an outage would be unnerving to say the least. Give it some time to heal, and all will be well until the next hiccup.

Seti is my primary project, but I have others in reserve if it drops off for a while.

As was said above, S*** Happens, AND USUALLY AT THE MOST INAPPROPRIATE TIME. My 10 day caches have weathered every outage I have encountered so far, and letting other projects run helps too. Sure my RAC has dropped a few thousand, but it will be back! (lost a quad as well!)

(no, I am not the poster above with the same username)

Give Matt and the guys some time to sort it all out.

ID: 911220 · Report as offensive
Administrator

Joined: 2 Oct 06
Posts: 2
Credit: 203,504
RAC: 0
Australia
Message 911234 - Posted: 25 Jun 2009, 14:16:07 UTC - in response to Message 911192.  

BMgoau: Um yes I did donate, the day I rejoined actually, 48 hours ago. The sum I donated is none of your concern.

Secondly, only returning 48 hours ago, it's a bit hard to go through all the technical news posts.

Thirdly, you are right, I have no idea of the "perfect storm" right now. Would you care to elaborate? No I thought so, it would take too long one would assume.

I acknowledge Seti@home is run on grants and donations, that's why I donated when I rejoined, like I used to periodically when I used to run it years ago before this BOINC!

I'm not putting down, or having a go at the team at Seti, they do a terrific job. I was only asking or really inquiring about these downtimes, as I don't recall there being as many.

Now please step off of your soapbox.
ID: 911234 · Report as offensive
PhonAcq

Joined: 14 Apr 01
Posts: 1656
Credit: 30,658,217
RAC: 1
United States
Message 911245 - Posted: 25 Jun 2009, 14:55:51 UTC

Perhaps the question to ask is why the glitch rate remains (apparently) high after all this time (years). It appears to those of us who know nothing that things are tweaked frequently and almost as frequently the first tweak begets a second tweak or an untweak, and so on. Yet after I don't know how many years the data we've processed remains un-analyzed. It is very frustrating.

My remedy has been to connect only in the wee hours of each Berkeley evening and keep a 10 day cache, so as to minimize the server chaos impacting my hosts' productivity. I also try to downshift my attitude before reading these boards so that I don't emotionally redline. After all, it is a 'hobby'. And, I think I'm going to turn off some of my old beasts for a while or forever; unlike the original premise of s@h, they are probably just adding more thermodynamic entropy to the universe than they can justify with seti "science".
ID: 911245 · Report as offensive
Profile lostrego

Joined: 18 Mar 03
Posts: 1
Credit: 4,450,432
RAC: 12
Spain
Message 911249 - Posted: 25 Jun 2009, 15:21:54 UTC

Just a simple hint:

Restarting the BOINC client seems to temporarily solve the problem, at least for me.

I'm crunching now again with my old piece o junk ;-)

ID: 911249 · Report as offensive
Stick Project Donor
Volunteer tester

Joined: 26 Feb 00
Posts: 100
Credit: 5,283,449
RAC: 5
United States
Message 911256 - Posted: 25 Jun 2009, 15:43:37 UTC - in response to Message 911249.  

Just a simple hint:

Restarting the BOINC client seems to temporarily solve the problem, at least for me.

I'm crunching now again with my old piece o junk ;-)


Me too. All my stuck transfers cleared immediately. Thanks!
ID: 911256 · Report as offensive
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 911258 - Posted: 25 Jun 2009, 15:51:55 UTC - in response to Message 911256.  

Restarted BOINC last night and all my downloads started then.

Claggy
ID: 911258 · Report as offensive
PeterD

Joined: 2 Jan 01
Posts: 3
Credit: 18,283
RAC: 0
Canada
Message 911260 - Posted: 25 Jun 2009, 15:56:20 UTC

Cleared my browser cache and restarted BOINC and everything is downloading properly now.

Thx for the info
ID: 911260 · Report as offensive
Profile Stephen Motuel

Joined: 18 May 99
Posts: 1
Credit: 632,206
RAC: 0
Germany
Message 911316 - Posted: 25 Jun 2009, 17:41:05 UTC

Well I'll be a monkey's uncle! I've been waiting for BOINC to download 3 new jobs for the past 3 days with some silly messages like wrong size, server down and so on... now I know! Just exit BOINC, restart BOINC and HELLO!!!! Everything is alright! There seems to be nothing really wrong with SETI (breathe easier, guys); it seems the problem lies with BOINC! I may be wrong here but what the hell...
ID: 911316 · Report as offensive
Bounce

Joined: 3 Apr 99
Posts: 66
Credit: 5,604,569
RAC: 0
United States
Message 911325 - Posted: 25 Jun 2009, 18:14:08 UTC - in response to Message 910837.  

>O'Reilly open source convention

WOW! Seti/OpenSource are sponsoring an O'Reilly race car? Kewl! Does it run really, really fast but require a complete engine replacement every lap?
ID: 911325 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.