Storm (Oct 14 2009)


Message boards : Technical News : Storm (Oct 14 2009)

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1389
Credit: 74,079
RAC: 0
United States
Message 939902 - Posted: 14 Oct 2009, 17:51:33 UTC

Got finished kinda late yesterday hence the lack of tech news report. So last week we had lots of database cleanup to deal with due to server crashes over the preceding weekend. The mysql replica database suffered quite a bit, so we planned to recover it using a standard mysql dump file, except we discovered that the latest version of mysql is buggy and the dump files sometimes contain syntax errors. Great.

So this week we recovered the replica by copying all the myisam and innodb data files (and logs) from mork (the master database server). We actually did the first rsync on Monday to help speed things along on Tuesday, but there are so many large files that it took forever even just to do the "delta" rsync. That's why the outage was so long (this final rsync could only happen when the master database was quiescent).
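
A rough sketch of the two-pass copy described above, assuming rsync over SSH. Only the host name mork comes from the post; the data directory paths and the helper name are hypothetical, and the commands are only built here, not executed.

```python
import shlex

# Hypothetical paths; only the host name "mork" comes from the post.
MASTER = "mork"
SRC_DIR = "/var/lib/mysql/"    # assumed MySQL data directory on the master
DEST_DIR = "/var/lib/mysql/"   # assumed data directory on the replica


def rsync_command(master: str, src: str, dest: str) -> list[str]:
    """Build an rsync command that mirrors src on master into local dest.

    -a preserves permissions, ownership and timestamps; --delete removes
    files on the replica that no longer exist on the master.
    """
    return ["rsync", "-a", "--delete", f"{master}:{src}", dest]


# Pass 1 (Monday): bulk copy while the master is still live. The copy is
# not consistent yet, but it moves most of the bytes ahead of time.
bulk_pass = rsync_command(MASTER, SRC_DIR, DEST_DIR)

# Pass 2 (Tuesday): the same command, run once the master is quiescent, so
# only the changes made since pass 1 need to be transferred.
delta_pass = rsync_command(MASTER, SRC_DIR, DEST_DIR)

print(shlex.join(bulk_pass))
```

Both passes use the same command; what differs is when they run. Large files that changed at all still have to be read end to end on both sides during the second pass, which may be part of why the "delta" run was slow.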

This morning Bob and I made sure all the right config tweaks were made in /etc/my.cnf and started up the replica server. Only one minor snag at first which we fixed, now it's running again and catching up! We still have to figure out how to get mysqldump to work 100% of the time syntax-error-free. That's actually kind of scary.

Meanwhile, the Bay Area was hit with a record-breaking storm yesterday. Yeah, I grew up in New York so I can say it was only really a "storm" by Bay Area standards, but still we had low temperatures and high humidity. This wreaks havoc on our air conditioner, and the server closet has been hovering a few degrees away from disaster for a couple of days now. In fact, ewen (Eric's hydrogen survey server) just lost a drive in the root RAID. It shouldn't be a big deal to replace, except that when ewen goes down for maintenance everything hangs (as there are lots of informix libs living on that system - we really need to move them off but have been loath to do so for fear of breaking something else).

- Matt

____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

Neil Blaikie
Volunteer tester
Joined: 17 May 99
Posts: 142
Credit: 6,597,821
RAC: 8,186
Canada
Message 939918 - Posted: 14 Oct 2009, 19:30:07 UTC - in response to Message 939902.

Dump some of the weather towards Montreal, Canada please. It is cold, gray, miserable and actually snowed for a few minutes earlier today. Warmish and wet is better than gray and cold!

Hope everything gets back to "normal" soon and you figure out the mysqldump issue, sounds like it is going to be a pain in the proverbial ass though.

As always thank you for taking your precious time to keep us informed and also thank you to Bob as well for your efforts.

Keep up the good work
____________

.clair.
Volunteer moderator
Joined: 4 Nov 04
Posts: 1300
Credit: 23,063,564
RAC: 596
United Kingdom
Message 939940 - Posted: 14 Oct 2009, 20:39:11 UTC

This is old news for some, but I just discovered it.
Over 300 known bugs, some serious. The URL says it all.
I didn't read through all of it; it's a bit long and above my level.
Plenty of things to "tidy up" :)

http://monty-says.blogspot.com/2008/11/oops-we-did-it-again-mysql-51-released.html

P. J. Crabtree
Joined: 17 Jan 07
Posts: 19
Credit: 1,238,457
RAC: 2,284
United States
Message 939967 - Posted: 14 Oct 2009, 21:59:43 UTC

So, it was pretty much a typical outage day.
____________

DJStarfox
Joined: 23 May 01
Posts: 1045
Credit: 560,168
RAC: 442
United States
Message 939995 - Posted: 14 Oct 2009, 23:41:01 UTC - in response to Message 939902.

So, the replica DB is still disabled until it can catch up to the master? You got a pair of fiber cards between those two servers yet?

A reliable, online backup mechanism (binary) for a production database should be mandatory. Maybe MySQL 6.0? Or a commercial product?
http://www.arkeia.com/en/products/arkeia-network-backup/backup-agent/database-agent/mysql-agent

52 Aces
Project donor
Joined: 7 Jan 02
Posts: 497
Credit: 13,336,288
RAC: 3,564
United States
Message 940009 - Posted: 15 Oct 2009, 1:15:22 UTC - in response to Message 939902.

Will you set Boinc "show_results" back to TRUE any time soon?

Thx so much !

John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24691
Credit: 522,659
RAC: 19
United States
Message 940016 - Posted: 15 Oct 2009, 2:02:27 UTC - in response to Message 939995.

DJStarfox wrote:
So, the replica DB is still disabled until it can catch up to the master? You got a pair of fiber cards between those two servers yet?

A reliable, online backup mechanism (binary) for a production database should be mandatory. Maybe MySQL 6.0? Or a commercial product?
http://www.arkeia.com/en/products/arkeia-network-backup/backup-agent/database-agent/mysql-agent

Online backup systems cost money - SETI has none of that.
____________


BOINC WIKI

Acct Closed
Joined: 14 Oct 09
Posts: 3
Credit: 14,617
RAC: 0
Message 940022 - Posted: 15 Oct 2009, 2:22:09 UTC - in response to Message 940009.

52 Aces wrote:
Will you set Boinc "show_results" back to TRUE any time soon?

Thx so much !


With this feature turned off, we have a lot more internet bandwidth for the Seti@Home project to do uploads, downloads, scheduler requests, etc. I'm wondering why we need this feature turned back on since Boinc was designed to be simply set and left alone to run. My long-time Seti friends have not had any network dropouts since this feature was turned off.

What is the value of being able to parse through one's or someone else's tasks?

Zeus Fab3r
Joined: 17 Jan 01
Posts: 645
Credit: 99,695,750
RAC: 123,825
Serbia
Message 940025 - Posted: 15 Oct 2009, 2:42:15 UTC - in response to Message 940022.

Acct Closed wrote:
What is the value of being able to parse through one's or someone else's tasks?


...high-end MoBo...300$, latest CPU ...974$,
mindblowing watercooled VGA ...700$ ...
Chance to see how, or if it works as it should, compared to others...priceless :)
____________

Who the hell is General Failure and why is he reading my harddisk?¿

52 Aces
Project donor
Joined: 7 Jan 02
Posts: 497
Credit: 13,336,288
RAC: 3,564
United States
Message 940042 - Posted: 15 Oct 2009, 3:20:46 UTC - in response to Message 940022.

Acct Closed wrote:
With this feature turned off, we have a lot more internet bandwidth for the Seti@Home project to do uploads, downloads, scheduler requests, etc.


A lot? Not quite ;-) Results traffic (wire bandwidth consumed) is pretty minor --- you'd probably have to look at 250 pages worth of results (in under 5 seconds) to equal 1 potential WU download in bandwidth (I say 'potential', because bandwidth isn't banked -- if it goes unused this second it's gone forever). When people can't download a WU, it's because there are none to begin with or the system is swamped by things OTHER than Result serving (ditto uploads, scheduling, etc).
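
The comparison above is simple arithmetic. A minimal sketch follows; both sizes are assumptions picked only to reproduce the 250-to-1 ratio quoted in the post, not measurements from the project.

```python
# Illustrative assumptions, not measurements from the project.
WORKUNIT_BYTES = 375_000    # assumed size of one workunit download
RESULT_PAGE_BYTES = 1_500   # assumed size of one results page on the wire


def pages_per_workunit(workunit_bytes: int, page_bytes: int) -> float:
    """Number of results pages that cost as much bandwidth as one workunit."""
    return workunit_bytes / page_bytes


ratio = pages_per_workunit(WORKUNIT_BYTES, RESULT_PAGE_BYTES)
print(ratio)  # 250.0
```

With different real-world sizes the ratio changes, but the shape of the argument stays the same: page views would have to be both numerous and fast to compete with a single workunit transfer.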

In fact, for those of us reading here, the Forums with those signature images & avatars on practically every message of every thread are a greater overall bandwidth hit. But aesthetics & e-awards have their values too.

Had any of these been genuine culprits in the past, it would be pretty easy to throttle both with a QoS pushback for the sake of production traffic (vs. status and other website traffic such as the streaming videos). It might even already be there, just never seen, because it's not an issue: on average the system handles and balances it all (aside from the self-imposed post-maintenance storm every Tuesday). But yes, a bigger, newer pipe would be grand for far more than speedier results ;-)

Cheers

Pappa
Volunteer tester
Joined: 9 Jan 00
Posts: 2562
Credit: 12,301,681
RAC: 0
United States
Message 940043 - Posted: 15 Oct 2009, 3:30:05 UTC - in response to Message 940022.
Last modified: 15 Oct 2009, 3:32:26 UTC

52 Aces wrote:
Will you set Boinc "show_results" back to TRUE any time soon?

Thx so much !


Acct Closed wrote:
With this feature turned off, we have a lot more internet bandwidth for the Seti@Home project to do uploads, downloads, scheduler requests, etc. I'm wondering why we need this feature turned back on since Boinc was designed to be simply set and left alone to run. My long-time Seti friends have not had any network dropouts since this feature was turned off.

What is the value of being able to parse through one's or someone else's tasks?


Nikos

What people see when they visit the Forums or look at their Account page is on a separate link. It does not interfere with Uploads/Downloads to Seti. The Pending Credit or Tasks information is normally drawn from the Replica Boinc Database.

What is in their Account page, attacks the Boinc Replica Database. Dropouts would "normally" be associated with problems accessing the Boinc Master Database. Most of that has been cleared.
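
The split described above, with website pages reading from the replica and everything else going to the master, can be sketched roughly as below. The request names and the function are hypothetical illustrations, not BOINC identifiers, and returning None when the replica is down is an assumption that mirrors the results pages being switched off in this thread.

```python
from enum import Enum
from typing import Optional


class Db(Enum):
    MASTER = "master"
    REPLICA = "replica"


# Hypothetical request kinds; the names are illustrative only.
READ_ONLY_PAGES = {"account_page", "task_list", "pending_credit"}


def pick_database(request_kind: str, replica_available: bool) -> Optional[Db]:
    """Choose which database a request should use.

    Read-only website pages go to the replica. If the replica is down,
    they are disabled (None) rather than pushed onto the master.
    Everything else, such as scheduler traffic, goes to the master.
    """
    if request_kind in READ_ONLY_PAGES:
        return Db.REPLICA if replica_available else None
    return Db.MASTER


print(pick_database("task_list", replica_available=False))
```

Keeping page views off the master is the design choice here: a busy website can slow the replica without affecting work distribution.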

So what is the value of being able to look at a Workunit that someone had a problem with? It is important in Users Helping Users: the ability to see what happened or did not happen. This also goes for the Lunatics Crew, to see what was broken. Or the Access to the Information.

A Large number of people try to Help. When they do not have access to the information it is very tough.

Regards
____________
Please consider a Donation to the Seti Project.

Ageless
Joined: 9 Jun 99
Posts: 12326
Credit: 2,631,128
RAC: 1,098
Netherlands
Message 940093 - Posted: 15 Oct 2009, 8:03:39 UTC - in response to Message 940043.

Pappa wrote:
What is in their Account page, attacks the Boinc Replica Database.



Poor server. :-)

____________
Jord

Fighting for the correct use of the apostrophe, together with Weird Al Yankovic

52 Aces
Project donor
Joined: 7 Jan 02
Posts: 497
Credit: 13,336,288
RAC: 3,564
United States
Message 940094 - Posted: 15 Oct 2009, 8:09:01 UTC - in response to Message 940093.

Ageless wrote:
Poor server. :-)


At last, an Apache Server that attacks back !

thentangler
Joined: 9 Oct 09
Posts: 1
Credit: 21,091
RAC: 0
United States
Message 940110 - Posted: 15 Oct 2009, 11:12:40 UTC - in response to Message 940094.

I can't seem to get any Graphical results loaded when Seti@home is running.
In screensaver mode, it says that Boinc is currently idle.
Is there a problem at the server?
____________

HAL
Joined: 28 Mar 03
Posts: 704
Credit: 870,617
RAC: 0
United States
Message 940113 - Posted: 15 Oct 2009, 11:56:40 UTC - in response to Message 940110.
Last modified: 15 Oct 2009, 11:59:31 UTC

Nobody has been able to display results, tasks, or pending for the past 2 weeks almost. It is completely unknown if or when they will be able to do so again.

Ageless
Joined: 9 Jun 99
Posts: 12326
Credit: 2,631,128
RAC: 1,098
Netherlands
Message 940118 - Posted: 15 Oct 2009, 12:36:08 UTC - in response to Message 939902.

You know, I just read your old post on why Classic closed again and saw this in there:

Matt L. wrote:
4. The SETI@home Classic backend is a tangled mess. There have been many problems over the years, most of which were invisible to the participants. None of these problems were fatal to the project or its science, but have resulted in an obnoxious web of ridiculous dependencies, confusing configurations, and unwieldy databases. I am practically drooling, dreaming of the day when we get to turn all that stuff off and be done with it already. The BOINC backend is sooooo much easier to deal with.

And here we are... That still true then? ;-)
____________
Jord

Fighting for the correct use of the apostrophe, together with Weird Al Yankovic

Pooh Bear 27
Volunteer tester
Joined: 14 Jul 03
Posts: 3221
Credit: 2,640,394
RAC: 1,530
United States
Message 940120 - Posted: 15 Oct 2009, 12:39:54 UTC - in response to Message 940042.

52 Aces wrote:
In fact, for those of us reading here, the Forums with those signature images & avatars on practically every message of every thread are a greater overall bandwidth hit. But aesthetics & e-awards have their values too.

This does not take up bandwidth as you would think. The web pages use the Campus network. The file transfers take the SETI bandwidth. So you are not stealing bandwidth from the SETI pipe when perusing the webpages.


____________

OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 13625
Credit: 31,037,899
RAC: 20,741
United States
Message 940136 - Posted: 15 Oct 2009, 13:50:19 UTC - in response to Message 940025.

Acct Closed wrote:
What is the value of being able to parse through one's or someone else's tasks?


Zeus Fab3r wrote:
...high-end MoBo...300$, latest CPU ...974$,
mindblowing watercooled VGA ...700$ ...
Chance to see how, or if it works as it should, compared to others...priceless :)


Reminds me of a quote I recently heard in the movie "I Hope They Serve Beer In Hell", and I've heard the line before then as well:

If it lacks a price, then it is probably worthless. :)

PS - This was only meant as humor. Anyone lacking a sense of humor should please disengage themselves from the keyboard before firing off a flame or argument to the contrary. Thank you and have a nice day. :)
____________

OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 13625
Credit: 31,037,899
RAC: 20,741
United States
Message 940137 - Posted: 15 Oct 2009, 13:51:18 UTC - in response to Message 940118.

Ageless wrote:
You know, I just read your old post on why Classic closed again and saw this in there:

Matt L. wrote:
4. The SETI@home Classic backend is a tangled mess. There have been many problems over the years, most of which were invisible to the participants. None of these problems were fatal to the project or its science, but have resulted in an obnoxious web of ridiculous dependencies, confusing configurations, and unwieldy databases. I am practically drooling, dreaming of the day when we get to turn all that stuff off and be done with it already. The BOINC backend is sooooo much easier to deal with.

And here we are... That still true then? ;-)


LOL
____________

Andreas
Joined: 21 Jan 02
Posts: 16
Credit: 9,911,789
RAC: 0
Germany
Message 940138 - Posted: 15 Oct 2009, 13:53:36 UTC - in response to Message 940118.

Ageless wrote:
You know, I just read your old post on why Classic closed again and saw this in there:

Matt L. wrote:
4. The SETI@home Classic backend is a tangled mess. There have been many problems over the years, most of which were invisible to the participants. None of these problems were fatal to the project or its science, but have resulted in an obnoxious web of ridiculous dependencies, confusing configurations, and unwieldy databases. I am practically drooling, dreaming of the day when we get to turn all that stuff off and be done with it already. The BOINC backend is sooooo much easier to deal with.

And here we are... That still true then? ;-)


ROFL
____________


Copyright © 2014 University of California