Long Outage (Jun 23 2009)



Message boards : Technical News : Long Outage (Jun 23 2009)

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1389
Credit: 74,079
RAC: 0
United States
Message 910566 - Posted: 23 Jun 2009, 23:09:29 UTC

Usual outage today (which happens every Tuesday for mysql database compression/backup). It went really long - I guess we've been busy inserting/deleting all last week. We went back to an older policy of doing simultaneous compression on both the master and replica, which should vastly speed up post-outage recovery.

Until today we've been letting the compression commands (e.g. "alter table user type = innodb") pass from the master to the replica via the usual channels, but they wouldn't run in parallel (as the loooong queries had to complete successfully on the master before the replica would start processing them). This caused the replica to be as much as four hours behind when the project started up again in the afternoon. The benefit of doing it that way was less work/management, and any accidental updates/inserts during the outage wouldn't get lost. Doing it in parallel, we have to stop the replica before we start and reset the master after we're done, which increases the chance of losing such queries, but so far we've had zero such incidents during these weekly outages since we started using mysql years ago.
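To see why the old serial approach could leave the replica hours behind, here is a toy timeline model in Python. It is a sketch under stated assumptions, not a measurement: the replica applies replicated statements one at a time in log order, each ALTER takes the same time on both machines, and the per-table durations in the example are made up. In this model the replica's final lag under the old policy works out to the length of the longest single ALTER, while running the rebuilds on both machines at once brings it to zero.

```python
def finish_times(durations_h):
    """Return (master_done, replica_done_serial, replica_done_parallel) in hours.

    durations_h: hypothetical per-table ALTER durations, run back to back from t = 0.
    Assumes a single-threaded replica running at the same speed as the master.
    """
    master_done = 0.0
    replica_serial_done = 0.0
    for d in durations_h:
        master_done += d
        # Old policy: the replica only sees an ALTER after the master has
        # finished it, and only starts it once its own previous ALTER is done.
        replica_serial_done = max(master_done, replica_serial_done) + d
    # New policy: the replica runs the same ALTERs itself, also starting at t = 0.
    replica_parallel_done = sum(durations_h)
    return master_done, replica_serial_done, replica_parallel_done


if __name__ == "__main__":
    master, serial, parallel = finish_times([1.0, 3.0, 2.0])  # made-up hours
    print(f"master done at {master}h")
    print(f"replica (serial) done at {serial}h, lag {serial - master}h")
    print(f"replica (parallel) done at {parallel}h, lag {parallel - master}h")
```

With the made-up durations of 1, 3 and 2 hours, the master finishes at hour 6, the serially fed replica at hour 9 (3 hours behind, the length of the longest ALTER), and the parallel replica at hour 6.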

A weekly planned outage is usually a good time to take care of some offline chores. Today I cleaned up lots of unnecessary mounts in an effort to shrink our automounter maps as much as possible (so we don't have such a tangled web, which can be quite painful when one server disappears). I also made vader the sole download server, freeing bane to be whatever we want, which will be useful for handling certain services temporarily as we go around upgrading the out-of-date operating systems on lots of these machines. I think vader can handle the load alone.

I hear the presentations from the 10th anniversary celebration have all been converted to mpegs. It's a few gigs' worth of stuff on a computer down on campus. A flash drive containing all of it will appear up here at our lab sometime in the near future, or it may be hosted on an interim server. We shall see.

- Matt

____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

Gary Charpentier
Project donor
Volunteer tester
Joined: 25 Dec 00
Posts: 12814
Credit: 7,397,478
RAC: 18,308
United States
Message 910571 - Posted: 23 Jun 2009, 23:16:46 UTC - in response to Message 910566.

Thanks for the update.

Hope vader is up to it.

Hey, wow, tag buttons above!

____________

Sten-Arne
Volunteer tester
Joined: 1 Nov 08
Posts: 3599
Credit: 20,913,455
RAC: 24,309
Sweden
Message 910572 - Posted: 23 Jun 2009, 23:16:51 UTC - in response to Message 910566.

Thanks for the update, always interesting to read.

Can you tell me why the scheduler process and db_purge.x86_64 are disabled?

The db_purge.x86_64 was disabled long before the outage.


Sten-Arne

Dr. C.E.T.I.
Joined: 29 Feb 00
Posts: 15993
Credit: 690,597
RAC: 0
United States
Message 910589 - Posted: 23 Jun 2009, 23:34:49 UTC

. . . Thanks for the Updates Matt [and the Forum BBCode as well]

> re-posted your comment re: the Videos - in the SETI Cafe - ustream.tv SETI Anniversary webcast

Thanks for that Info as well


____________
BOINC Wiki . . .

Science Status Page . . .

Jack Zhang
Volunteer tester
Joined: 2 Jul 06
Posts: 206
Credit: 6,118,509
RAC: 834
Canada
Message 910638 - Posted: 24 Jun 2009, 2:55:44 UTC - in response to Message 910572.

The DB purge is now running, but the scheduler process is still not running. Maybe we're waiting for the splitters to pick up the slack before work is sent out.
____________
What if Fiction was Fact and Fact was Fiction and vice versa?

zpm
Volunteer tester
Joined: 25 Apr 08
Posts: 284
Credit: 1,602,404
RAC: 162
United States
Message 910653 - Posted: 24 Jun 2009, 4:17:03 UTC - in response to Message 910638.

Glad to see that the backend of the BOINC webpage was updated too....
____________

I recommend Secunia PSI: http://secunia.com/vulnerability_scanning/personal/
Go Georgia Tech.

mtlmrgn
Joined: 15 Mar 06
Posts: 1
Credit: 190,346
RAC: 0
United States
Message 910822 - Posted: 24 Jun 2009, 19:31:16 UTC - in response to Message 910566.

June 24th: not getting any downloads today, and as you can see it is not my connection.
____________

Gorim1
Joined: 15 Nov 06
Posts: 4
Credit: 1,474,936
RAC: 0
Poland
Message 910836 - Posted: 24 Jun 2009, 19:56:38 UTC

Bruce, I have the same problem...
I'm running out of work to do.
Do something about it, guys....

Andy Lee Robinson
Joined: 8 Dec 05
Posts: 615
Credit: 42,873,573
RAC: 26,982
Hungary
Message 910865 - Posted: 24 Jun 2009, 20:49:04 UTC - in response to Message 910566.
Last modified: 24 Jun 2009, 20:50:47 UTC

Matt, just stop the slave on the replica and temporarily remove alter privileges from the replication account.
Run the alter command on the slave locally as root, then start the slave when it's finished; it'll carry on from the last position, and you don't have to wait for the master to finish.
The alter command from the master will be ignored on the slave (or can be made to be), or if it causes replication to stop, then "set global sql_slave_skip_counter=1; start slave;" will skip over it and continue.
Once the slave has read past the alter command, just restore the privileges for the repl account on the slave (before any other needed alter commands come along!)
This way, you shouldn't lose any queries at all, nor have to make a note of the master's log position.
The entire thing could be done as a script "tuesday_backup.sh" on a cron task!
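A related way to reach the same goal (no lost queries, no master log position to write down) is to keep the rebuild statements out of the master's binary log entirely, using MySQL's `sql_log_bin` session variable, and run the same statements on the replica in its own connection at the same time. This is a swapped-in technique, not the procedure from either post above. One reason to consider it: in stock MySQL before 8.0.18 (which added `PRIVILEGE_CHECKS_USER`), the replica's SQL thread does not check privileges when applying replicated statements, so revoking ALTER from the replication account may not stop the master's ALTER from running again on the slave. The Python sketch below only prints a dry-run plan; it assumes the session has the privilege needed to change `sql_log_bin` (SUPER on MySQL versions of that era), and the table name `result` is hypothetical.

```python
def rebuild_plan(tables):
    """Return {host: [statements]} for rebuilding InnoDB tables on both servers.

    Each host's list is meant to run in its own connection, concurrently.
    Dry run only: nothing here connects to a database.
    """
    plan = {}
    for host in ("master", "replica"):
        # Keep this session's statements out of the binary log, so the
        # master's rebuilds are never shipped to the replica and re-run there.
        stmts = ["SET sql_log_bin = 0"]
        # ENGINE = InnoDB on a table that is already InnoDB still rebuilds it;
        # TYPE = InnoDB is the older spelling quoted in Matt's post.
        stmts += [f"ALTER TABLE {table} ENGINE = InnoDB" for table in tables]
        stmts.append("SET sql_log_bin = 1")
        plan[host] = stmts
    return plan


if __name__ == "__main__":
    # "user" appears in Matt's post; "result" is a hypothetical second table.
    for host, stmts in rebuild_plan(["user", "result"]).items():
        print(f"-- on {host}")
        for stmt in stmts:
            print(stmt + ";")
```

In this scheme the replica's SQL thread can keep running, because the master's rebuilds never reach the binary log. The trade-off is that someone (or the cron script) has to remember to run the replica's half, since replication will no longer do it.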

i_mcintosh
Joined: 30 May 99
Posts: 3
Credit: 1,038,282
RAC: 0
United Kingdom
Message 911159 - Posted: 25 Jun 2009, 10:58:29 UTC - in response to Message 910822.

Mid-day (25th) here in the UK, still no downloads here either.

I'm out of work :(

Maybe the ETs are blocking it????
____________

Gundolf Jahn
Joined: 19 Sep 00
Posts: 3184
Credit: 359,640
RAC: 35
Germany
Message 911162 - Posted: 25 Jun 2009, 11:04:39 UTC - in response to Message 911159.

Maybe the ETs are blocking it????

Or maybe your ISP or DNS server? :-)

Virtual Boss*
Volunteer tester
Joined: 4 May 08
Posts: 417
Credit: 6,200,651
RAC: 358
Australia
Message 911166 - Posted: 25 Jun 2009, 11:28:44 UTC - in response to Message 911159.

If you have tasks assigned but they are not downloading, try stopping/restarting BOINC.
Worked for my rigs.
____________
Flying high with Team Sicituradastra.

i_mcintosh
Joined: 30 May 99
Posts: 3
Credit: 1,038,282
RAC: 0
United Kingdom
Message 911174 - Posted: 25 Jun 2009, 11:51:37 UTC - in response to Message 911166.

Tried restarting the PCs entirely...

25/06/2009 12:50:01 Internet access OK - project servers may be temporarily down.
25/06/2009 12:50:22 Project communication failed: attempting access to reference site
25/06/2009 12:50:22 SETI@home Temporarily failed download of ap_graphics_5.05_windows_intelx86.exe: connect() failed
25/06/2009 12:50:22 SETI@home Backing off 1 min 0 sec on download of ap_graphics_5.05_windows_intelx86.exe
25/06/2009 12:50:22 SETI@home [error] File ap405.jpg has wrong size: expected 7653, got 0
25/06/2009 12:50:22 SETI@home Started download of ap405.jpg
25/06/2009 12:50:23 Internet access OK - project servers may be temporarily down.

____________

Geek@Play
Project donor
Volunteer tester
Joined: 31 Jul 01
Posts: 2467
Credit: 86,144,272
RAC: 562
United States
Message 911185 - Posted: 25 Jun 2009, 12:20:21 UTC - in response to Message 911174.
Last modified: 25 Jun 2009, 12:21:25 UTC

In your settings, under Computing Preferences, the last setting states:

Skip image file verification?
Check this ONLY if your Internet provider modifies image files (UMTS does this, for example).
Skipping verification reduces the security of BOINC.


You should set this to yes. You need to skip verification to clear the graphics faults you have.

A restart of the computer should clear the rest.
____________
Boinc....Boinc....Boinc....Boinc....

i_mcintosh
Joined: 30 May 99
Posts: 3
Credit: 1,038,282
RAC: 0
United Kingdom
Message 911226 - Posted: 25 Jun 2009, 13:59:53 UTC - in response to Message 911185.

Brilliant! Thanks tons :)

____________

C
Joined: 3 Apr 99
Posts: 240
Credit: 6,695,016
RAC: 985
United States
Message 911235 - Posted: 25 Jun 2009, 14:17:05 UTC - in response to Message 911166.

Thanks, Boss - that worked for my machines.

C
____________

Join Team MacNN


Copyright © 2014 University of California