Long Outage (Jun 23 2009)
Author | Message |
---|---|
Matt Lebofsky Send message Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
Usual outage today (which happens every Tuesday for mysql database compression/backup). It went really long - I guess we've been busy inserting/deleting all last week. We went back to an older policy of doing simultaneous compression on both the master and replica, which should vastly speed up post-outage recovery. Until today we've been letting the compression commands (e.g. "alter table user type = innodb") pass from the master to the replica via the usual channels, but they wouldn't happen in parallel (as the loooong queries had to complete successfully on the master before the replica would start processing them). This caused the replica to be as many as four hours behind when the project started up again in the afternoon. The benefit of doing it that way was less work/management, and accidental updates/inserts during the outage wouldn't get lost. Going back to doing it in parallel, we have to stop the replica before we start and reset the master after we're done, thus increasing the chance of these lost queries, but so far we've had 0 such incidents during these weekly outages since we started using mysql years ago. A weekly planned outage is usually a good time to take care of some offline chores. Today I cleaned up lots of unnecessary mounts in an effort to reduce our automounter maps as much as possible (so we don't have such a tangled web, which can be quite painful when one server disappears). I also made vader the sole download server, thus freeing bane to be whatever we want - which will be useful to handle certain services temporarily as we go around upgrading the out-of-date operating systems on lots of these machines. I think vader can handle the load alone. I hear the presentations from the 10th anniversary celebration have all been converted to mpegs. It's a few gigs worth of stuff on a computer down on campus. A flash drive containing all that will appear up here at our lab sometime in the near future. Or it may be hosted on an interim server. We shall see. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
Gary Charpentier Send message Joined: 25 Dec 00 Posts: 30932 Credit: 53,134,872 RAC: 32 |
Thanks for the update. Hope vader is up to it. Hey, wow, tag buttons above! |
Dr. C.E.T.I. Send message Joined: 29 Feb 00 Posts: 16019 Credit: 794,685 RAC: 0 |
. . . Thanks for the Updates Matt [and the Forum BBCode as well] > re-posted your comment re: the Videos - in the SETI Cafe - ustream.tv SETI Anniversary webcast Thanks for that Info as well BOINC Wiki . . . Science Status Page . . . |
Jack Zhang Send message Joined: 2 Jul 06 Posts: 206 Credit: 6,142,449 RAC: 0 |
The DB purge is now running, but the scheduler process is still not running. Maybe we're waiting for the splitters to pick up the slack before work is sent out. What if Fiction was Fact and Fact was Fiction and vice versa? |
zpm Send message Joined: 25 Apr 08 Posts: 284 Credit: 1,659,024 RAC: 0 |
Glad to see that the backend of the BOINC webpage was updated too.... I recommend Secunia PSI: http://secunia.com/vulnerability_scanning/personal/ Go Georgia Tech. |
mtlmrgn Send message Joined: 15 Mar 06 Posts: 1 Credit: 190,346 RAC: 0 |
June 24th: not getting any downloads today. As you can see, it is not my connection. |
Gorim1 Send message Joined: 15 Nov 06 Posts: 4 Credit: 1,536,081 RAC: 0 |
Bruce, I have the same problem... I'm running out of work to do. Do something about it, guys.... |
Andy Lee Robinson Send message Joined: 8 Dec 05 Posts: 630 Credit: 59,973,836 RAC: 0 |
Matt, just stop slave on the replica and remove alter privileges for the replication account temporarily. Do the alter command on the slave locally as root, then start slave when finished; it'll carry on from the last position, and you don't have to wait for the master to finish. The alter command from the master will be ignored on the slave (or can be made to be), or if it causes replication to stop, then "set global sql_slave_skip_counter=1; start slave;" will skip over it and continue. Once the slave has read past the alter command, just reset privs again for the repl account on the slave (before any other needed alter commands come along!). This way, you shouldn't lose any queries at all, nor have to make notes of master pointers. The entire thing could be done as a script "tuesday_backup.sh" on a cron task! |
i_mcintosh Send message Joined: 30 May 99 Posts: 3 Credit: 1,038,479 RAC: 0 |
Mid-day (25th) here in the UK, still no downloads here either. I'm out of work :( Maybe the ETs are blocking it???? |
Gundolf Jahn Send message Joined: 19 Sep 00 Posts: 3184 Credit: 446,358 RAC: 0 |
"Maybe the ETs are blocking it????" Or maybe your ISP or DNS server? :-) |
Virtual Boss* Send message Joined: 4 May 08 Posts: 417 Credit: 6,440,287 RAC: 0 |
"Mid-day (25th) here in the UK, still no downloads here either." If you have tasks assigned but they are not downloading, try stopping/restarting BOINC. Worked for my rigs. Flying high with Team Sicituradastra. |
i_mcintosh Send message Joined: 30 May 99 Posts: 3 Credit: 1,038,479 RAC: 0 |
Tried restarting the PCs entirely... 25/06/2009 12:50:01 Internet access OK - project servers may be temporarily down. 25/06/2009 12:50:22 Project communication failed: attempting access to reference site 25/06/2009 12:50:22 SETI@home Temporarily failed download of ap_graphics_5.05_windows_intelx86.exe: connect() failed 25/06/2009 12:50:22 SETI@home Backing off 1 min 0 sec on download of ap_graphics_5.05_windows_intelx86.exe 25/06/2009 12:50:22 SETI@home [error] File ap405.jpg has wrong size: expected 7653, got 0 25/06/2009 12:50:22 SETI@home Started download of ap405.jpg 25/06/2009 12:50:23 Internet access OK - project servers may be temporarily down. |
Geek@Play Send message Joined: 31 Jul 01 Posts: 2467 Credit: 86,146,931 RAC: 0 |
"Tried restarting the PCs entirely..." In your settings, under Computing Preferences, the last setting states: "Skip image file verification? Check this ONLY if your Internet provider modifies image files (UMTS does this, for example). Skipping verification reduces the security of BOINC." You should set this to yes. You need to skip verification to clear the graphics faults you have. A restart of the computer should clear the rest. Boinc....Boinc....Boinc....Boinc.... |
i_mcintosh Send message Joined: 30 May 99 Posts: 3 Credit: 1,038,479 RAC: 0 |
Brilliant! Thanks tons :) |
C Send message Joined: 3 Apr 99 Posts: 240 Credit: 7,716,977 RAC: 0 |
"Mid-day (25th) here in the UK, still no downloads here either." Thanks, Boss - that worked for my machines. C Join Team MacNN |
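
The replica-side procedure Andy Lee Robinson describes earlier in the thread can be sketched as an ordered list of statements. This is only a rough illustration, not the project's actual script: the helper names `replica_compression_plan` and `skip_one_event` are made up here, the table name `host` is a placeholder (only `user` appears in the thread), and `ENGINE = InnoDB` is used in place of the older `type = innodb` spelling quoted in Matt's post. The privilege change for the replication account is left out because the account name and grants are site-specific, and whether revoking ALTER actually makes the replica ignore the replicated statement is not verified here.

```python
import re

# Accept only plain identifiers so a bad table name cannot inject extra SQL.
_IDENT = re.compile(r"^[A-Za-z0-9_]+$")


def replica_compression_plan(tables):
    """Return the statements to run locally on the replica, in order.

    Hypothetical helper: stop replication, rebuild each table locally,
    then resume replication from the last recorded position.
    """
    for name in tables:
        if not _IDENT.match(name):
            raise ValueError(f"unexpected table name: {name!r}")
    plan = ["STOP SLAVE;"]
    plan += [f"ALTER TABLE `{name}` ENGINE = InnoDB;" for name in tables]
    plan.append("START SLAVE;")
    return plan


def skip_one_event():
    """Statements quoted in the thread for skipping one replicated event
    if the ALTER arriving from the master halts replication."""
    return ["SET GLOBAL sql_slave_skip_counter = 1;", "START SLAVE;"]


if __name__ == "__main__":
    # "user" is the table named in the thread; "host" is a placeholder.
    for statement in replica_compression_plan(["user", "host"]):
        print(statement)
```

The functions only build the statement list; feeding it to a MySQL client (for example from a cron-driven script such as the "tuesday_backup.sh" idea above) is left to the operator, since connection details are not given in the thread.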
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.