Outage (Jun 02 2009)

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1441
Credit: 213,689
RAC: 0
United States
Message 902975 - Posted: 2 Jun 2009, 23:29:04 UTC

Had the weekly outage today. The normal database/compression/cleanup stuff was by the book, but we took the time to address some other hardware issues. First and foremost, we replaced the failed drive on thumper. I was griping about this yesterday and how it means we have to reboot, which in turn forces us to resync the root RAID devices. Well, that's happening now. I also upgraded the kernel on worf. That sort of went well, except that upon coming back online one of the spare drives was marked as failed. We're dealing with that now.

Coming out of these weekly outages has gotten painful given our increased rate of traffic lately and the web queries that continue to clobber us. I try to aim these at the replica, which helps, but right after outages the replica is effectively offline for many hours as it is still busy recreating the giant tables. So I have to temporarily aim those web queries at the master, which makes recovery even slower. We gotta figure this all out: come up with a better weekly backup/reorg policy, or get that new replica server up and running sooner rather than later. We did order drives for it - should be here later in the week.

- Matt


-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

ID: 902975
Johnney Guinness
Volunteer tester
Joined: 11 Sep 06
Posts: 3093
Credit: 2,652,287
RAC: 0
Ireland
Message 902989 - Posted: 2 Jun 2009, 23:41:22 UTC
Last modified: 2 Jun 2009, 23:44:04 UTC

Matt, thank you!

You are the only one who faithfully keeps us up to date about what's going on. Without you, Matt, we would never know what was happening with our favourite project :) Matt, you give me hope, and isn't that what keeps this project alive? Hope!

John.


ID: 902989
Ageless
Joined: 9 Jun 99
Posts: 13810
Credit: 3,269,733
RAC: 0
Netherlands
Message 902992 - Posted: 2 Jun 2009, 23:43:30 UTC - in response to Message 902975.
Last modified: 2 Jun 2009, 23:46:39 UTC

So Commander Worf and Chancellor Gowron are named, but the greatest General 'still alive' (Martok) isn't?

Hu'tegh! (qoH vuvbe' SuS) ;-)

(You do know that Worf killed Gowron, don't you? So perhaps Gowron is having its revenge in your server closet.)

Are those web queries still bots reading stats?
Since the replica database is effectively offline for a couple of hours after the outage, wouldn't it be a good idea to also disable the stats during that time? Or does that not matter for the bots?


Jord

Ancient Astronaut Theorists suggest that in many ways, you can be considered an alien conspiracy!

ID: 902992
Rob

Joined: 4 Sep 07
Posts: 57
Credit: 2,444,762
RAC: 0
Canada
Message 903047 - Posted: 3 Jun 2009, 1:21:16 UTC - in response to Message 902975.

Thank you, sir. Your updates and constant efforts are greatly appreciated.

ID: 903047
zpm
Volunteer tester
Joined: 25 Apr 08
Posts: 284
Credit: 1,659,024
RAC: 0
United States
Message 903107 - Posted: 3 Jun 2009, 4:23:48 UTC - in response to Message 903047.

Getting another server to help with the load is really the only solution.



I recommend Secunia PSI: http://secunia.com/vulnerability_scanning/personal/
Go Georgia Tech.

ID: 903107
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 900
Credit: 7,846,588
RAC: 1,058
New Zealand
Message 903179 - Posted: 3 Jun 2009, 8:58:27 UTC

Thanks team for all the hard work. One thing to note is the outage notice is still showing on the front page


ID: 903179
Aurora Borealis
Volunteer tester
Joined: 14 Jan 01
Posts: 3075
Credit: 5,631,463
RAC: 0
Canada
Message 903204 - Posted: 3 Jun 2009, 11:42:48 UTC - in response to Message 903179.

Thanks team for all the hard work. One thing to note is the outage notice is still showing on the front page

If you're talking about the news notice, that's a permanent notice.
If you can't view the front page, flush your browser cache.

ID: 903204
Jet

Joined: 25 Sep 07
Posts: 12
Credit: 1,586,013
RAC: 0
Ukraine
Message 903220 - Posted: 3 Jun 2009, 12:24:56 UTC - in response to Message 903204.

Unfortunately, flushing the browser cache doesn't help. The weekly outage notice is still on the front page.

ID: 903220
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 903272 - Posted: 3 Jun 2009, 16:35:03 UTC - in response to Message 903107.

Getting another server to help with the load is really the only solution.

There is always this pesky problem about money.

ID: 903272
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 900
Credit: 7,846,588
RAC: 1,058
New Zealand
Message 903326 - Posted: 3 Jun 2009, 20:42:10 UTC - in response to Message 903204.

Thanks team for all the hard work. One thing to note is the outage notice is still showing on the front page

If you're talking about the news notice, that's a permanent notice.
If you can't view the front page, flush your browser cache.

Everything is sorted on the front page. Thanks.

ID: 903326



 
©2016 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.