Quick Outage Today (Sep 22 2009)

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 935271 - Posted: 22 Sep 2009, 20:43:14 UTC

Today was an outage day, with nothing special to report on that front. One interesting note: our master MySQL database server (mork) has 24 processors and 64 GB of memory, while the replica server (jocelyn, which used to be the master) has 4 processors and 28 GB of memory. Eric recently cleaned out really old rows from the beta result table, so now the entire database fits better in memory on jocelyn, and as a result its database engine generally performs better than mork's. How could this be? Because despite having far less memory and fewer processors, jocelyn has more disk spindles (and faster disks, for that matter) than mork. Not all that surprising, really, but it's fun to see our suspicions about disk performance confirmed now that memory is less of a bottleneck. In any case, both servers are zippy, and today's outage wasn't very long, was it?

So the weekend went by with nary a blip, or even a single alert from my web of alert scripts. This pretty much never happens. We always get some kind of warning, severe or otherwise - high load on this server, replica database falling behind, rising temperatures in the closet... but nope. Everything was just fine.
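For readers wondering what such alert scripts boil down to, here is a minimal Python sketch of threshold checks. The check names echo the kinds of alerts mentioned above, but the metric keys, the limits, and the `run_checks` function are all hypothetical; the actual scripts aren't described in the post.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Check:
    name: str        # human-readable alert name
    metric: str      # key in the readings dict (hypothetical)
    limit: float     # alert when the reading exceeds this (hypothetical value)
    severity: str    # "warning" or "severe"


# Hypothetical checks modeled on the alerts the post mentions.
CHECKS = [
    Check("high load", "load_avg", 20.0, "warning"),
    Check("replica falling behind", "replica_lag_s", 600.0, "severe"),
    Check("closet temperature rising", "closet_temp_c", 30.0, "warning"),
]


def run_checks(readings: Dict[str, float], checks: List[Check] = CHECKS) -> List[str]:
    """Return one alert string per check whose reading exceeds its limit.

    Metrics missing from `readings` are skipped rather than alerted on.
    """
    alerts = []
    for c in checks:
        value = readings.get(c.metric)
        if value is not None and value > c.limit:
            alerts.append(f"{c.severity}: {c.name} ({c.metric}={value} > {c.limit})")
    return alerts
```

A quiet weekend like this one corresponds to every call returning an empty list.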

However, yesterday we did have one short traffic dip due to the science database getting locked up by too many internal user queries, so the splitters weren't creating work for a couple of hours. No biggie - we killed the queries and Informix sprang back to life. It is a bit worrisome how locked up the database can get, though, and it's hardly predictable when (or why) it happens.

I'm actually running my software radar blanker through an entire 50 GB test file right now. It runs at roughly twice real time (meaning a file containing n hours of data takes 2n hours to scan for radar and blank it). Not to worry - we can run many of these in parallel, and I could also make several code optimizations if need be. Anyway, I'm hoping that by the end of the week I'll trust this suite of software enough to start processing our large backlog of 2007-2008 data next month.
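The "twice real time" figure turns backlog planning into simple arithmetic: a file holding n hours of data needs about 2n hours of processing, and independent files can be spread across parallel jobs. This Python sketch estimates wall-clock time under those assumptions. The function names, the file durations you feed it, and the greedy longest-file-first assignment of files to workers are illustrative choices, not the actual pipeline's scheduler.

```python
import heapq

# From the post: roughly 2 hours of processing per hour of recorded data.
SLOWDOWN = 2.0


def processing_hours(data_hours: float, slowdown: float = SLOWDOWN) -> float:
    """Processing time for one file containing `data_hours` of data."""
    return data_hours * slowdown


def backlog_wall_hours(file_hours, workers: int, slowdown: float = SLOWDOWN) -> float:
    """Estimate wall-clock hours to blank every file using `workers` parallel jobs.

    Assumes files are independent. Files are handed out longest first, each
    to whichever worker currently has the least accumulated work.
    """
    if workers < 1:
        raise ValueError("need at least one worker")
    loads = [0.0] * workers  # min-heap of each worker's busy time
    for hours in sorted(file_hours, reverse=True):
        lightest = heapq.heappop(loads)
        heapq.heappush(loads, lightest + processing_hours(hours, slowdown))
    return max(loads)
```

For four hypothetical 3-hour files, `backlog_wall_hours([3, 3, 3, 3], workers=1)` gives 24.0 hours, while `workers=4` brings it down to 6.0, which is why the 2x slowdown isn't a worry.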

Oh yeah, one more thing - we do know that the "queries/second" field is blank on the server status page. For some reason the exact same informational query returns its output in a different format on one server than on the other, so our general "db stats" script is sorta broken. Bob is fixing it.

- Matt

-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
Francis Noel
Joined: 30 Aug 05
Posts: 452
Credit: 142,832,523
RAC: 94
Canada
Message 935277 - Posted: 22 Sep 2009, 21:16:22 UTC

Thanks for the update Matt.

Did you get that eerie feeling that things just went "too well"? As a sysadmin, whenever everything is going smoothly I always get that calm-before-the-storm feeling of impending doom :).
mambo
zpm
Volunteer tester
Joined: 25 Apr 08
Posts: 284
Credit: 1,659,024
RAC: 0
United States
Message 935301 - Posted: 22 Sep 2009, 23:50:50 UTC - in response to Message 935277.  

wouldn't it be nice to have some SSdrives.

I recommend Secunia PSI: http://secunia.com/vulnerability_scanning/personal/
Go Georgia Tech.
ront

Joined: 25 Aug 01
Posts: 77
Credit: 386,336
RAC: 0
United States
Message 935368 - Posted: 23 Sep 2009, 8:21:11 UTC - in response to Message 935271.  

thank you for the info. Have 14 "pendings" stacked up dating back to the 17th

Please advise.

Be Blessed & Be A Blessing,


ront
MarkJ
Volunteer tester
Joined: 17 Feb 08
Posts: 1139
Credit: 80,854,192
RAC: 5
Australia
Message 935382 - Posted: 23 Sep 2009, 11:11:39 UTC - in response to Message 935301.  
Last modified: 23 Sep 2009, 11:12:25 UTC

wouldn't it be nice to have some SSdrives.


Actually they do. Apparently they won't work with the new Intel server (Mork).
BOINC blog
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 935409 - Posted: 23 Sep 2009, 15:59:37 UTC - in response to Message 935368.  

thank you for the info. Have 14 "pendings" stacked up dating back to the 17th

Please advise.

These questions belong in Number Crunching. Technical News is for general updates from the project.

Every work unit has to be processed twice, and the results must match. Your pendings are waiting for the second result. If you need more, ask in Number Crunching.

KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 3187
Credit: 57,163,290
RAC: 0
United States
Message 935442 - Posted: 23 Sep 2009, 18:21:16 UTC - in response to Message 935409.  

thank you for the info. Have 14 "pendings" stacked up dating back to the 17th

Please advise.

These questions belong in Number Crunching. Technical News is for general updates from the project.

Every work unit has to be processed twice, and the results must match. Your pendings are waiting for the second result. If you need more, ask in Number Crunching.


To find out which of the above is the case (i.e., waiting for a matching WU, or that WU didn't jibe with yours...), click on "Tasks" on the account page: items that say "Completed, waiting for validation" are waiting for the matching WU, and items that say "Completed, validation inconclusive" are waiting for a "tie-breaker" WU to determine which of the two different results is correct.
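The quorum rule described in this thread (two matching results validate a work unit; a disagreement triggers a tie-breaker) can be modeled in a few lines of Python. This is a simplified sketch, not BOINC's actual validator: the `validate` function and its state names are hypothetical, and it compares results for exact equality, whereas a real validator compares scientific output with project-specific code and tolerances.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence, Tuple

QUORUM = 2  # "every work unit has to be processed twice, and the results must match"


def validate(results: Sequence[Hashable], quorum: int = QUORUM) -> Tuple[str, Optional[Hashable]]:
    """Classify one work unit from the results returned so far.

    Returns (state, canonical_result); canonical_result is None until a
    quorum of identical results exists.
    """
    if len(results) < quorum:
        # Shown on the Tasks page as "Completed, waiting for validation".
        return "waiting", None
    value, votes = Counter(results).most_common(1)[0]
    if votes >= quorum:
        # Enough results agree; that value becomes the canonical result.
        return "valid", value
    # Results disagree: "Completed, validation inconclusive".
    # Another copy (the tie-breaker) goes out and the check is repeated.
    return "inconclusive", None
```

For example, `validate(["r1", "r2"])` is inconclusive, and once a tie-breaker returns `"r1"`, `validate(["r1", "r2", "r1"])` yields `("valid", "r1")`.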

Hello, from Albany, CA!...



 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.