Trying Tuesday (Apr 08 2008)
Author | Message |
---|---|
Matt Lebofsky Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
Had a relatively painless weekend, which is a good sign, as that probably means we correctly determined the cause of our workunit download server woes (a broken faceplate sending bogus resets to the system). Everything else was okay, except that the database statistics on the server status page flatlined. This was fallout from the mysql database server rebooting itself on Thursday and the replica server getting out of sync. Since this was a harmless, cosmetic problem, we let this fire burn until we could re-sync the two databases during today's (extra long) weekly outage.
Why were we down today for so long? What happened?! Seems like last week's database crash caused some minor confusion in (at least) the "credited_job" table, which of course is the largest table in the database. So we had to run a long, expensive "repair table" query after a longer, more expensive "optimize table" query failed with an error, which prevented us from even backing up the database. How annoying. Even more annoying: the /tmp partition filled up during the repair, so mysql twiddled its thumbs for 20 minutes before we realized and cleared out more space. Then /tmp filled up again. Then we realized it was trying to write about 10GB of data to /tmp. This wasn't gonna happen. So we killed the "repair table" query and simply restarted the project so people could get back to work. However, without credited_job the validators can't work, so they're offline for the night. We'll discuss tomorrow what to do next. We still haven't backed up or re-synced our databases. There might be an extra outage tomorrow.
We employed the new workunit-generating splitters with radar blanking yesterday, but then overnight ran out of work to send out. This was due to the way our data was collected and stored in the raw data files. 
Long story short, data buffers are collected and stored in pairs, one of which contains the radar blanking signal (which lets us know exactly when the noisy radar is on), while the other does not and therefore gets its blanking signal from its sibling. However, the orientation of these pairs in the data isn't fixed and may reverse "polarity" at any time. So there's a good chance the first buffer in a data file is missing its sibling and therefore can't find any blanking information. This was a critical error, so splitters were getting hung up on these files as the queue slowly drained. Not a big deal, and Jeff reworked the logic in the splitter so these errors are not critical (we'll just skip the first buffer). Anyway, this only affects a couple months' worth of files - we already fixed the logic on the data recorder down at Arecibo to reduce the chance of "half pairs" happening in a single file.
- Matt
-- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
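The sibling lookup Matt describes can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual splitter code: the `Buffer` type, its `pair_id` tag, the `blanking` field, and the `resolve_blanking` function are all hypothetical names, and the real raw data format is not described in the post. The sketch treats a buffer whose sibling is absent from the file as skippable instead of fatal. The post only mentions skipping the first buffer, so applying the same rule to every orphan is an assumption made here for simplicity.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Buffer:
    index: int               # position of the buffer within the raw data file
    pair_id: int             # hypothetical tag shared by the two buffers of a pair
    blanking: Optional[str]  # blanking signal if this buffer carries it, else None


def resolve_blanking(
    buffers: List[Buffer], strict: bool = False
) -> Tuple[List[Tuple[Buffer, str]], List[Buffer]]:
    """Attach a blanking signal to every buffer, borrowing from the sibling if needed.

    Returns (usable, skipped). With strict=True an orphaned buffer raises,
    mimicking the old "critical error" behaviour (an assumption about it).
    """
    # Record the signal carried by each pair, whichever half holds it,
    # so the orientation ("polarity") of the pair does not matter.
    signal_by_pair = {
        b.pair_id: b.blanking for b in buffers if b.blanking is not None
    }

    usable: List[Tuple[Buffer, str]] = []
    skipped: List[Buffer] = []
    for buf in buffers:
        signal = signal_by_pair.get(buf.pair_id)
        if signal is not None:
            usable.append((buf, signal))
        elif strict:
            raise ValueError(f"buffer {buf.index} has no sibling with blanking data")
        else:
            # Non-critical: drop the orphan and keep processing the file.
            skipped.append(buf)
    return usable, skipped


if __name__ == "__main__":
    # A file that starts mid-pair: buffer 0 lost its sibling to the previous file.
    sample = [
        Buffer(0, pair_id=0, blanking=None),
        Buffer(1, pair_id=1, blanking="b1"),
        Buffer(2, pair_id=1, blanking=None),
    ]
    usable, skipped = resolve_blanking(sample)
    print("skipped buffer indexes:", [b.index for b in skipped])
```

With `strict=True` the same sample file raises on buffer 0, which is the kind of failure that would leave a splitter stuck on the file while the work queue drains.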
Nick Fox Joined: 5 Jan 04 Posts: 46 Credit: 2,834,922 RAC: 0 |
Thanks for the update, Matt... Just goes to show that no matter how much storage you have, you always need more! |
AndyW Joined: 23 Oct 02 Posts: 5862 Credit: 10,957,677 RAC: 18 |
|
Urs Echternacht Joined: 15 May 99 Posts: 692 Credit: 135,197,781 RAC: 211 |
"...There might be an extra outage tomorrow. ..." Don't forget to mention this extra outage on the front page to head off any irritation, Matt. _\|/_ U r s |
John Clark Joined: 29 Sep 99 Posts: 16515 Credit: 4,418,829 RAC: 0 |
I need to download at least 10 days' worth of WUs to keep me going through the coming planned/unplanned Outrage Wednesday, but now there are none to download. It's good to be back amongst friends and colleagues |
JimHilty2 Joined: 30 Apr 03 Posts: 75 Credit: 7,199,464 RAC: 0 |
I don't see any problem with downloads. Just no validation. |
Dr. C.E.T.I. Joined: 29 Feb 00 Posts: 16019 Credit: 794,685 RAC: 0 |
. . . as mentioned below (by Jim) - receiving & returning all fine - only NO Validation - c'est la vie . . . Thanks for the Updates Matt - Hope the Day goes Well for Berkeley Today . . . BOINC Wiki . . . Science Status Page . . . |
perryjay Joined: 20 Aug 02 Posts: 3377 Credit: 20,676,751 RAC: 0 |
Well, that answered my question. I was wondering why I had some stuck waiting for validation. Keep up the good work guys. PROUD MEMBER OF Team Starfire World BOINC |
Mr. Majestic Joined: 26 Nov 07 Posts: 4752 Credit: 258,845 RAC: 0 |
|