Here's another end-of-the-month update. First, here's some closure/news regarding various items I mentioned in my last post a month ago.
Regarding the replica mysql database (jocelyn) - this is an ongoing problem, but it is not a show stopper, nor does it hamper any of our progress/processing in the slightest. It's really solely an up-to-the-minute backup of our master mysql database (running on carolyn) in case major problems arise. We still back up the database every week, but it's nice to have something current because we're updating/inserting/deleting millions of rows per day. Anyway, I did finally get that fibrechannel card working with the new OS (yay) and Bob got mysql 5.5 working on it (yay) but the system's issues with attached storage devices remain, despite swapping out entire devices - so this must be the card after all. We'll swap one out (if we have another one) next week. Or think of another solution. Or do nothing because this isn't the highest priority.
Speaking of the carolyn server, last week it locked up exactly the same way the upload server (bruno) has, i.e. the kernel freaks out about a locked CPU and all processes grind to a halt. We thought this was perhaps a bad CPU on bruno, but now that this happened on carolyn (an equally busy but totally different kind of system with different CPU models running different kinds of processes) we're thinking this is a linux kernel issue. We'll yum them up next week but I doubt that'll do anything.
We're still in the situation where the science databases are so busy we can't run the splitters/assimilators at the same time as backend science processing. We're constantly swapping the two groups of tasks back and forth. Don't see any near-term solution other than that. Maybe more RAM in oscar (the main science informix server). This also isn't a show-stopper, but definitely slows down progress.
The astropulse database had some major issues there (we got the beta database in a corrupted state such that we couldn't start the whole engine, nor could drop the corrupted database). We got support from IBM/informix who actually logged in, flipped a couple secret bits, and we were back in business.
So... regarding the HE connection woes. This remains a mystery. After starting that thread in number crunching and before I could really dig into it I had a couple random minor health issues (really minor, everything's fine, though I claimed sick days for the first time in years) and a planned vacation out of town, and everybody else was too busy (or also out of town) to pick up the ball. I have to be honest that this wasn't given the highest priority as we're still pushing out over 90Mbits/sec on average and maxing out our pipe - so even if we cleared up these (seemingly few and random) connection/routing issues they'd have no place to go. Really we should be either increasing our bandwidth capacity or putting in measures to not send out so many noisy workunits first.
Still, I dug in and got a hold of Hurricane Electric support. We're kind of finding if there *is indeed* an issue, it's from the hop from their last router to our router down at the PAIX. But our router is fine (it is soon to reach 3 years of solid uptime, in fact). The discussion/debugging with HE continues. Meanwhile I still haven't found a public traceroute test server anywhere on the planet that continues fails to reach us (i.e. a good test case that I have access to). I also wonder if this has to do with the recent IPV6 push around the world in early June.
Progress continues in candidate land. We kind of put on hold the public-involvement portion of candidate hunting due to lack of resources. Plus we're still finding lots of RFI in our top candidates which is statistically detectable but not quite obvious to human eyes. Jeff's spending a lot of time cleaning that up, hopefully to get to a point where (a) we can make tools to do this automatically or (b) it's a less-pervasive, manageable issue.
That enough for now.
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude