Message boards :
Technical News :
Stardust and Sand (Jun 23 2011)
Author | Message |
---|---|
Matt Lebofsky Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
Here's another catch-up tech news report. No big news, but more of the usual.

Last week we got beyond the annoying limits with the Astropulse database. There's still stuff to do "behind the scenes", but we are at least able to insert signals, and thus the assimilators are working again.

The upload server (bruno) keeps locking up. This is load related: it happens more often when we are maxed out, and of course we're pretty much maxed out all the time these days. We're thinking this may actually be a bad CPU. We'll swap it out and see if the problem goes away. Until then, we randomly lose the ability to upload workunits, and human intervention (to power cycle the machine locally or remotely) is required.

We've been moving back-end processes around. I mentioned before how we moved the assimilators to synergy as vader seemed overloaded. This was helpful. However, one thing we forgot about is that the assimilators have a memory leak. This has been an issue forever - like since we were compiling/running this on Sun/Solaris systems - yet it has been completely impossible to find and fix. But an easy band-aid is to have a cron job that restarts the assimilators every so often to clear the pipes. Well, oops, we didn't have that cron job on synergy, and the system wedged over the weekend. That cron job is now in place. But still, I'm not sure why it's so easy for user processes to lock up a whole system to the point you can't even get a root prompt. There should always be enough resources to get a root prompt.

The mysql replica continued to fall behind, so the easiest thing to try next was upgrading mysql from 5.1.x to 5.5 (which supposedly employs better parallelization, and therefore better I/O in times of stress). However, Fedora Core 15 is the first version of Fedora to have mysql 5.5 in its rpm repositories. So I upgraded jocelyn to FC15... only to find that, for some reason, this version of Fedora cannot load the firmware/drivers for the old QLogic fibre channel card, and therefore can't see the data drives. I've been beating my head on this problem for days now to no avail. We could downgrade, but then we can't use mysql 5.5. I guess we could install mysql 5.5 ourselves instead of yumming it in, but that's given us major headaches in the past. This should all just work like it did in earlier versions of Fedora. Jeez.

Thanks for the kind words in the previous thread. Don't worry - I won't let it get to my head :).

- Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
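The restart band-aid Matt describes is a cron job. As a rough sketch of the same idea run from a single supervisor process instead (a swapped-in technique, not what the lab runs), assuming a placeholder worker command and restart interval since the post doesn't give the real assimilator invocation, the loop could look like this. The helper name `run_with_periodic_restart` is hypothetical:

```python
import subprocess
import sys


def run_with_periodic_restart(cmd, interval_s, cycles):
    """Run `cmd`, stopping and relaunching it every `interval_s` seconds.

    Hypothetical helper for illustration. Restarting hands any leaked heap
    back to the OS; it does not fix the leak. Returns the number of launches.
    """
    launches = 0
    for _ in range(cycles):
        proc = subprocess.Popen(cmd)
        launches += 1
        try:
            # If the worker exits on its own before the interval, just relaunch it.
            proc.wait(timeout=interval_s)
        except subprocess.TimeoutExpired:
            # Ask politely first (SIGTERM on POSIX), then force it if it won't go.
            proc.terminate()
            try:
                proc.wait(timeout=5)
            except subprocess.TimeoutExpired:
                proc.kill()
                proc.wait()
    return launches


if __name__ == "__main__":
    # Stand-in worker (assumption): a child Python that just sleeps.
    worker = [sys.executable, "-c", "import time; time.sleep(60)"]
    n = run_with_periodic_restart(worker, interval_s=0.5, cycles=3)
    print(f"launched worker {n} times")
```

Unlike a cron job, this loop stops restarting anything if the supervisor itself dies, so it only illustrates the idea.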
Slavac Joined: 27 Apr 11 Posts: 1932 Credit: 17,952,639 RAC: 0 |
Matt wrote: "Here's another catch-up tech news report. No big news, but more of the usual." If you end up needing a new CPU for Bruno, let me/us know what type and we'll get a replacement sent to you ASAP. Executive Director GPU Users Group Inc. - brad@gpuug.org |
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
Thanks for the update Matt, keep up all the good work, Claggy |
Gary Charpentier Joined: 25 Dec 00 Posts: 30933 Credit: 53,134,872 RAC: 32 |
Matt: You should be able to get the root prompt, but you might not be able to launch (page in) ssh/login/bash to get any prompt. Not sure how you are configured, but you might need to leave a terminal logged in and set to above-normal priority. That's obviously a security risk, so it needs to be behind a physically locked door. As to that leak, not sure what debugging tools you have, but unless it is one of the POSIX designed-in leaks, you should be able to find and quash it. Perhaps a little personal development time reading up on the different available tools might turn up a new path to try. Worse, you may find the right tool but it isn't available for Fedora. e.g. Malloc Debug http://www.manpagez.com/man/3/malloc/ |
Byron Leigh Hatch @ team Carl Sagan Joined: 5 Jul 99 Posts: 4548 Credit: 35,667,570 RAC: 4 |
Thanks for the update, Matt. Best Wishes Byron |
OzzFan Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
Glad to see that your music hasn't completely taken you away from us... yet. Thanks for the update Matt! |
Berserker Joined: 2 Jun 99 Posts: 105 Credit: 5,440,087 RAC: 0 |
I've spent more than my fair share of time with malloc debug and various equivalents. It can work, but for non-trivial cases it can take, like, forever (I've spent weeks on this sort of problem). Hopefully there are some decent memory profilers for *nix. If so, could you dummy up a bucketload of either simulated or actual data and throw it at a testbed assimilator? Memory profiling should at least help you narrow down where to look, even if it doesn't give you the smoking gun. That said, as it's all DB-backed, unclosed queries/result sets would be a place to start. Stats site - http://www.teamocuk.co.uk - still alive and (just about) kicking. |
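Berserker's suggestion (feed a testbed a pile of simulated data and let a memory profiler point at whatever keeps growing) can be sketched with Python's standard `tracemalloc` module. The assimilators themselves are compiled code, so this only illustrates the snapshot-and-diff approach. `FakeResultSet`, `process_batch` and `top_growth` are hypothetical names, and the "unclosed result set" leak is simulated:

```python
import tracemalloc

# Hypothetical stand-in for DB result sets the caller is supposed to close.
_open_result_sets = []


class FakeResultSet:
    def __init__(self, rows):
        self.rows = list(rows)
        _open_result_sets.append(self)  # stays referenced until close()

    def close(self):
        _open_result_sets.remove(self)


def process_batch(n, leak):
    """Simulate assimilating n signals; with leak=True nothing is closed."""
    for _ in range(n):
        rs = FakeResultSet(range(100))
        _ = sum(rs.rows)
        if not leak:
            rs.close()


def top_growth(leak, batches=5, n=200, limit=3):
    """Return the source lines whose live allocations grew the most."""
    _open_result_sets.clear()
    tracemalloc.start()
    process_batch(n, leak)  # warm-up, so one-time allocations aren't counted
    before = tracemalloc.take_snapshot()
    for _ in range(batches):
        process_batch(n, leak)
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    # compare_to sorts by absolute size difference, biggest first.
    stats = after.compare_to(before, "lineno")
    return [s for s in stats if s.size_diff > 0][:limit]


if __name__ == "__main__":
    for stat in top_growth(leak=True):
        print(stat)
```

Diffing a snapshot taken after a warm-up batch against one taken after several more keeps one-time allocations out of the report, so with `leak=True` the `list(rows)` line inside `FakeResultSet` should stand out at the top.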
Berserker Joined: 2 Jun 99 Posts: 105 Credit: 5,440,087 RAC: 0 |
As for MySQL, I've rolled my own (I use Gentoo, ergo I have no choice), and have had no troubles (but then I don't use InnoDB, replicas or countless other features you do). The trick, as ever, is finding a 'good' version and then figuring out what arcane combination of configure options pushes the right buttons to make it have all the features you want, in the right order. Not sure if Fedora has 'volunteers' as such, but if it does, maybe one of them could help. |
bill Joined: 16 Jun 99 Posts: 861 Credit: 29,352,955 RAC: 0 |
Thanks for the update Matt. Much appreciated. |
Jeff Mercer Joined: 14 Aug 08 Posts: 90 Credit: 162,139 RAC: 0 |
Thanks for the update Matt. Hope things start getting better for everyone there in the lab. Thanks for your hard work and dedication to the project. Good luck with the music.... Play a few songs for me. ;) |
Mike Joined: 17 Feb 01 Posts: 34352 Credit: 79,922,639 RAC: 80 |
Thanks for the update Matt. With each crime and every kindness we birth our future. |
rob smith Joined: 7 Mar 03 Posts: 22456 Credit: 416,307,556 RAC: 380 |
I assume that when Matt is silent, either things are going according to plan, so there is nothing really to report, or things are going so badly that he hasn't got time to report. I hope that in the next few weeks it is the former that dominates, and that his plans to divert more time to his music are well fulfilled, without too much interruption from the lab. (Off topic: Matt, what's the guitar in your sig, and how's your young feline apprentice doing? Looks as if it could be a mean picker.....) Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
DJStarfox Joined: 23 May 01 Posts: 1066 Credit: 1,226,053 RAC: 2 |
If the driver is still part of the Linux kernel source code, you could just compile a custom kernel as part of your Fedora installation. Copy the /boot/config-2.6.xx file into /usr/src/kernel/2.6.xx/.config before running make menuconfig. |
Gary Charpentier Joined: 25 Dec 00 Posts: 30933 Credit: 53,134,872 RAC: 32 |
If the event causing the leak happens infrequently, finding it in the mounds and mounds of output can take forever. Hence a tool that suppresses output while all memory is still reachable might make the task possible. There will be a delay in the output, and working backwards from it to find the issue is another matter. If the issue is library calls that leak (there are some), then the problem may be intractable. If enabling debugging makes the process too slow, that is another issue. But if you can find out what the leaking block is used for, you can design in debugging to find where it goes missing, if a read-through of the code doesn't tell you. |
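Gary's last point, designing in debugging once you know what kind of block leaks, might look something like the following. This is a hypothetical Python sketch (the `AllocTracker` class and the tag names are made up for illustration); in the real assimilator code the equivalent counters would sit next to the matching allocate/free calls. It stays silent when every tag balances, which also sidesteps the "mounds and mounds of output" problem:

```python
from collections import Counter


class AllocTracker:
    """Hypothetical designed-in leak check: count live blocks per 'use' tag.

    Call acquire(tag) wherever a block of that kind is allocated and
    release(tag) wherever it is freed. report() returns nothing when every
    tag balances, so normal runs produce no output.
    """

    def __init__(self):
        self.live = Counter()

    def acquire(self, tag):
        self.live[tag] += 1

    def release(self, tag):
        if self.live[tag] <= 0:
            raise RuntimeError(f"release without acquire for {tag!r}")
        self.live[tag] -= 1

    def report(self):
        """Return {tag: outstanding count} for tags that did not balance."""
        return {tag: n for tag, n in self.live.items() if n > 0}


if __name__ == "__main__":
    t = AllocTracker()
    for i in range(1000):
        t.acquire("signal_buffer")
        t.release("signal_buffer")
        t.acquire("db_result")
        if i % 100 != 0:  # simulate a rare code path that forgets to free
            t.release("db_result")
    print(t.report())  # {'db_result': 10}
```

Because only the unbalanced tag shows up, the report points straight at the kind of block that goes missing, and from there at the code paths that allocate it.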
Byron Leigh Hatch @ team Carl Sagan Joined: 5 Jul 99 Posts: 4548 Credit: 35,667,570 RAC: 4 |
Matt, thank you and the rest of the SETI@home crew for all your hard work. Best Wishes Byron |
Donald L. Johnson Joined: 5 Aug 02 Posts: 8240 Credit: 14,654,533 RAC: 20 |
Matt wrote: "Thanks for the kind words in the previous thread. Don't worry - I won't let it get to my head :)." Matt, that is the least of our worries. (8{) Donald Infernal Optimist / Submariner, retired |
Byron Leigh Hatch @ team Carl Sagan Joined: 5 Jul 99 Posts: 4548 Credit: 35,667,570 RAC: 4 |
Matt, thank you and thanks to the rest of the SETI@home crew for all your hard work. You guys are the best. Best Wishes Byron |
KWSN THE Holy Hand Grenade! Joined: 20 Dec 05 Posts: 3187 Credit: 57,163,290 RAC: 0 |
Query: how come we're (or, at least I'm) able to upload, even though the upload server shows as "disabled" on the "Server Status" page? Hello, from Albany, CA!... |
DrFoo Joined: 17 Jul 99 Posts: 26 Credit: 28,975,189 RAC: 0 |
Matt: Something I ran across recently that is probably right on target for you guys (and a lot of others). If you want the stability of RHEL/CentOS AND the latest key software packages (like MySQL 5.5), I don't know of a better way to go. It's sponsored by RackSpace who obviously knows something about this sort of thing and has a vested interest in making it all work. http://iuscommunity.org/About |
KWSN THE Holy Hand Grenade! Joined: 20 Dec 05 Posts: 3187 Credit: 57,163,290 RAC: 0 |
Looks (from the Cricket graph) like something broke around 1600 Berkeley time on Sunday, July 3... Can't upload... and only retries are downloading. |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.