Looking for the New Sound (Feb 26 2008)
Author | Message |
---|---|
Matt Lebofsky Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
Let's see.. it's been a bit since I last wrote. I've been mostly working on code to pull pulses out of the database, which uncovered a couple general minor bugs that had to be fixed. These were successfully dumped and handed off to Josh to find good candidates for initial Astropulse analysis. Not much going on over the weekend but the science database server (thumper) is not performing. Jeff and I scanned all kinds of data during different tests and we're convinced it's the RAID configuration more than anything else. We're going to have to reconfigure all the file systems on that at some point. Painful, but we may be able to do it piece by piece without too much disruption. Today we actually upgraded the way-out-of-date OS on thumper, which was also a bit painful, but ultimately successful. It should have been up and running by now, but thanks to an 8 Terabyte ext3 filesystem that hasn't been checked in over 180 days, a forced check is running and will probably be running all night. Not sure if we'll implement the secondary server (bambi) in the meantime - it may be too late in the day to attempt that. We'll let the project run as best it can until we run out of work (we'll probably keep a buffer of work just so the recovery later isn't as painful). Meanwhile, the assimilator queue is growing and growing until we either let it drain, or we reconfigure thumper. Oh yeah.. bane (one of the download servers) just went kaput. Spent 20 minutes trying to figure out what went wrong with its network. Oh - the cable came out of the switch. Click. Voila! In good news, Jeff has been hammering on the new router today, and we got over a major hurdle of getting IOS installed on it. Only thing left now is configuration. It might be ready tomorrow! Buckle your seatbelts. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
Jesse Viviano Joined: 27 Feb 00 Posts: 100 Credit: 3,949,583 RAC: 0 |
Let's see.. it's been a bit since I last wrote. I've been mostly working on code to pull pulses out of the database, which uncovered a couple general minor bugs that had to be fixed. These were successfully dumped and handed off to Josh to find good candidates for initial Astropulse analysis. I once questioned why you would have CatOS on a router. Seeing the photo of the router in the switch closet showed me that it is not really a pure router, but a Catalyst 6504-E multilayer switch that can act as a router as well, which means that it was running CatOS on the switch portion of the multilayer switch, and IOS on the router portion of the switch. I wonder if you will use that multilayer switch's switching capabilities. If you used it as a switch and as a router by having your Internet-facing servers plugged directly into it, then you could have less latency between these servers and the SETI@home clients. This could reduce the number of connections they have to have open at once because less latency means that the connections can close sooner. By the way, on an unrelated note but related to the router, do you have any plans for the big IPv6 switchover whenever we finally run out of IPv4 addresses? |
DJStarfox Joined: 23 May 01 Posts: 1066 Credit: 1,226,053 RAC: 2 |
Not much going on over the weekend but the science database server (thumper) is not performing. Jeff and I scanned all kinds of data during different tests and we're convinced it's the RAID configuration more than anything else. We're going to have to reconfigure all the file systems on that at some point. Painful, but we may be able to do it piece by piece without too much disruption. You could flip Bambi as primary, reconfigure the drives, and then replicate the database again to Thumper. Hope you have a backup just before doing this. :) Bambi seems to do well, right? What's her RAID configuration? Can you do the same with Thumper? Do the database logs have their own array now? Also, if it's running on any flavor of Linux, I've heard it helps to put elevator=deadline in the kernel parameters at boot. If it's SunOS, then never mind. |
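On the `elevator=deadline` suggestion: a minimal sketch, assuming a Linux 2.6-era kernel that exposes the I/O scheduler through sysfs, of how one might check which scheduler a disk is currently using before changing the boot parameters. The device name `sda` and the helper name `active_scheduler` are illustrative assumptions, not details of thumper's actual setup.

```shell
#!/bin/sh
# active_scheduler: read a sysfs scheduler line on stdin, for example
#   "noop anticipatory deadline [cfq]"
# and print the name inside the square brackets (the active scheduler).
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Hypothetical usage on a real host (the device name is an assumption):
#   active_scheduler < /sys/block/sda/queue/scheduler
# Switching at runtime, as root, without a reboot:
#   echo deadline > /sys/block/sda/queue/scheduler
```

The runtime switch applies per device and does not survive a reboot, which is why the kernel parameter is the usual way to make it permanent.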
David Joined: 19 May 99 Posts: 411 Credit: 1,426,457 RAC: 0 |
Oh yeah.. bane (one of the download servers) just went kaput. Spent 20 minutes trying to figure out what went wrong with its network. Oh - the cable came out of the switch. Click. Voila! Thanks for the updates, Matt. Next time don't forget to look for the easy fixes first lol |
speedimic Joined: 28 Sep 02 Posts: 362 Credit: 16,590,653 RAC: 0 |
Maybe now is the time to start the FS discussion (--> Sleepless in Oakland) if you reconfigure the file systems anyway. Not much going on over the weekend but the science database server (thumper) is not performing. Jeff and I scanned all kinds of data during different tests and we're convinced it's the RAID configuration more than anything else. We're going to have to reconfigure all the file systems on that at some point. Painful, but we may be able to do it piece by piece without too much disruption. mic. |
DJStarfox Joined: 23 May 01 Posts: 1066 Credit: 1,226,053 RAC: 2 |
Maybe now is the time to start the FS discussion (--> Sleepless in Oakland) if you reconfigure the file systems anyway. You're not going to convince him to get rid of ext3. However, for a database, ext2 makes a little more sense. There's no reason to have a filesystem journal when the database has its own journaling mechanism (log files). That may be a performance boost (except for running fsck -f). Using any other filesystem brings greater risk. Of course, they had better go RAID 10 or else. :) |
sjf Joined: 17 Aug 99 Posts: 5 Credit: 10,617,892 RAC: 0 |
You can just disable periodic checking, assuming you trust your storage hardware. Check out man tune2fs ... or: for i in `mount -t ext3 | awk '{print $1}'`; do tune2fs -c 0 -i 0 $i; done A lot of better storage hardware can validate media and parity data on the fly. |
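A cautious variant of the one-liner above, sketched under the assumption that `mount` prints lines in the usual Linux form `<device> on <mountpoint> type <fstype> (<options>)`. It only prints the `tune2fs` commands so they can be reviewed before anything is changed. The function name `print_tune2fs_cmds` is hypothetical. Note that lowercase `-c` sets the maximum mount count (0 disables mount-count checks) and `-i 0` disables time-based checks, whereas uppercase `-C` only sets the current mount counter.

```shell
#!/bin/sh
# print_tune2fs_cmds: read `mount`-style lines on stdin and print, without
# running them, one tune2fs command per ext3 device.
# Assumes mount points contain no spaces (field 4 must be the word "type").
print_tune2fs_cmds() {
    awk '$4 == "type" && $5 == "ext3" { print "tune2fs -c 0 -i 0 " $1 }'
}

# Hypothetical usage: review the output first, then run it as root if it
# looks right.
#   mount | print_tune2fs_cmds
#   mount | print_tune2fs_cmds | sh
```

Disabling both checks means a filesystem is never checked automatically, so this only makes sense alongside some other scheduled verification.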
KWSN THE Holy Hand Grenade! Joined: 20 Dec 05 Posts: 3187 Credit: 57,163,290 RAC: 0 |
You can just disable periodic checking, assuming you trust your storage hardware. Check out man tune2fs ... or: Given the number of disk failures in the recent past, I doubt that the Berkeley staff "trusts their storage hardware"! . Hello, from Albany, CA!... |
speedimic Joined: 28 Sep 02 Posts: 362 Credit: 16,590,653 RAC: 0 |
From what I read, that 8 TB partition is for storing WUs/Results - not for the database. You're not going to convince him to get rid of ext3. However, for a database, ext2 makes a little more sense. There's no reason to have a filesystem journal when the database has its own journaling mechanism (log files). That may be a performance boost (except for running fsck -f). Using any other filesystem brings greater risk. Of course, they had better go RAID 10 or else. :) mic. |
Neil Walker Joined: 23 May 99 Posts: 288 Credit: 18,101,056 RAC: 0 |
Of course, they had better go RAID 10 or else. :) Either you are kidding or you don't know what you are talking about. :P RAID 10 is shorthand for RAID 1 + 0. AFAIK, the S@H team have always used RAID 5. That is the minimum for an application of this kind. Be lucky Neil |
DJStarfox Joined: 23 May 01 Posts: 1066 Credit: 1,226,053 RAC: 2 |
Of course, they had better go RAID 10 or else. :) I was kidding about the "they'd better...or else" part. However, I am not as much kidding about RAID 1+0. Especially with database loads, RAID 10 is superior in performance and redundancy. See this for an explanation if you need it: http://www.bytepile.com/raid_class.php |
Neil Walker Joined: 23 May 99 Posts: 288 Credit: 18,101,056 RAC: 0 |
See this for an explanation if you need it: I don't. ;) Maybe you should read it again in the context of the needs of S@H and the resources available. ;) Be lucky Neil |
DJStarfox Joined: 23 May 01 Posts: 1066 Credit: 1,226,053 RAC: 2 |
I don't. ;) Maybe you should read it again in the context of the needs of S@H and the resources available. ;) I did read that with SETI in mind. Unless there's an unsolvable reason why they can't build a RAID 10 array, that is what I recommend. For all the non-database servers, RAID 5 is fine because it gives them a lower cost per GB. Besides, Matt has already made his decision. |
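The RAID 5 versus RAID 10 trade-off argued over in this exchange can be put in rough numbers. The sketch below uses the textbook figures only: RAID 5 gives up one disk's worth of capacity to parity and needs four disk I/Os for a small random write (read data, read parity, write data, write parity), while RAID 10 gives up half the capacity and needs two. The disk count and size are made-up examples, not the project's hardware, and the function names are hypothetical. Controller caches and full-stripe writes are ignored.

```shell
#!/bin/sh
# Usable capacity for N identical disks: arguments are <disk count> <disk size>.
raid5_usable()  { echo $(( ($1 - 1) * $2 )); }   # one disk's worth goes to parity
raid10_usable() { echo $(( ($1 / 2) * $2 )); }   # every disk is mirrored

# Back-end disk I/Os needed for one small random write.
raid5_write_ios()  { echo 4; }   # read data, read parity, write data, write parity
raid10_write_ios() { echo 2; }   # write both halves of the mirror

# Hypothetical example: 8 disks of 500 GB each.
echo "RAID 5 : $(raid5_usable 8 500) GB usable, $(raid5_write_ios) I/Os per small write"
echo "RAID 10: $(raid10_usable 8 500) GB usable, $(raid10_write_ios) I/Os per small write"
```

With these assumed numbers, RAID 5 offers more usable space per disk bought, and RAID 10 halves the write cost, which matches the split suggested in the post above: RAID 10 for the write-heavy database, RAID 5 where cost per GB matters more.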
sjf Joined: 17 Aug 99 Posts: 5 Credit: 10,617,892 RAC: 0 |
You can just disable periodic checking, assuming you trust your storage hardware. Check out man tune2fs ... or: Disk failures are irrelevant if you're using reliable RAID controllers and you're replacing disks in a timely manner. Both of which they seem to be doing. |
KWSN THE Holy Hand Grenade! Joined: 20 Dec 05 Posts: 3187 Credit: 57,163,290 RAC: 0 |
You can just disable periodic checking, assuming you trust your storage hardware. Check out man tune2fs ... or: I was replying to the "assuming you trust your storage hardware"... AND IIRC, they've also had some RAID controller failures... [irony intended] what are you, a member of the Borg? [/irony] ;-) . Hello, from Albany, CA!... |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.