Message boards :
Technical News :
House of Fun (Jul 29 2008)
Author | Message |
---|---|
Matt Lebofsky Send message Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
Today we had our usual Tuesday outage, which was a bit longer than usual as we had extra things to take care of (outside of the usual BOINC database table compression and backup to disk).

I failed to mention yesterday (though many have noticed) that db_dump hasn't been working for days, which means our stats have flatlined all weekend. This was because our mysql replica failed (we run these expensive stats lookups on the replica so they don't affect the more important updates running on the master). So part of the outage today was to rebuild this replica from scratch via the dump from the master. It was easy - we do this regularly anyway - it just takes a long time.

Also, Jeff and I replaced a failed drive on thumper (the science database server). There are 48 drives on the thing, so disk failures are common, and we get Sun support on this important system. We ask for a drive, they send one, we put it in and ship the old one back. Easy as pie. Unfortunately, the software RAID on this system made some bogus complaints upon restart (unrelated to the device that required the new drive). I'm not sure why mdadm gets confused - for example, I converted a couple of spare drives to a new RAID device, which works fine, but upon reboot (many months later) mdadm freaks out that those spares are missing. Anyway, this was mostly harmless, and another warning that we really need a fresh OS install on this system sooner rather than later (that'll be scary).

We're running full bore now. It'll take a while to catch up, and we may temporarily run out of work again (still not a comfortable amount of free disk space on the workunit storage). But it'll all clear up eventually.

- Matt

-- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
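[Editor's note] For readers curious what the mdadm "missing spare" investigation looks like in practice, here is a hedged sketch of the usual inspection commands. The device names /dev/md0 and /dev/sdx1 are illustrative, not the actual devices on thumper:

```shell
# Hedged sketch: inspecting Linux software RAID after a spurious
# "missing spare" complaint. Device names are illustrative.
if [ -r /proc/mdstat ]; then
    cat /proc/mdstat    # one status summary per md array
else
    echo "no md arrays visible on this host"
fi
# With mdadm installed, typical follow-ups would be:
#   mdadm --detail /dev/md0                      # member states, spare count, event counter
#   mdadm --examine /dev/sdx1                    # per-disk superblock, to compare event counters
#   mdadm --manage /dev/md0 --re-add /dev/sdx1   # re-attach a wrongly dropped member
```

Comparing the event counters from `--detail` and `--examine` usually shows whether a "missing" spare really diverged or whether the superblock metadata is just stale after the reboot.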
PhonAcq Send message Joined: 14 Apr 01 Posts: 1656 Credit: 30,658,217 RAC: 1 |
Do these disks fail (so frequently?) because they are merely on all the time, or because they are constantly being accessed? If the latter, I wonder if you have the option at all to increase a RAM-based memory cache size in order to reduce the disk accessing a bit. |
1mp0£173 Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
Do these disks fail (so frequently?) because they are merely on all the time, or because they are constantly being accessed? If the latter, I wonder if you have the option at all to increase a RAM-based memory cache size in order to reduce the disk accessing a bit.

I can't speak to Matt's experience, but when I mount a drive in a normal (non-RAID) system, I usually put the 3.5" disk in a 5.25" slot, in the middle of what used to be called a "full-high" bay -- lots of air around the drive, and usually good airflow.

Most of the "professional" RAID systems by necessity cram a whole bunch of drives into as small a box as possible. Those things get hot. |
DJStarfox Send message Joined: 23 May 01 Posts: 1066 Credit: 1,226,053 RAC: 2 |
Do these disks fail (so frequently?) because they are merely on all the time, or because they are constantly being accessed? If the latter, I wonder if you have the option at all to increase a RAM-based memory cache size in order to reduce the disk accessing a bit.

RAID drives are designed for a certain life span, and a certain percentage of them will fail. Heat (at least below 60 °C) doesn't seem to be a factor (this is backed by experimental research). Buying a more expensive disk may help the failure rates, but it costs more too.

Modern professional RAID cages are designed to put a lot of drives together but also to give them sufficient air flow. The air in those cages moves faster than in anyone's home tower PC.

I believe Matt has SMART monitoring on the servers, so perhaps he could post the typical drive operating temperature, to give you an idea. |
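[Editor's note] Where smartmontools is installed, pulling a drive's temperature out of SMART is a one-liner. A hedged sketch - /dev/sda is an illustrative device name, and the exact attribute name varies by drive vendor:

```shell
# Hedged sketch: read a drive's SMART temperature attribute.
# /dev/sda is illustrative; attribute naming varies by vendor,
# and smartctl usually needs root privileges.
if command -v smartctl >/dev/null 2>&1; then
    smartctl -A /dev/sda | grep -i temperature
else
    echo "smartctl (smartmontools) not installed"
fi
```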
Brian Silvers Send message Joined: 11 Jun 99 Posts: 1681 Credit: 492,052 RAC: 0 |
You're good with those truisms... ;-) It's like when Dan Quayle said, "The future will be better tomorrow." Sorry, couldn't resist...

Also, I can't resist noting that a non-Windows OS is deemed to need a "fresh install" aka "reinstall". Wow... |
PhonAcq Send message Joined: 14 Apr 01 Posts: 1656 Credit: 30,658,217 RAC: 1 |
I was advocating adding RAM to increase OS-resident caching to reduce the disk load. Any thoughts on how practical this is? |
John McLeod VII Send message Joined: 15 Jul 99 Posts: 24806 Credit: 790,712 RAC: 0 |
I was advocating adding RAM to increase OS-resident caching to reduce the disk load. Any thoughts on how practical this is?

If I recall correctly, the machines are maxed out.

BOINC WIKI |
Andy Lee Robinson Send message Joined: 8 Dec 05 Posts: 630 Credit: 59,973,836 RAC: 0 |
I was advocating adding RAM to increase OS-resident caching to reduce the disk load. Any thoughts on how practical this is?

Consider 48 500 GB drives... 24 terabytes of raw space before striping. Adding RAM will help but gives diminishing returns: whatever is in that RAM will have to be written to the drives to make room for more data, and since the massive amount of data is held all over the place and randomly accessed, it will be continuously flushed and need reloading a short time later. Even a 1% cache could be around 64-120 GB of RAM depending on striping.

It can help for predictive non-sequential writing to minimise head seeks. If there is a large enough buffer, files can be written to whatever part of the disk suits the current position of a head and the layout of inodes. This means the ends and middles of files could be written before the beginning! I think few operating systems use this algorithm for writing to disk because it is a nightmare. Programmers think sequentially for an easier life...

Disk systems are discrete systems, subsystems and sub-subsystems etc. connected by sequential interfaces. Clever higher-level algorithms may need the cooperation of sub-systems that just cannot comply.

Cache helps the most recently used files that are actively being worked on between processes, but these are continuously churning, and raw I/O quickly becomes the bottleneck with the huge numbers of unpredictable transactions involved.

Caching and buffering are integral at all levels of the system - not just hard disk caches for files, but query caches, transaction caches, instruction caches, memory caches like memcached, opcode pipelines - and each has an optimum break-even point beyond which more can be less.

Running the whole system in RAM would be great... and hopefully it will be cost effective in a few years' time to dump all hard disks and use SSRAID instead... |
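[Editor's note] Andy's diminishing-returns point can be put in rough numbers: for accesses spread uniformly at random over the dataset, the expected cache hit rate is about cache_size / data_size. A back-of-envelope sketch, using the thread's 24 TB raw figure and an assumed (illustrative) 64 GB of RAM for page cache:

```shell
# Back-of-envelope: for uniformly random reads, expected RAM-cache
# hit rate is roughly cache_size / data_size.
# 24 TB raw disk is from the thread; 64 GB of RAM is an assumed figure.
awk 'BEGIN {
    data_gb  = 24 * 1024   # 24 TB of raw disk, in GB
    cache_gb = 64          # RAM available for page cache
    printf "expected hit rate: %.4f\n", cache_gb / data_gb
}'
# prints: expected hit rate: 0.0026
```

So under a uniform-random access pattern, even a large RAM cache intercepts well under 1% of reads; caching only pays off to the extent the workload has locality.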
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13746 Credit: 208,696,464 RAC: 304 |
Running the whole system in RAM would be great... and hopefully be cost effective in a few years time to dump all hard disks and use SSRAID instead... For an idea of the performance advantages of SSDs, check out this article. In particular, take note of the Database I/O results when running in RAID5 configuration (For example...). Grant Darwin NT |
AlphaLaser Send message Joined: 6 Jul 03 Posts: 262 Credit: 4,430,487 RAC: 0 |
Do these disks fail (so frequently?) because they are merely on all the time, or because they are constantly being accessed? If the latter, I wonder if you have the option at all to increase a RAM-based memory cache size in order to reduce the disk accessing a bit.

Google performed a study on disk failure within its datacenter. From the paper: |
ML1 Send message Joined: 25 Nov 01 Posts: 20331 Credit: 7,508,002 RAC: 20 |
... Also, I can't resist noting that a non-Windows OS is deemed to need a "fresh install" aka "reinstall". Wow...

Reread it again... The implication is to reinstall for the purpose of upgrading to a more recent version of the OS and utilities.

A reminder of the OLD AGE of the present system was given by an old (long-ago debugged-out?) 'feature' of the (software RAID) "mdadm" giving a false spares-warning niggle...

Wot!? No viruses?! :-p

Happy crunchin', Martin

See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3) |
Neil Walker Send message Joined: 23 May 99 Posts: 288 Credit: 18,101,056 RAC: 0 |
I can't speak to Matt's experience, but when I mount a drive in a normal (non-RAID) system, I usually put the 3.5" disk in a 5.25" slot, in the middle of what used to be called a "full-high" bay

Actually, they are (and always have been) half-height bays. The original floppy and hard disks took the equivalent space of two of those bays. Anybody with half a clue about what they are doing puts a fan in front of HDs no matter what height they claim to be.

Be lucky Neil |
Andy Lee Robinson Send message Joined: 8 Dec 05 Posts: 630 Credit: 59,973,836 RAC: 0 |
Thanks for the links Grant, exciting times ahead... |
1mp0£173 Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
Which is why I don't like to hang out in my server room. It is noisy. One of my servers sounds like it is going to start hovering. I understand the argument, but it gets pretty hard to jam LOTS of air through a tiny space sometimes. |
John McLeod VII Send message Joined: 15 Jul 99 Posts: 24806 Credit: 790,712 RAC: 0 |
I can't speak to Matt's experience, but when I mount a drive in a normal (non-RAID) system, I usually put the 3.5" disk in a 5.25" slot, in the middle of what used to be called a "full-high" bay

For a while at the beginning of the PC era there were full-height bays that were not drilled for half-height drives. I had a PC with 2 full-height bays like this (each with one full-height floppy drive installed). The hard disks (there were none for PCs when I bought my first) ended up in an external drive bay (with 2 full-height bays that were drilled for half-height use). I haven't seen a full-height drive in almost 2 decades.

BOINC WIKI |
OzzFan Send message Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
I can't speak to Matt's experience, but when I mount a drive in a normal (non-RAID) system, I usually put the 3.5" disk in a 5.25" slot, in the middle of what used to be called a "full-high" bay

Of course there were full-height bays. You can't have a term like "half height" without ever having had a "full height" bay; otherwise the "half heights" would have just been called "full height" bays. They quickly went by the wayside as things became smaller, but they definitely existed. |
Mumps [MM] Send message Joined: 11 Feb 08 Posts: 4454 Credit: 100,893,853 RAC: 30 |
I can't speak to Matt's experience, but when I mount a drive in a normal (non-RAID) system, I usually put the 3.5" disk in a 5.25" slot, in the middle of what used to be called a "full-high" bay

My first hard drive was a half-height 5 1/4 inch form factor 20 Meg drive. I eventually bought 2 full-height 5 1/4 inch drives for a BBS I ran. 300 Meg each. They had to go in an external enclosure with a 200 watt power supply, because the original enclosure's 120 watt PS couldn't spin them both up at the same time. :-) |
ML1 Send message Joined: 25 Nov 01 Posts: 20331 Credit: 7,508,002 RAC: 20 |
My first hard drive was a half-height 5 1/4 inch form factor 20 Meg drive...

And then came the "BigFoot" HDDs! Time moves on quickly. And it is fantastic what can be done with stable standards, well-defined interfaces and open free competition... The PC turned out to be a very good 'mistake' for the world, if not for IBM...

Happy crunchin', Martin

See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3) |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14653 Credit: 200,643,578 RAC: 874 |
For a while at the beginning of the PC there were full height bays that were not drilled for half height drives. I had a PC with 2 full height bays like this (each with one full height floppy drive installed). The hard disks (there were none for PCs when I bought my first) ended up in an external drive bay (with 2 full height bays that were drilled for half height use). I haven't seen a full height drive in almost 2 decades.

The 'full height' form factor predates the PC (as we know it, meaning IBM-compatible). My first personal computer, in about 1980, had a full-height 5.25" floppy drive in an external enclosure linked by an S100-bus ribbon cable - it cost about the same as the (48KB memory) computer it was attached to. I've still got it in my cellar (basement). |
William Roeder Send message Joined: 19 May 99 Posts: 69 Credit: 523,414 RAC: 0 |
My first hard drive was a half-height 5 1/4 inch form factor 20 Meg drive.

My first hard drive was a 14 inch form factor with a whopping two megs.

My first personal computer, in about 1980, had a full-height 5.25" floppy drive in an external enclosure linked by an S100-bus ribbon cable - it cost about the same as the (48KB memory) computer it was attached to. I've still got it in my cellar (basement).

And you're not using it to crunch SETI? |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.