I Can't Drive RAID5 (Sep 23 2008)
Author | Message |
---|---|
Matt Lebofsky Send message Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
We had the regular database maintenance outage today - no news there and we're recovering from that now. We have several backlogged data pipeline jobs adding much noise to our backend network, so progress is slower than normal. We also planned to do some OS upgrading today but were blocked waiting for some backup jobs to finish. The influx of free time led me to do some extensive testing regarding our general bottlenecks as of late. I'll cut to the chase. We can blame RAID5 for pretty much everything. No real shocker there, but I was surprised by the extent of RAID5's lousy performance. In one example, a large file copied from temp space to a directly attached RAID5 partition took two minutes, and the same file copied over NFS to a remote RAID10 device took 6 seconds (file caching had nothing to do with it, in case you're wondering). While some systems handle RAID5 (or RAID4) much better than others, we simply can't afford the performance hit on the writes no matter how fast the parity bits are computed. So why choose RAID5? Well, you get far more raw storage that way. But that's pretty much it as far as I care. Unfortunately in some cases (like our raw data storage buffer on thumper) we need every terabyte we can get. Seems kinda silly what with single terabyte drives readily available to the world, but spindle count is also quite important to us. In any case we have some convertin' to RAID10 ahead of us on several systems and the usual round of careful/paranoid testing. I don't think we have much of a choice in making some of thumper's partitions RAID10 as well, and that'll mean sometime in the future a planned outage of indeterminate length. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
Dr. C.E.T.I. Send message Joined: 29 Feb 00 Posts: 16019 Credit: 794,685 RAC: 0 |
. . . Thanks for the Update Matt and others @ Berkeley - Much appreciated All "in the future a planned outage of indeterminate length" oh oh! ;) BOINC Wiki . . . Science Status Page . . . |
Urs Echternacht Send message Joined: 15 May 99 Posts: 692 Credit: 135,197,781 RAC: 211 |
Thanks for info... that'll mean sometime in the future a planned outage of indeterminate length Don't hesitate with that outage too long, when your plans are prepared. _\|/_ U r s |
OzzFan Send message Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
Just had to say... love the title! |
DJStarfox Send message Joined: 23 May 01 Posts: 1066 Credit: 1,226,053 RAC: 2 |
Matt, I warned you about this back in February. RAID 5 is not worth the write penalty (except for mostly read-only filesystems). I was only trying to help by my comments in that past thread (linked above).... Stripe size makes a big difference in performance, so when you're converting to RAID 10, try a couple of stripe sizes and compare speeds of WU file copies. Just because your WU sizes are 16K doesn't mean your stripe size should be that. I've personally had better luck with larger stripe sizes, but I cannot recommend an exact stripe size without more info about the filesystem's data. There has also been some debate about ext2 vs ext3...but I think you're safe on that bet with ext3 on 2.6 kernels. If you want extreme performance from thumper or bambi, InnoDB is a storage engine for MySQL that uses a raw filesystem for the database (similar to Oracle's OCFS). It costs money, but I thought I'd mention the possibility. |
Cosmic_Ocean Send message Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
I have noticed that some implementations of RAID5 are much worse than others. Even though Silicon Image makes very rock-solid controllers, I have found that they don't do RAID5 very well, mostly because the parity calculations are offloaded to the system CPU. I ended up getting an Areca PCI-e x4 controller for a 4x500 RAID5 SATA2 setup, and I am thoroughly impressed. The card has a 500MHz dedicated processor and 256MB DDR266 built-in. The next model up (8-port card) has a SO-DIMM slot on it that supports up to 2GB. At any rate, my 4x500 RAID5 setup runs 220MB/sec at the beginning and drops down to about 115MB/sec at the end, with an average of 174.0MB/sec and a burst of 649.1MB/sec. HDTach screenshot That is all for read access, but for writing, I can copy a 4.3GB DVD image from one partition to another in under a minute, which is amazing since the same heads have to read in one place and write somewhere else. 4.3GB in 60 seconds is about 73MB/sec, but that's about half of the read or write rate since the heads are doing both. RAID5 isn't an abomination, it just needs the right hardware. :D Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving-up) |
ML1 Send message Joined: 25 Nov 01 Posts: 20283 Credit: 7,508,002 RAC: 20 |
... RAID5 isn't an abomination, it just needs the right hardware. :D Like all things, it just needs a little sympathy and empathy... It is ideal for some tasks and an abomination for other tasks. The controller/interface bottlenecks always must be factored in also. Good that a major bottleneck has been uncovered. Perhaps NFS isn't so flaky after all :-p (Myself, I still prefer to minimise cross-machine mounts for the sake of reliability. Then again, NFS is sometimes just too useful and convenient!) Happy crunchin', Martin See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3) |
speedimic Send message Joined: 28 Sep 02 Posts: 362 Credit: 16,590,653 RAC: 0 |
I'm using the 24-port version - and yes, it's fast, both RAID5 and 6. But it's useless talking about controllers when Matt likes using software-raid: Matt's Post mic. |
ML1 Send message Joined: 25 Nov 01 Posts: 20283 Credit: 7,508,002 RAC: 20 |
... But it's useless talking about controllers when Matt likes using software-raid: Matt's Post I agree for using (linux) software raid. You're far less likely to lose all your data for any hardware failure. And as with any hardware, you need to be aware of the IO (maximum) rates. Also sweet 'n' cheap :-) Happy crunchin', Martin See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3) |
DJStarfox Send message Joined: 23 May 01 Posts: 1066 Credit: 1,226,053 RAC: 2 |
Yes, and software RAID = CPU does parity calculations. Therefore, it slows down considerably vs hardware solution. |
Matt Lebofsky Send message Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
Couple things - I agree with pretty much everything people are saying.. we knew RAID4/5 were worse than otherwise, but usually opted for them because of (1) space needs and (2) actually had no option (the NetApp NAS system, for example, uses RAID4 and only RAID4 - and it's the fastest non-RAID1/0 system around here by a longshot - but still too slow for anything hard core). I mentioned this before, maybe it got lost, but Jeff and I are opting for hardware RAID more and more depending on the system. Previous issues with it have been resolved. The RAID5 system I was testing that had terrible performance is actually hardware RAID (in fact *two* controllers in one system - though I admit those controllers are kinda old). We use innodb for the BOINC database (which is mysql) and use informix for the Science database. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
DJStarfox Send message Joined: 23 May 01 Posts: 1066 Credit: 1,226,053 RAC: 2 |
Thanks, Matt. Could you do us a favor? Decide a few of the most bad-ass RAID cards that you need and put them on the hardware donation list. I'm sorry you got poor-performing hardware RAID cards. They are all not like that. |
Matt Lebofsky Send message Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
I'm sorry you got poor-performing hardware RAID cards. They are all not like that. These are old cards in an old system (actually not that old - which is why the poor performance is surprising - well, I guess more disappointing than surprising). - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
Swibby Bear Send message Joined: 1 Aug 01 Posts: 246 Credit: 7,945,093 RAC: 0 |
Matt -- "Maximum PC" magazine did an article explaining RAID (Nov 2007) and another comparing RAID cards (May 2008). It seems to me they did all the leg work for you, comparing cards and prices. Hope this helps you to get the best bang for the buck. http://www.maximumpc.com/article/raid_controllers_compared |
ML1 Send message Joined: 25 Nov 01 Posts: 20283 Credit: 7,508,002 RAC: 20 |
Thanks, Matt. Could you do us a favor? Decide a few of the most bad-ass RAID cards that you need and put them on the hardware donation list. ... Add at least one spare RAID card so that you can recover your array when a card fails...? Regards, Martin See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3) |
Matt Lebofsky Send message Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
http://www.maximumpc.com/article/raid_controllers_compared Good info. Thanks. Bear in mind that pretty much *all* our hardware we didn't actually buy/select ourselves. It's a "work with what ya got" mentality around here. No reflection on the quality of the donations, as most systems we get are killer, but our needs are sufficiently unique, large and complicated (and our manpower resources few) that every sysadmin day at the lab seems like an episode of "Junkyard Wars." That said, when we have a better idea what we require (thanks to experience, or useful outside information like the above link) we'll update the hardware donation pages to reflect that - we are cautious not to ask for stuff we're not sure we really need. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
1mp0£173 Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
....that every sysadmin day at the lab seems like an episode of "Junkyard Wars." "Scrapheap Challenge" for our friends in the U.K. |
Dr. C.E.T.I. Send message Joined: 29 Feb 00 Posts: 16019 Credit: 794,685 RAC: 0 |
. . . Matt re: Comment 811617 (mentality) - reminds me of Michael Crichton's writings ;) BOINC Wiki . . . Science Status Page . . . |
Bad Spartan Send message Joined: 14 Mar 03 Posts: 6 Credit: 47,663,261 RAC: 117 |
From a performance perspective RAID5 is very good on reads to the file system but trails RAID1+0 in writes. For every write operation to a volume two writes occur at the drive level: one for the data itself and one for the generated parity, distributed between the spindles, for redundancy in case of a drive failure. In contrast RAID 1+0 writes data to the drive pair at once and no parity has to be generated and/or written to the disks. Thus RAID 1+0 volumes are good for database log repositories and RAID 5 volumes for databases where most of the operations are selects. |
Keith T. Send message Joined: 23 Aug 99 Posts: 962 Credit: 537,293 RAC: 9 |
Somehow reminds me of "Kryten", the Red Dwarf one, not the former server! |
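
The RAID5 write penalty and the capacity trade-off discussed in this thread can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not how any particular controller or the lab's systems work: the block contents, the function names (`xor_blocks`, `raid5_small_write`, `usable_disks`) and the 4-disk and 8-disk arrays are all hypothetical, and the I/O counts describe the textbook read-modify-write path for a small write (full-stripe writes and controller caches can avoid some of those reads).

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length byte blocks together (RAID5-style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def raid5_small_write(old_data: bytes, old_parity: bytes, new_data: bytes):
    """Textbook read-modify-write update of one block in a stripe.

    Returns (new_parity, disk_reads, disk_writes).
    """
    # new parity = old parity XOR old data XOR new data
    new_parity = xor_blocks(old_parity, old_data, new_data)
    # Reads: old data + old parity. Writes: new data + new parity.
    return new_parity, 2, 2


def usable_disks(level: str, n_disks: int) -> int:
    """Disks' worth of usable capacity for n_disks equal-sized drives."""
    if level == "raid5":
        return n_disks - 1   # one disk's worth of space holds parity
    if level == "raid10":
        return n_disks // 2  # every block is stored twice
    raise ValueError(f"unsupported level: {level}")


# Hypothetical 4-disk stripe: three data blocks plus one parity block.
d0, d1, d2 = b"\x0f\x0f", b"\xf0\x01", b"\x33\x10"
parity = xor_blocks(d0, d1, d2)

# Overwrite one data block and update parity incrementally.
new_d1 = b"\xaa\x55"
new_parity, reads, writes = raid5_small_write(d1, parity, new_d1)

# The incremental update matches recomputing parity from scratch.
assert new_parity == xor_blocks(d0, new_d1, d2)
# A lost block can be rebuilt from the surviving blocks plus parity.
assert xor_blocks(d0, d2, new_parity) == new_d1

print(f"RAID5 small write: {reads} reads + {writes} writes")
print("RAID10 small write: 0 reads + 2 writes (one per mirror)")
print("usable disks out of 8:",
      usable_disks("raid5", 8), "(RAID5) vs",
      usable_disks("raid10", 8), "(RAID10)")
```

The sketch shows both sides of the trade-off raised above: RAID5 keeps more of the raw capacity, but each small write turns into extra disk operations, whereas a mirrored pair only needs the two writes.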