I Can't Drive RAID5 (Sep 23 2008)

Message boards : Technical News : I Can't Drive RAID5 (Sep 23 2008)
Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 811390 - Posted: 23 Sep 2008, 23:17:01 UTC

We had the regular database maintenance outage today - no news there and we're recovering from that now. We have several backlogged data pipeline jobs adding much noise to our backend network, so progress is slower than normal.

We also planned to do some OS upgrading today but were blocked waiting for some backup jobs to finish. The influx of free time led me to do some extensive testing regarding our general bottlenecks as of late. I'll cut to the chase. We can blame RAID5 for pretty much everything. No real shocker there, but I was surprised by the extent of RAID5's lousy performance. In one example, a large file copied from temp space to a directly attached RAID5 partition took two minutes, and the same file copied over NFS to a remote RAID10 device took 6 seconds (file caching had nothing to do with it, in case you're wondering). While some systems handle RAID5 (or RAID4) much better than others, we simply can't afford the performance hit on the writes no matter how fast the parity bits are computed.

So why choose RAID5? Well, you get far more raw storage that way. But that's pretty much it as far as I care. Unfortunately in some cases (like our raw data storage buffer on thumper) we need every terabyte we can get. Seems kinda silly what with single terabyte drives readily available to the world, but spindle count is also quite important to us. In any case we have some convertin' to RAID10 ahead of us on several systems and the usual round of careful/paranoid testing. I don't think we have much of a choice in making some of thumper's partitions RAID10 as well, and that'll mean sometime in the future a planned outage of indeterminate length.
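To make the storage trade-off concrete, here is a minimal sketch of the usual capacity arithmetic: RAID5 gives up one drive's worth of space to parity, while RAID10 gives up half of the drives to mirroring. The function name and the drive counts are illustrative assumptions, not thumper's actual layout, and a real system might split its drives into several smaller arrays.

```python
def usable_capacity_tb(level: str, drives: int, drive_tb: float) -> float:
    """Usable capacity of a single array, ignoring filesystem overhead and hot spares."""
    if level == "raid5":
        if drives < 3:
            raise ValueError("RAID5 needs at least 3 drives")
        # One drive's worth of space holds parity (distributed across all members).
        return (drives - 1) * drive_tb
    if level == "raid10":
        if drives < 4 or drives % 2:
            raise ValueError("RAID10 needs an even number of drives, at least 4")
        # Every drive is mirrored, so only half the raw space is usable.
        return (drives // 2) * drive_tb
    raise ValueError(f"unsupported level: {level}")


if __name__ == "__main__":
    # Hypothetical arrays built from 1 TB drives.
    for n in (4, 8, 24):
        r5 = usable_capacity_tb("raid5", n, 1.0)
        r10 = usable_capacity_tb("raid10", n, 1.0)
        print(f"{n} x 1 TB: RAID5 {r5:.0f} TB, RAID10 {r10:.0f} TB")
```

The gap widens with drive count, which is why RAID5 is tempting when every terabyte matters, even though the spindle count is the same either way.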

- Matt

-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 811390 · Report as offensive
Profile Dr. C.E.T.I.
Joined: 29 Feb 00
Posts: 16019
Credit: 794,685
RAC: 0
United States
Message 811402 - Posted: 23 Sep 2008, 23:34:18 UTC



. . . Thanks for the Update Matt and others @ Berkeley - Much appreciated All

"in the future a planned outage of indeterminate length" oh oh! ;)


BOINC Wiki . . .

Science Status Page . . .
ID: 811402 · Report as offensive
Urs Echternacht
Volunteer tester
Joined: 15 May 99
Posts: 692
Credit: 135,197,781
RAC: 211
Germany
Message 811410 - Posted: 24 Sep 2008, 0:20:47 UTC

Thanks for info...

that'll mean sometime in the future a planned outage of indeterminate length

Don't put that outage off too long once your plans are ready.
_\|/_
U r s
ID: 811410 · Report as offensive
OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 811432 - Posted: 24 Sep 2008, 2:18:29 UTC

Just had to say... love the title!
ID: 811432 · Report as offensive
DJStarfox

Joined: 23 May 01
Posts: 1066
Credit: 1,226,053
RAC: 2
United States
Message 811440 - Posted: 24 Sep 2008, 2:53:54 UTC - in response to Message 811390.  
Last modified: 24 Sep 2008, 2:54:53 UTC

Matt,

I warned you about this back in February. RAID 5 is not worth the write penalty (except for mostly read-only filesystems). I was only trying to help by my comments in that past thread (linked above)....

Stripe size makes a big difference in performance, so when you're converting to RAID 10, try a couple of stripe sizes and compare speeds of WU file copies. Just because your WU sizes are 16K doesn't mean your stripe size should be that. I've personally had better luck with larger stripe sizes, but I cannot recommend an exact stripe size without more info about the filesystem's data.
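A rough way to run that comparison is to time the same synced write against each candidate array. The sketch below is only an illustration: the function name, the mount points in the usage comment, and the sizes are all assumptions, and a serious test would use the real workload (many small WU files, concurrent readers) rather than one stream of zeros.

```python
import os
import tempfile
import time


def time_write(directory: str, size_bytes: int, block_size: int = 1 << 20) -> float:
    """Write size_bytes of zeros to a temp file in `directory`, fsync it,
    and return the elapsed seconds. The file is removed afterwards."""
    block = b"\0" * block_size
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            remaining = size_bytes
            while remaining > 0:
                n = min(remaining, block_size)
                f.write(block[:n])
                remaining -= n
            f.flush()
            os.fsync(f.fileno())  # force the data to the device, not just the page cache
        return time.perf_counter() - start
    finally:
        os.remove(path)


# Hypothetical usage: one mount point per test array, each built with a different stripe size.
# for mount in ("/mnt/raid10_64k", "/mnt/raid10_256k"):
#     secs = time_write(mount, 1 << 30, block_size=16 * 1024)
#     print(mount, f"{(1 << 30) / secs / 1e6:.1f} MB/s")
```

Calling `fsync` before stopping the clock matters here; without it the number mostly measures the page cache rather than the array.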

There has also been some debate about ext2 vs ext3... but I think ext3 on 2.6 kernels is a safe bet. If you want extreme performance from thumper or bambi, InnoDB is a storage engine for MySQL that uses a raw filesystem for the database (similar to Oracle's OCFS). It costs money, but I thought I'd mention the possibility.
ID: 811440 · Report as offensive
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 811469 - Posted: 24 Sep 2008, 5:05:12 UTC

I have noticed that some implementations of RAID5 are much worse than others. Even though Silicon Image makes very rock-solid controllers, I have found that they don't do RAID5 very well, mostly because the parity calculations are offloaded to the system CPU.

I ended up getting an Areca PCI-e x4 controller for a 4x500 RAID5 SATA2 setup, and I am thoroughly impressed. The card has a 500MHz dedicated processor and 256MB of DDR266 built in. The next model up (8-port card) has a SO-DIMM slot on it that supports up to 2GB. At any rate, my 4x500 RAID5 setup runs 220MB/sec at the beginning and drops down to about 115MB/sec at the end, with an average of 174.0MB/sec and a burst of 649.1MB/sec. (HDTach screenshot)

That is all for read access, but for writing, I can copy a 4.3GB DVD image from one partition to another in under a minute, which is amazing since the same heads have to read in one place and write somewhere else. 4.3GB in 60 seconds is about 73MB/sec, but that's about half of the read or write since the heads are doing both.
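The 73MB/sec figure works out if a gigabyte is counted as 1024 MB; with 1000 MB per GB it comes to a little under 72. A small sketch of the arithmetic, with a made-up function name:

```python
def throughput_mb_per_s(size_gb: float, seconds: float, binary: bool = True) -> float:
    """Average throughput for copying size_gb in the given number of seconds.

    binary=True treats 1 GB as 1024 MB; binary=False treats it as 1000 MB.
    """
    mb_per_gb = 1024 if binary else 1000
    return size_gb * mb_per_gb / seconds


if __name__ == "__main__":
    print(f"{throughput_mb_per_s(4.3, 60):.1f} MB/s (1 GB = 1024 MB)")
    print(f"{throughput_mb_per_s(4.3, 60, binary=False):.1f} MB/s (1 GB = 1000 MB)")
```

Since the copy is within one array, the drives are reading and writing at once, so this is the net copy rate rather than the array's one-way write speed.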

RAID5 isn't an abomination, it just needs the right hardware. :D
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 811469 · Report as offensive
Profile ML1
Volunteer moderator
Volunteer tester

Joined: 25 Nov 01
Posts: 20283
Credit: 7,508,002
RAC: 20
United Kingdom
Message 811526 - Posted: 24 Sep 2008, 9:32:44 UTC - in response to Message 811469.  

... RAID5 isn't an abomination, it just needs the right hardware. :D

Like all things, it just needs a little sympathy and empathy... It is ideal for some tasks and an abomination for other tasks. The controller/interface bottlenecks always must be factored in also.

Good that a major bottleneck has been uncovered. Perhaps NFS isn't so flaky after all :-p

(Myself, I still prefer to minimise cross-machine mounts for the sake of reliability. Then again, NFS is sometimes just too useful and convenient!)

Happy crunchin',
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 811526 · Report as offensive
Profile speedimic
Volunteer tester
Joined: 28 Sep 02
Posts: 362
Credit: 16,590,653
RAC: 0
Germany
Message 811541 - Posted: 24 Sep 2008, 11:31:49 UTC


I ended up getting an Areca PCI-e x4 controller for a 4x500 RAID5 SATA2 setup, and I am thoroughly impressed.


I'm using the 24-port version - and yes, it's fast, both RAID5 and 6.
But it's useless talking about controllers when Matt likes using software-raid: Matt's Post


mic.


ID: 811541 · Report as offensive
Profile ML1
Volunteer moderator
Volunteer tester

Joined: 25 Nov 01
Posts: 20283
Credit: 7,508,002
RAC: 20
United Kingdom
Message 811553 - Posted: 24 Sep 2008, 12:06:08 UTC - in response to Message 811541.  

... But it's useless talking about controllers when Matt likes using software-raid: Matt's Post

I agree with using (Linux) software RAID. You're far less likely to lose all your data to a single hardware failure. And as with any hardware, you need to be aware of the (maximum) IO rates.

Also sweet 'n' cheap :-)

Happy crunchin',
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 811553 · Report as offensive
DJStarfox

Joined: 23 May 01
Posts: 1066
Credit: 1,226,053
RAC: 2
United States
Message 811569 - Posted: 24 Sep 2008, 14:45:51 UTC - in response to Message 811541.  


I ended up getting an Areca PCI-e x4 controller for a 4x500 RAID5 SATA2 setup, and I am thoroughly impressed.


I'm using the 24-port version - and yes, it's fast, both RAID5 and 6.
But it's useless talking about controllers when Matt likes using software-raid: Matt's Post



Yes, and software RAID = the CPU does the parity calculations. Therefore, it slows down considerably vs. a hardware solution.
ID: 811569 · Report as offensive
Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 811571 - Posted: 24 Sep 2008, 14:56:22 UTC

A couple of things - I agree with pretty much everything people are saying. We knew RAID4/5 were worse than the alternatives, but usually opted for them because (1) we needed the space or (2) we actually had no option (the NetApp NAS system, for example, uses RAID4 and only RAID4 - and it's the fastest non-RAID1/0 system around here by a longshot - but still too slow for anything hard core).

I mentioned this before, maybe it got lost, but Jeff and I are opting for hardware RAID more and more depending on the system. Previous issues with it have been resolved. The RAID5 system I was testing that had terrible performance is actually hardware RAID (in fact *two* controllers in one system - though I admit those controllers are kinda old).

We use innodb for the BOINC database (which is mysql) and use informix for the Science database.

- Matt


-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 811571 · Report as offensive
DJStarfox

Joined: 23 May 01
Posts: 1066
Credit: 1,226,053
RAC: 2
United States
Message 811592 - Posted: 24 Sep 2008, 16:08:10 UTC - in response to Message 811571.  
Last modified: 24 Sep 2008, 16:09:09 UTC

Thanks, Matt. Could you do us a favor? Decide a few of the most bad-ass RAID cards that you need and put them on the hardware donation list.

I'm sorry you got poor-performing hardware RAID cards. They are not all like that.
ID: 811592 · Report as offensive
Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 811599 - Posted: 24 Sep 2008, 16:26:57 UTC - in response to Message 811592.  

I'm sorry you got poor-performing hardware RAID cards. They are not all like that.


These are old cards in an old system (actually not that old - which is why the poor performance is surprising - well, I guess more disappointing than surprising).

- Matt
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 811599 · Report as offensive
Swibby Bear

Joined: 1 Aug 01
Posts: 246
Credit: 7,945,093
RAC: 0
United States
Message 811605 - Posted: 24 Sep 2008, 17:03:55 UTC
Last modified: 24 Sep 2008, 17:06:15 UTC

Matt -- "Maximum PC" magazine did an article explaining RAID (Nov 2007) and another comparing RAID cards (May 2008). It seems to me they did all the leg work for you, comparing cards and prices. Hope this helps you to get the best bang for the buck.

http://www.maximumpc.com/article/raid_controllers_compared
ID: 811605 · Report as offensive
Profile ML1
Volunteer moderator
Volunteer tester

Send message
Joined: 25 Nov 01
Posts: 20283
Credit: 7,508,002
RAC: 20
United Kingdom
Message 811615 - Posted: 24 Sep 2008, 18:06:59 UTC - in response to Message 811592.  

Thanks, Matt. Could you do us a favor? Decide a few of the most bad-ass RAID cards that you need and put them on the hardware donation list. ...

Add at least one spare RAID card so that you can recover your array when a card fails...?

Regards,
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 811615 · Report as offensive
Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 811617 - Posted: 24 Sep 2008, 18:09:41 UTC - in response to Message 811605.  

http://www.maximumpc.com/article/raid_controllers_compared


Good info. Thanks. Bear in mind that we didn't actually buy/select pretty much *any* of our hardware ourselves. It's a "work with what ya got" mentality around here. No reflection on the quality of the donations, as most systems we get are killer, but our needs are sufficiently unique, large and complicated (and our manpower resources few) that every sysadmin day at the lab seems like an episode of "Junkyard Wars." That said, when we have a better idea what we require (thanks to experience, or useful outside information like the above link) we'll update the hardware donation pages to reflect that - we are cautious not to ask for stuff we're not sure we really need.

- Matt
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 811617 · Report as offensive
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 811684 - Posted: 24 Sep 2008, 23:14:22 UTC - in response to Message 811617.  

....that every sysadmin day at the lab seems like an episode of "Junkyard Wars."


"Scrapheap Challenge" for our friends in the U.K.

ID: 811684 · Report as offensive
Profile Dr. C.E.T.I.
Joined: 29 Feb 00
Posts: 16019
Credit: 794,685
RAC: 0
United States
Message 811688 - Posted: 24 Sep 2008, 23:23:56 UTC


. . . Matt re: Comment 811617 (mentality) - reminds me of Michael Crichton's writings ;)


BOINC Wiki . . .

Science Status Page . . .
ID: 811688 · Report as offensive
Bad Spartan
Volunteer tester
Joined: 14 Mar 03
Posts: 6
Credit: 47,663,261
RAC: 117
Puerto Rico
Message 811703 - Posted: 25 Sep 2008, 0:45:38 UTC

From a performance perspective, RAID5 is very good on reads from the file system but trails RAID1+0 on writes. For every write operation to a volume, two writes occur at the drive level: one for the data itself and one for the generated parity, which is distributed across the spindles for redundancy in case of a drive failure. In contrast, RAID 1+0 writes the data to both drives of a mirrored pair at once, and no parity has to be generated or written to the disks.

Thus RAID 1+0 volumes are good for database log repositories, and RAID 5 volumes for databases where most of the operations are selects.
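For anyone curious what the parity step looks like, here is a minimal sketch of XOR parity with a read-modify-write update. It is a toy model with made-up names and 4-byte chunks, not how any particular controller or md driver is implemented. It also shows a detail worth adding to the description above: for a small write, the commonly described update path reads the old data and the old parity before doing the two writes, which is where the often quoted "four I/Os per small write" rule of thumb comes from.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of one or more equal-length blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("need at least one block, all of equal length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def small_write_parity(old_data: bytes, old_parity: bytes, new_data: bytes) -> bytes:
    """Read-modify-write parity update for a single chunk.

    I/O cost in this model: read old data, read old parity,
    write new data, write new parity.
    """
    return xor_blocks(old_parity, old_data, new_data)


if __name__ == "__main__":
    # A toy stripe: three data chunks plus one parity chunk.
    stripe = [b"AAAA", b"BBBB", b"CCCC"]
    parity = xor_blocks(*stripe)

    # Overwrite the middle chunk without touching the other data chunks.
    new_chunk = b"ZZZZ"
    parity = small_write_parity(stripe[1], parity, new_chunk)
    stripe[1] = new_chunk

    # The incrementally updated parity matches a full recomputation.
    assert parity == xor_blocks(*stripe)

    # If the middle drive is lost, its chunk can be rebuilt from the survivors.
    assert xor_blocks(stripe[0], stripe[2], parity) == new_chunk
    print("parity update and rebuild both check out")
```

A mirrored pair in RAID 1+0 skips all of this: the same chunk is simply written to both drives, with no reads needed first.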


ID: 811703 · Report as offensive
Profile Keith T.
Volunteer tester
Joined: 23 Aug 99
Posts: 962
Credit: 537,293
RAC: 9
United Kingdom
Message 811704 - Posted: 25 Sep 2008, 0:46:46 UTC - in response to Message 811688.  


. . . Matt re: Comment 811617 (mentality) - reminds me of Michael Crichton's writings ;)



Somehow reminds me of "Kryten", the Red Dwarf one, not the former server!
ID: 811704 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.