House of Fun (Jul 29 2008)

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 789625 - Posted: 29 Jul 2008, 23:13:57 UTC

Today we had our usual Tuesday outage, which ran a bit longer than normal as we had extra things to take care of (beyond the routine BOINC database table compression and backup to disk).
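(For the curious, the routine part boils down to something like this. A rough sketch, not our actual script; paths and options vary:)

    # Defragment/optimize the BOINC MySQL tables, then dump everything to disk.
    mysqlcheck --optimize --all-databases
    mysqldump --all-databases | gzip > /backup/boinc-$(date +%F).sql.gz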

I failed to mention yesterday (though many have noticed) that db_dump hasn't been working for days, which means our stats have flatlined all weekend. This was because our MySQL replica failed (we run these expensive stats lookups on the replica so they don't affect the more important updates running on the master). So part of the outage today was to rebuild this replica from scratch from a dump of the master. It was easy - we do this regularly anyway - it just takes a long time.
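(The rebuild is the standard recipe: take a consistent dump of the master that records its binary log position, load it on the replica, and let replication catch up. Roughly, with stock MySQL tools; host names and credentials elided:)

    # On the master: a consistent dump that embeds the binlog coordinates
    # (--master-data=1 writes a CHANGE MASTER TO statement into the dump).
    mysqldump --all-databases --single-transaction --master-data=1 > master.sql

    # On the replica: reload the dump, then resume replication.
    mysql < master.sql
    mysql -e "START SLAVE"
    mysql -e "SHOW SLAVE STATUS\G"   # watch Seconds_Behind_Master drop to 0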

Also, Jeff and I replaced a failed drive on thumper (the science database server). There are 48 drives on the thing so disk failures are common, and we get Sun support on this important system. We ask for a drive, they send one, we put it in and ship the old one back. Easy as pie. Unfortunately the software RAID on this system made some bogus complaints upon restart (unrelated to the device that required the new drive). I'm not sure why mdadm gets confused - for example, I converted a couple of spare drives to a new RAID device, which works fine, but upon reboot (many months later) mdadm freaks out that those spares are missing. Anyway, this was mostly harmless, and another warning that we really need a fresh OS install on this system sooner rather than later (that'll be scary).
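(For anyone following along at home, the swap itself is the usual mdadm dance; the device names below are made up. Keeping /etc/mdadm.conf in sync after rearranging arrays may also help avoid the phantom-missing-spares complaints at boot:)

    # The usual software-RAID drive swap (example device names only).
    mdadm --detail /dev/md0             # confirm which member failed
    mdadm /dev/md0 --remove /dev/sdq1   # drop the dead disk from the array
    # ...physically swap the drive, partition the new one identically...
    mdadm /dev/md0 --add /dev/sdq1      # resync starts automatically
    cat /proc/mdstat                    # watch the rebuild progress

    # Regenerate the ARRAY lines so the next boot sees the current layout
    # (merge with any existing DEVICE/MAILADDR lines in /etc/mdadm.conf).
    mdadm --detail --scan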

We're running full bore now. It'll take a while to catch up, and we may temporarily run out of work again (still not a comfortable amount of free disk space on the workunit storage). But it'll all clear up eventually.

- Matt

-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 789625
PhonAcq

Joined: 14 Apr 01
Posts: 1656
Credit: 30,658,217
RAC: 1
United States
Message 789639 - Posted: 29 Jul 2008, 23:31:46 UTC

Do these disks fail (so frequently?) because they are merely on all the time, or because they are constantly being accessed? If the latter, I wonder if you have the option to increase a RAM-based cache size in order to reduce disk accesses a bit.
ID: 789639
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 789641 - Posted: 29 Jul 2008, 23:35:55 UTC - in response to Message 789639.  

> Do these disks fail (so frequently?) because they are merely on all the time, or because they are constantly being accessed? If the latter, I wonder if you have the option to increase a RAM-based cache size in order to reduce disk accesses a bit.

I can't speak to Matt's experience, but when I mount a drive in a normal (non-RAID) system, I usually put the 3.5" disk in a 5.25" slot, in the middle of what used to be called a "full-height" bay -- lots of air around the drive, and usually good airflow.

Most of the "professional" RAID systems by necessity cram a whole bunch of drives into as small a box as possible. Those things get hot.
ID: 789641
DJStarfox

Joined: 23 May 01
Posts: 1066
Credit: 1,226,053
RAC: 2
United States
Message 789675 - Posted: 30 Jul 2008, 0:19:01 UTC - in response to Message 789641.  

> > Do these disks fail (so frequently?) because they are merely on all the time, or because they are constantly being accessed? If the latter, I wonder if you have the option to increase a RAM-based cache size in order to reduce disk accesses a bit.
>
> I can't speak to Matt's experience, but when I mount a drive in a normal (non-RAID) system, I usually put the 3.5" disk in a 5.25" slot, in the middle of what used to be called a "full-height" bay -- lots of air around the drive, and usually good airflow.
>
> Most of the "professional" RAID systems by necessity cram a whole bunch of drives into as small a box as possible. Those things get hot.


RAID drives are designed for a certain life span, with a certain percentage expected to fail. Heat (at least below 60 °C) doesn't seem to be a factor (this is backed by experimental research). Buying a more expensive disk may help the failure rate, but they cost more too.

Modern professional RAID cages are designed to put a lot of drives together but also to give them sufficient air flow. The air in those cages moves faster than in anyone's home tower PC.

I believe Matt has SMART monitoring on the servers, so perhaps he could post the typical drive operating temperature, to give you an idea.
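(If smartmontools is installed, it's a one-liner per drive; /dev/sda is just an example:)

    smartctl -A /dev/sda | grep -i temperature   # temperature is usually attribute 194
    smartctl -H /dev/sda                         # overall health self-assessment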
ID: 789675
Brian Silvers

Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 789684 - Posted: 30 Jul 2008, 0:42:48 UTC - in response to Message 789675.  


> Buying a more expensive disk may help the failure rate, but they cost more too.


You're good with those truisms... ;-)

It's like when Dan Quayle said, "The future will be better tomorrow."

Sorry, couldn't resist...

Also, I can't resist noting that a non-Windows OS is deemed to need a "fresh install" aka "reinstall". Wow...
ID: 789684
PhonAcq

Joined: 14 Apr 01
Posts: 1656
Credit: 30,658,217
RAC: 1
United States
Message 789687 - Posted: 30 Jul 2008, 0:45:46 UTC

I was advocating adding RAM to increase OS-resident caching to reduce the disk load. Any thoughts on how practical this is?
ID: 789687
John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 789726 - Posted: 30 Jul 2008, 3:25:15 UTC - in response to Message 789687.  

> I was advocating adding RAM to increase OS-resident caching to reduce the disk load. Any thoughts on how practical this is?

If I recall correctly, the machines are maxed out.


BOINC WIKI
ID: 789726
Andy Lee Robinson
Joined: 8 Dec 05
Posts: 630
Credit: 59,973,836
RAC: 0
Hungary
Message 789727 - Posted: 30 Jul 2008, 3:26:35 UTC - in response to Message 789687.  

> I was advocating adding RAM to increase OS-resident caching to reduce the disk load. Any thoughts on how practical this is?


Consider 48 500 GB drives... 24 terabytes of raw space before striping.
Adding RAM will help but gives diminishing returns: whatever is in that RAM still has to be written to the drives to make room for more data, and because the massive amount of data is held all over the place and randomly accessed, cached data is continuously flushed and needs reloading a short time later.
Even a 1% cache could be around 64-120 GB of RAM, depending on striping.
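(You can watch the kernel spending its spare RAM on cache, and see how hard the disks are still being hit underneath it, with the standard tools; iostat comes with the sysstat package:)

    free -m       # the "cached" column is the page cache using otherwise-free RAM
    vmstat 5      # bi/bo: blocks read/written per interval; sustained high values
                  # mean the working set doesn't fit in the cache
    iostat -x 5   # %util near 100 means the spindles, not RAM, are the bottleneck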

It can help for predictive non-sequential writing to minimise head seeks. If there is a large enough buffer, files can be written to whichever part of the disk suits the current position of a head and the layout of inodes. This means the ends and middles of files could be written before the beginnings!
I think few operating systems use this algorithm for writing to disk because it is a nightmare. Programmers think sequentially for an easier life...
Disk systems are discrete systems, subsystems and sub-subsystems etc. connected by sequential interfaces. Clever higher-level algorithms may need the cooperation of sub-systems that just cannot comply.
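(Linux's block layer already does a bounded version of this reordering in its elevator, and on 2.6 kernels you can inspect and switch the policy per device; sda is just an example:)

    cat /sys/block/sda/queue/scheduler               # e.g. noop anticipatory deadline [cfq]
    echo deadline > /sys/block/sda/queue/scheduler   # switch elevators on the fly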

Cache helps the most recently used files that are actively being worked on between processes, but these are continuously churning and raw I/O quickly becomes the bottleneck with the huge numbers of unpredictable transactions involved.

Caching and buffering are integral at all levels of the system: not just on the hard disk for files, but query caches, transaction caches, instruction caches, memory caches like memcached, and opcode pipelines. Each has an optimum break-even point beyond which more can be less.

Running the whole system in RAM would be great... and hopefully it will be cost-effective in a few years' time to dump all hard disks and use SSRAID instead...
ID: 789727
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 789775 - Posted: 30 Jul 2008, 7:58:42 UTC - in response to Message 789727.  

> Running the whole system in RAM would be great... and hopefully it will be cost-effective in a few years' time to dump all hard disks and use SSRAID instead...

For an idea of the performance advantages of SSDs, check out this article.
In particular, take note of the Database I/O results when running in RAID5 configuration (For example...).
Grant
Darwin NT
ID: 789775
AlphaLaser
Volunteer tester

Joined: 6 Jul 03
Posts: 262
Credit: 4,430,487
RAC: 0
United States
Message 789798 - Posted: 30 Jul 2008, 10:42:42 UTC - in response to Message 789675.  

> > > Do these disks fail (so frequently?) because they are merely on all the time, or because they are constantly being accessed? If the latter, I wonder if you have the option to increase a RAM-based cache size in order to reduce disk accesses a bit.
> >
> > I can't speak to Matt's experience, but when I mount a drive in a normal (non-RAID) system, I usually put the 3.5" disk in a 5.25" slot, in the middle of what used to be called a "full-height" bay -- lots of air around the drive, and usually good airflow.
> >
> > Most of the "professional" RAID systems by necessity cram a whole bunch of drives into as small a box as possible. Those things get hot.
>
> RAID drives are designed for a certain life span, with a certain percentage expected to fail. Heat (at least below 60 °C) doesn't seem to be a factor (this is backed by experimental research). Buying a more expensive disk may help the failure rate, but they cost more too.
>
> Modern professional RAID cages are designed to put a lot of drives together but also to give them sufficient air flow. The air in those cages moves faster than in anyone's home tower PC.
>
> I believe Matt has SMART monitoring on the servers, so perhaps he could post the typical drive operating temperature, to give you an idea.


Google performed a study on disk failure within its datacenter ("Failure Trends in a Large Disk Drive Population", FAST '07). From the paper:

> Contrary to previously reported results, we found very little correlation between failure rates and either elevated temperature or activity levels.
ID: 789798
ML1
Volunteer moderator
Volunteer tester

Joined: 25 Nov 01
Posts: 20283
Credit: 7,508,002
RAC: 20
United Kingdom
Message 789815 - Posted: 30 Jul 2008, 11:44:55 UTC - in response to Message 789684.  

> ... Also, I can't resist noting that a non-Windows OS is deemed to need a "fresh install" aka "reinstall". Wow...

Read it again...

The implication is a reinstall for the purpose of upgrading to a more recent version of the OS and utilities. A reminder of the old age of the present system was given by an old (long since debugged away?) 'feature' of the software-RAID tool mdadm: a false missing-spares warning niggle...


Wot!? No viruses?!

:-p

Happy crunchin',
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 789815
Neil Walker
Volunteer tester
Joined: 23 May 99
Posts: 288
Credit: 18,101,056
RAC: 0
United Kingdom
Message 790035 - Posted: 30 Jul 2008, 20:12:44 UTC - in response to Message 789641.  

> I can't speak to Matt's experience, but when I mount a drive in a normal (non-RAID) system, I usually put the 3.5" disk in a 5.25" slot, in the middle of what used to be called a "full-height" bay


Actually, they are (and always have been) half-height bays. The original floppy and hard disks took the equivalent space of two of those bays. Anybody with half a clue about what they are doing puts a fan in front of their HDs, no matter what height they claim to be.

Be lucky

Neil



ID: 790035
Andy Lee Robinson
Joined: 8 Dec 05
Posts: 630
Credit: 59,973,836
RAC: 0
Hungary
Message 790064 - Posted: 30 Jul 2008, 21:07:02 UTC - in response to Message 789775.  

Thanks for the links, Grant. Exciting times ahead...
ID: 790064
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 790244 - Posted: 31 Jul 2008, 4:43:00 UTC - in response to Message 789675.  


> Modern professional RAID cages are designed to put a lot of drives together but also to give them sufficient air flow. The air in those cages moves faster than in anyone's home tower PC.

Which is why I don't like to hang out in my server room. It is noisy. One of my servers sounds like it is going to start hovering.

I understand the argument, but it gets pretty hard to jam LOTS of air through a tiny space sometimes.
ID: 790244
John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 793048 - Posted: 5 Aug 2008, 2:58:56 UTC - in response to Message 790035.  

> > I can't speak to Matt's experience, but when I mount a drive in a normal (non-RAID) system, I usually put the 3.5" disk in a 5.25" slot, in the middle of what used to be called a "full-height" bay
>
> Actually, they are (and always have been) half-height bays. The original floppy and hard disks took the equivalent space of two of those bays. Anybody with half a clue about what they are doing puts a fan in front of their HDs, no matter what height they claim to be.

For a while at the beginning of the PC there were full-height bays that were not drilled for half-height drives. I had a PC with two full-height bays like this (each with one full-height floppy drive installed). The hard disks (there were none for PCs when I bought my first) ended up in an external enclosure (with two full-height bays that were drilled for half-height use). I haven't seen a full-height drive in almost two decades.


BOINC WIKI
ID: 793048
OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 793049 - Posted: 5 Aug 2008, 3:03:44 UTC - in response to Message 790035.  

> > I can't speak to Matt's experience, but when I mount a drive in a normal (non-RAID) system, I usually put the 3.5" disk in a 5.25" slot, in the middle of what used to be called a "full-height" bay
>
> Actually, they are (and always have been) half-height bays. The original floppy and hard disks took the equivalent space of two of those bays. Anybody with half a clue about what they are doing puts a fan in front of their HDs, no matter what height they claim to be.


Of course there were full-height bays. You can't have a term like "half height" without ever having a "full height" bay. Otherwise the "half heights" would have just been called "full height" bays. They quickly went by the wayside as things became smaller, but they definitely existed.
ID: 793049
Mumps [MM]
Volunteer tester
Joined: 11 Feb 08
Posts: 4454
Credit: 100,893,853
RAC: 30
United States
Message 793060 - Posted: 5 Aug 2008, 3:17:41 UTC - in response to Message 793049.  

> > > I can't speak to Matt's experience, but when I mount a drive in a normal (non-RAID) system, I usually put the 3.5" disk in a 5.25" slot, in the middle of what used to be called a "full-height" bay
> >
> > Actually, they are (and always have been) half-height bays. The original floppy and hard disks took the equivalent space of two of those bays. Anybody with half a clue about what they are doing puts a fan in front of their HDs, no matter what height they claim to be.
>
> Of course there were full-height bays. You can't have a term like "half height" without ever having a "full height" bay. Otherwise the "half heights" would have just been called "full height" bays. They quickly went by the wayside as things became smaller, but they definitely existed.

My first hard drive was a half-height 5.25-inch form factor 20 MB drive. I eventually bought two full-height 5.25-inch drives, 300 MB each, for a BBS I ran. They had to go in an external enclosure with a 200-watt power supply because the enclosure's original 120-watt PS couldn't spin them both up at the same time. :-)
ID: 793060
ML1
Volunteer moderator
Volunteer tester

Joined: 25 Nov 01
Posts: 20283
Credit: 7,508,002
RAC: 20
United Kingdom
Message 793179 - Posted: 5 Aug 2008, 9:47:16 UTC - in response to Message 793060.  

> My first hard drive was a half-height 5.25-inch form factor 20 MB drive...

And then came the "BigFoot" HDDs!

Time moves on quickly.

And it is fantastic what can be done with stable standards, well-defined interfaces, and open, free competition... The PC turned out to be a very good 'mistake' for the world, if not for IBM...

Happy crunchin',
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 793179
Richard Haselgrove
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 793188 - Posted: 5 Aug 2008, 10:18:30 UTC - in response to Message 793048.  

> For a while at the beginning of the PC there were full-height bays that were not drilled for half-height drives. I had a PC with two full-height bays like this (each with one full-height floppy drive installed). The hard disks (there were none for PCs when I bought my first) ended up in an external enclosure (with two full-height bays that were drilled for half-height use). I haven't seen a full-height drive in almost two decades.

The 'full height' form factor predates the PC (as we know it, meaning IBM-compatible). My first personal computer, in about 1980, had a full-height 5.25" floppy drive in an external enclosure linked by an S-100-bus ribbon cable - it cost about the same as the (48KB memory) computer it was attached to. I've still got it in my cellar (basement).
ID: 793188
William Roeder
Volunteer tester
Joined: 19 May 99
Posts: 69
Credit: 523,414
RAC: 0
United States
Message 793224 - Posted: 5 Aug 2008, 12:36:56 UTC - in response to Message 793188.  

> My first hard drive was a half-height 5.25-inch form factor 20 MB drive.

My first hard drive was a 14-inch form factor with a whopping two megs.

> My first personal computer, in about 1980, had a full-height 5.25" floppy drive in an external enclosure linked by an S-100-bus ribbon cable - it cost about the same as the (48KB memory) computer it was attached to. I've still got it in my cellar (basement).

And you're not using it to crunch SETI?

ID: 793224