Panic Mode On (100) Server Problems?
Author | Message |
---|---|
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13727 Credit: 208,696,464 RAC: 304 |
"Please quit complaining about splitters until you are EMPTY!" Why? I've always considered it better to point out a problem when it is first noticed (and hopefully still minor) than to wait until it becomes critical before mentioning or doing something about it. As for complaining: all I'm doing is pointing out that there is an issue. Complaining is a whole different ball game. And the fact is the splitters are having significant issues, more so now than when I made my earlier posts. Grant Darwin NT |
JaundicedEye Send message Joined: 14 Mar 12 Posts: 5375 Credit: 30,870,693 RAC: 1 |
It would appear we are not out of the woods on the router issue. This time on the website side. No connection for the last 2 hours... "Sour Grapes make a bitter Whine." <(0)> |
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
Patience, Patience, Patience. http://3.bp.blogspot.com/_D_Z-D2tzi14/S8TTPQCPA6I/AAAAAAAACwA/ZHZH-Bi8OmI/s400/ALOT2.png SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the BP6/VP6 User Group (http://tinyurl.com/8y46zvu) |
Cosmic_Ocean Send message Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
"Please quit complaining about splitters until you are EMPTY!" Precisely. Much like when I noted that we were nearing 2^32 for task IDs and an inquiry was made three weeks ahead of the estimated time that task would happen... and it turned out it was a problem that would have had to be dealt with. Sure, it was still a little bumpy for a few days, but it could have been much worse had we waited until everything just... broke. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving-up) |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
"Please quit complaining about splitters until you are EMPTY!" The trick is to distinguish between the routine (even if somewhat annoying) and the novel or unique. |
Brent Norman Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
Exactly, Richard. If the site is down for 5 hours of maintenance, don't expect an RTS for 12-24 hours ... there are a lot of buckets (caches) to fill. |
Gary Charpentier Send message Joined: 25 Dec 00 Posts: 30638 Credit: 53,134,872 RAC: 32 |
Matt's note on the home page about the router is gone. Assume that indicates it is fixed. |
Cosmic_Ocean Send message Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
Well that's an averted crisis for me. Lost power for 8 minutes this morning.. UPS did its job, no problems there. Shut down, then power came back on sooner than I was anticipating, booted everything back up, it was all fine for hours. Started playing a game with a friend via VPN and two minutes into the game.. the 120dB alarm on my RAID card went off... one of the disks in the RAID-5 was labeled as "failed." Unplugged that drive, plugged it into the motherboard and pulled up HD Tune and looked at the SMART data: Reallocated Sector Count: 12 (warning); Reallocated Event Count: 1 (warning); Interface CRC Error Count: 1 (attention). Started a surface scan, and after 2h 12m, it was done with no errors found. Wrote zeroes to the first 2200 MB, then plugged it back into the RAID card and it was picked up and immediately began rebuilding the array. Best I can figure.. the controller sent the control commands to the disk and the disk decided it needed 12 sectors for that write, but the controller punted the drive before the payload/data was sent, which accounts for one event with 12 sectors, and an Interface CRC Error. Random punting is a known foible of these Areca cards. The older 1.46 firmware did it quite frequently.. I was having a disk punted 3-4 times a year, but after several years, Areca still didn't know exactly why that happens, but they released 1.47 and said that they hope that reduces the number of random punts. I flashed the firmware almost two years ago and I've only had two punts since then, so... I guess that's an improvement. But I learned something whilst looking at the SMART data.. these four 500 GB RE2 drives I'm still using are ancient.. and they are quite robust soldiers. I bought them in early 2007 at a time when the reviews on Newegg suggested there was about an 80% chance of DOA, or failure within 30 days. I had one legitimate failure in 2009 and got an RMA through WD, and they sent me an RE3 replacement. 
The remaining three RE2s are still going, and they have 70,019 power-on hours. I was running the math on that.. and 70,080 hours is exactly 8 years. I think these drives have done exceptionally well considering all the factors. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving-up) |
Zombu2 Send message Joined: 24 Feb 01 Posts: 1615 Credit: 49,315,423 RAC: 0 |
Or the sectors have been reallocated. I throw a drive out if it gives me any trouble whatsoever... learned my lesson years ago. Since then I've been using WD enterprise drives and they have never let me down: so far, in 5 years, not a single one has died or thrown a SMART error. I came down with a bad case of i don't give a crap |
Cruncher-American Send message Joined: 25 Mar 02 Posts: 1513 Credit: 370,893,186 RAC: 340 |
"since then i've been using WD enterprise they never let me down so far in 5 years not a single one has died or thrown a smart error" Not sure about this, BUT: Enterprise drives have different firmware, firmware that (I suspect) doesn't do SMART error processing for things like reallocated sectors. I've come across EDs that have no SMART errors but seem to misbehave anyway. Can anyone who is knowledgeable about the subject comment on this? |
Zombu2 Send message Joined: 24 Feb 01 Posts: 1615 Credit: 49,315,423 RAC: 0 |
Enterprise drives do SMART, and yes, they have different firmware and are built differently too. On the no-SMART-error-but-misbehaving thing: it can be the mechanics or the little mainboard itself that is damaged. I ran into quite a few cache errors with Seagates years ago where the drive would randomly just disappear; turns out the cache was bad on it. And then there is anything from motor burnouts to high flies to loss of air cushion due to clogged breather holes, and servo errors. Oh, and then you have your head crashes: there was a bunch of issues with the 1st-gen Raptors that would annihilate the heads when the head park was released after startup. Had a couple of 15k SCSI drives where the servo became detached from the chassis after the screws that held it came out. EDIT: And then you have the Seagate 1000.5 (I think that was the number) that would go into service mode because some software engineer at Seagate forgot to remove something from the firmware, and if certain criteria were met on startup it would simply go into service mode and disappear... only an RMA would fix it, unless you had an SPI programmer to get it out of that mode and then flashed new firmware. I came down with a bad case of i don't give a crap |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
Having a look round the tech specification sheets out of curiosity, I came across this curious quote from a review: "The only other variable I will touch on, is the advertised 5 drive limit on the Western Digital Red drive. There is alot of uncertainty when it comes to this 5 drive limit, but the consensus appears to be that since the WD Red drives lack rotational vibration sensors, vibration resonance can cause issues if you attempt to load up more than 5 drives in the same enclosure. Where we haven’t tested this ourselves, we have seen reports ranging from individuals running more than 5 of these drives without issue, and conversely some reporting being completely incapable of getting a 6th drive to configure. With that said, if you’re looking for a >5 drive deployment, you may want to take this limitation into consideration." Make of it what you will. From http://www.techwarelabs.com/western-digital-4tb-drive-roundup/ |
Zombu2 Send message Joined: 24 Feb 01 Posts: 1615 Credit: 49,315,423 RAC: 0 |
Heh, remember those crap Green drives? They would constantly fall out of RAID because of the time it took them to run a self-test... I still think that was done on purpose to keep people and companies from using those drives in RAIDs, since they were so dirt cheap. I try to keep drive vibration from being transmitted into the case by using rubber on the screws. I came down with a bad case of i don't give a crap |
arkayn Send message Joined: 14 May 99 Posts: 4438 Credit: 55,006,323 RAC: 0 |
"heh remember these crap green drives" I have a single WD Green drive; it holds my games. I boot from an SSD. |
JaundicedEye Send message Joined: 14 Mar 12 Posts: 5375 Credit: 30,870,693 RAC: 1 |
I have faithfully used WD Blue, Green, and Black for many years in lots of refurbished machines(usually HP Workstations) both as extra storage and OS drives without any failures. I don't do RAID but rather incremental backup. Systems have failed but never due to a WD drive in my experience. I also now favor OS on an SSD with all program files on the 'D' drive. "Sour Grapes make a bitter Whine." <(0)> |
SciManStev Send message Joined: 20 Jun 99 Posts: 6652 Credit: 121,090,076 RAC: 0 |
I have used WD Black, Hitachi, and others in a RAID 1 configuration, and none of them lasted more than a couple of weeks. I couldn't afford to keep adding drives, so I stopped doing RAID. Before that I had a couple of WD Black drives in a RAID 0, but got scared that if one failed, I would lose everything. I had that configuration for about a month, and it was fast. I just didn't want to risk a total loss, even though the files were backed up elsewhere. Steve Warning, addicted to SETI crunching! Crunching as a member of GPU Users Group. GPUUG Website |
uNi73 Send message Joined: 28 Aug 99 Posts: 2 Credit: 31,795 RAC: 0 |
After 11 days, I just had contact with the Server and got 2 new files *yay* edit: download started one second after the rosetta upload was done. random? |
Zombu2 Send message Joined: 24 Feb 01 Posts: 1615 Credit: 49,315,423 RAC: 0 |
I only put enterprise drives in nowadays; it pays off in the long run. My video server runs a RAID 50, and if a drive should fail one of these days I just pull it out, replace it, and am done with it, without worrying about losing data. Pretty hard to back up 57 TB worth of video and audio. EDIT: well, it takes a long, long time... I'm uploading all I've got to Amazon Cloud Drive atm since they were so nice to let me have unlimited storage hrhrhr I came down with a bad case of i don't give a crap |
Jord Send message Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
My NAS has two WDC WD40EZRX 4TB drives. Despite various bugs in the NAS software and firmware, and problems with humidity, the drives are still humming along quite nicely and quietly. My biggest problem is that I only have another 2928.33 GB free. ;-) |
Cosmic_Ocean Send message Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
I may have gotten a bit too eager to sign-off on the drive being fine. Maybe. It took 26 minutes longer to rebuild the array than all the previous times before. Actually.. way back in 2007-9 when I was rebuilding somewhat frequently, it took 2h 17m to do it. November 2014--the last time I had a disk get punted--it took 2h 31m. Last night, it took 2h 57m. Once that was done, I began doing a back-up of the really important data to BD-RE DL (50gb) by using ImgBurn to write files to an ISO image, because for whatever reason, when I do files->disc, at the start of the burn operation when the read buffer tries to allocate, I get a memAlloc() failure saying that the system doesn't have enough memory for the 512mb read cache when there is 12gb of memory free. So I do files->image, then image->disc. Anyway, the array handled head-thrashing pretty well for reading 140,000 small files from all over the place in the filesystem for a good 15 minutes to write the 50gb image. Burnt that, and when that was done, I made a more-inclusive backup to BD-RE XL (100gb) using the same method. When it got to about 95% of creating the image, the disk-reading hung for about 30 seconds and then the RAID alarm went off again and disk 1 was marked as "failed." The rest of the reading continued just fine. Before that disk was punted/failed again, I noticed during both of these heavy read events that it was pausing and hesitating somewhat frequently, and when it would come across a 1-7gb file that it could read sequentially, instead of reading at 175MB/sec, it was barely doing 40MB/sec, so it was hesitating/stuttering.. and being generally slow. I think even though a full surface scan yielded no problems.. the drive is actually failing. So... I'm shuffling data now. Relocating 460gb of data from the spare 500gb RE3 I've got to the free space on the 1TB Blacks so I can put this RE3 into the array as disk 1, and then I can do some extensive testing on the questionable RE2. 
I've only had two legitimate disk failures with WD drives since I started using them in 2002. Even when I get disks from a "bad batch" period of time, they still end up working properly for many years. I want to upgrade to bigger drives, but I have to say... the reviews on Newegg for anything >1TB regardless of brand do not instill any confidence at all. The 2 and 3TB Blacks don't have stellar reviews, but the 4TB Black does. And I'm not going with Red or Purple. Blues are nice for what they are.. they are effectively cheaper Blacks, but they only go up to 1TB. I really do like the RE drives, and the 5-year warranty is nice (I've had to use that warranty only once). I want to upgrade to 4x4TB REs.. but... ow, that is expensive. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving-up) |
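
For anyone who wants to re-run the numbers Cosmic_Ocean quotes above (70,019 power-on hours against "70,080 hours is exactly 8 years", and rebuild times of 2h 17m, 2h 31m and 2h 57m), here is a minimal Python sketch. It only reuses figures from the posts; the function and variable names are made up for illustration, and the year length assumes 365-day years with leap days ignored, which is what the 70,080 figure implies.

```python
from datetime import timedelta

# Assumption: a "year" is 365 days, matching the thread's 70,080 h = 8 years.
HOURS_PER_YEAR = 365 * 24


def power_on_years(hours: int) -> float:
    """Convert a SMART power-on-hours reading into 365-day years."""
    return hours / HOURS_PER_YEAR


def hours_until_years(hours: int, years: int) -> int:
    """Hours still to run before the drive reaches `years` of power-on time."""
    return years * HOURS_PER_YEAR - hours


def minutes_slower(previous: timedelta, latest: timedelta) -> int:
    """How many whole minutes longer the latest rebuild took."""
    return int((latest - previous).total_seconds() // 60)


# Rebuild durations as quoted in the thread (labels are illustrative).
REBUILDS = {
    "2007-9": timedelta(hours=2, minutes=17),
    "2014-11": timedelta(hours=2, minutes=31),
    "latest": timedelta(hours=2, minutes=57),
}

if __name__ == "__main__":
    print(f"{power_on_years(70_019):.3f} years powered on")
    print(f"{hours_until_years(70_019, 8)} hours short of 8 years")
    print(f"{minutes_slower(REBUILDS['2014-11'], REBUILDS['latest'])} min slower than Nov 2014")
    print(f"{minutes_slower(REBUILDS['2007-9'], REBUILDS['latest'])} min slower than 2007-9")
```

A rebuild that keeps getting slower on the same hardware is only a rough hint, not a diagnosis, but it is cheap to track next to the SMART counters.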
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.