Panic Mode On (100) Server Problems?

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 1731984 - Posted: 5 Oct 2015, 5:43:09 UTC - in response to Message 1731772.  

Please quit complaining about splitters until you are EMPTY!

Why?
I've always considered it better to point out a problem when it is first noticed (and hopefully still minor), than wait until it becomes critical before mentioning/doing something about it.

As for complaining- all I'm doing is pointing out that there is an issue. Complaining is a whole different ball game.


And the fact is the splitters are having significant issues, more so now than when I made my earlier posts.
Grant
Darwin NT
ID: 1731984
JaundicedEye
Joined: 14 Mar 12
Posts: 5375
Credit: 30,870,693
RAC: 1
United States
Message 1731990 - Posted: 5 Oct 2015, 6:16:53 UTC

It would appear we are not out of the woods on the router issue. This time on the website side. No connection for the last 2 hours...

"Sour Grapes make a bitter Whine." <(0)>
ID: 1731990
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1732037 - Posted: 5 Oct 2015, 12:22:34 UTC - in response to Message 1731961.  

Patience, Patience, Patience.

SETI@Home is operating on a Shoestring Budget.

BTW: Would like to see a 'Green Star' next to every Poster's name.

+ alot

http://3.bp.blogspot.com/_D_Z-D2tzi14/S8TTPQCPA6I/AAAAAAAACwA/ZHZH-Bi8OmI/s400/ALOT2.png
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1732037
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1732120 - Posted: 5 Oct 2015, 20:17:40 UTC - in response to Message 1731984.  

Please quit complaining about splitters until you are EMPTY!

Why?
I've always considered it better to point out a problem when it is first noticed (and hopefully still minor), than wait until it becomes critical before mentioning/doing something about it.

As for complaining- all I'm doing is pointing out that there is an issue. Complaining is a whole different ball game.


And the fact is the splitters are having significant issues, more so now than when I made my earlier posts.

Precisely. Much like when I noted that we were nearing 2^32 for task IDs and an inquiry was made three weeks ahead of the estimated time that task would happen... and it turned out it was a problem that would have had to be dealt with. Sure, it was still a little bumpy for a few days, but it could have been much worse had we waited until everything just... broke.
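
For anyone wanting to sanity-check that kind of estimate, here is a minimal Python sketch. Only the 2^32 ceiling comes from the post; the current ID and the daily creation rate are made-up placeholders, not real project figures, and the function name is purely illustrative.

```python
# Rough estimate of how long an unsigned 32-bit ID counter has left.
# current_id and ids_per_day are hypothetical placeholders, not SETI@home data.

MAX_UINT32 = 2**32 - 1  # largest value an unsigned 32-bit field can hold


def days_until_overflow(current_id: int, ids_per_day: int) -> float:
    """Days remaining before the next ID would exceed the 32-bit ceiling."""
    if ids_per_day <= 0:
        raise ValueError("ids_per_day must be positive")
    remaining = MAX_UINT32 - current_id
    return max(remaining, 0) / ids_per_day


if __name__ == "__main__":
    # Hypothetical example: 4.27 billion IDs used, 1.2 million new ones per day.
    print(round(days_until_overflow(4_270_000_000, 1_200_000), 1))
```

With a few weeks of headroom showing up in a calculation like this, raising the question early is cheap compared with finding out when the counter actually runs out.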
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1732120
Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1732126 - Posted: 5 Oct 2015, 20:30:14 UTC - in response to Message 1732120.  

Please quit complaining about splitters until you are EMPTY!

Why?
I've always considered it better to point out a problem when it is first noticed (and hopefully still minor), than wait until it becomes critical before mentioning/doing something about it.

As for complaining- all I'm doing is pointing out that there is an issue. Complaining is a whole different ball game.


And the fact is the splitters are having significant issues, more so now than when I made my earlier posts.

Precisely. Much like when I noted that we were nearing 2^32 for task IDs and an inquiry was made three weeks ahead of the estimated time that task would happen... and it turned out it was a problem that would have had to be dealt with. Sure, it was still a little bumpy for a few days, but it could have been much worse had we waited until everything just... broke.

The trick is to distinguish between the routine (even if somewhat annoying) and the novel or unique.
ID: 1732126
Brent Norman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1732131 - Posted: 5 Oct 2015, 20:59:15 UTC - in response to Message 1732126.  

Exactly, Richard. If the site is down for 5 hours of maintenance, don't expect a RTS for 12-24 hours ... there are a lot of buckets (caches) to fill.
ID: 1732131
Gary Charpentier (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 25 Dec 00
Posts: 30638
Credit: 53,134,872
RAC: 32
United States
Message 1732168 - Posted: 6 Oct 2015, 0:36:14 UTC

Matt's note on the home page about the router is gone. Assume that indicates it is fixed.
ID: 1732168
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1732457 - Posted: 7 Oct 2015, 4:18:38 UTC

Well that's an averted crisis for me. Lost power for 8 minutes this morning.. UPS did its job, no problems there. Shut down, then power came back on sooner than I was anticipating, booted everything back up, it was all fine for hours.

Started playing a game with a friend via VPN and two minutes into the game.. the 120 dB alarm on my RAID card went off... one of the disks in the RAID-5 was labeled as "failed."

Unplugged that drive, plugged it into the motherboard and pulled up HD Tune and looked at the SMART data..

Reallocated Sector Count: 12 (warning)
Reallocated Event Count: 1 (warning)
Interface CRC Error Count: 1 (attention)

Started a surface scan, and after 2h 12m, it was done with no errors found. Wrote zeroes to the first 2200 MB, then plugged it back into the RAID card and it was picked up and immediately began rebuilding the array.

Best I can figure.. controller sent the control commands to the disk and the disk decided it needed 12 sectors for that write, but the controller punted the drive before the payload/data was sent, which accounts for one event with 12 sectors, and an Interface CRC Error.

Random punting is a known foible of these Areca cards. The older 1.46 firmware did it quite frequently.. I was having a disk punted 3-4 times a year, but after several years, Areca still didn't know exactly why that happens, but they released 1.47 and said that they hope that reduces the number of random punts. I flashed the firmware almost two years ago and I've only had two punts since then, so... I guess that's an improvement.

But I learned something whilst looking at the SMART data.. these four 500 GB RE2 drives I'm still using are ancient.. and they are quite robust soldiers. I bought them in early 2007 at a time when the reviews on Newegg suggested there was about an 80% chance of DOA, or failure within 30 days. I had one legitimate failure in 2009 and got an RMA through WD, and they sent me an RE3 replacement.

The remaining three RE2s are still going, and they have 70,019 power-on hours. I was running the math on that.. and 70,080 hours is exactly 8 years. I think these drives have done exceptionally well considering all the factors.
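
A quick check of that arithmetic, as a minimal Python sketch. It assumes 365-day years with leap days ignored, and the function name is just for illustration:

```python
# Convert SMART power-on hours to years, using 365-day years (no leap days).
HOURS_PER_YEAR = 365 * 24  # 8,760 hours


def power_on_years(hours: int) -> float:
    """Power-on time expressed in 365-day years."""
    return hours / HOURS_PER_YEAR


if __name__ == "__main__":
    print(8 * HOURS_PER_YEAR)           # 70080
    print(8 * HOURS_PER_YEAR - 70_019)  # 61 hours short of 8 years
    print(round(power_on_years(70_019), 3))
```

So at 70,019 hours the drives are 61 hours, about two and a half days, away from eight full years of spinning.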
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1732457
Zombu2
Volunteer tester
Joined: 24 Feb 01
Posts: 1615
Credit: 49,315,423
RAC: 0
United States
Message 1732516 - Posted: 7 Oct 2015, 12:07:58 UTC

or the sectors have been reallocated

I throw a drive out if it gives me any trouble whatsoever ... learned my lesson years ago

Since then I've been using WD enterprise drives. They have never let me down so far: in 5 years not a single one has died or thrown a SMART error.
I came down with a bad case of i don't give a crap
ID: 1732516
Cruncher-American (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 1732519 - Posted: 7 Oct 2015, 12:23:49 UTC - in response to Message 1732516.  

Since then I've been using WD enterprise drives. They have never let me down so far: in 5 years not a single one has died or thrown a SMART error.


Not sure about this, BUT: enterprise drives have different firmware, firmware that (I suspect) doesn't do SMART error processing for things like reallocated sectors. I've come across enterprise drives that have no SMART errors but seem to misbehave anyway.

Can anyone comment on this who is knowledgeable about the subject?
ID: 1732519
Zombu2
Volunteer tester
Joined: 24 Feb 01
Posts: 1615
Credit: 49,315,423
RAC: 0
United States
Message 1732522 - Posted: 7 Oct 2015, 12:34:50 UTC
Last modified: 7 Oct 2015, 12:45:18 UTC

Enterprise drives do SMART, and yes, they have different firmware and are built differently too.

On the no-SMART-error-but-misbehaving thing: it can be the mechanics or the little mainboard itself that is damaged.

I ran into quite a few cache errors with Seagates years ago, where the drive would randomly just disappear. Turns out the cache was bad on it.

And then anything from motor burnouts to high flies to loss of air cushion due to clogged breather holes, and servo errors.

Oh, and then you have your head crashes.

There were a bunch of issues with the 1st-gen Raptors that would annihilate the heads when the head park was released after startup.

Had a couple of 15k SCSI drives where the servo became detached from the chassis after the screws that held it to the chassis came out.

EDIT:
And then you have the Seagate 1000.5 (I think that was the number) that would go into service mode because some software engineer at Seagate forgot to remove something from the firmware, and if certain criteria were met on startup it would simply go into service mode and disappear ... only an RMA would fix it, unless you had an SPI programmer to take it out of that mode and then flashed new firmware.
I came down with a bad case of i don't give a crap
ID: 1732522
Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1732523 - Posted: 7 Oct 2015, 12:44:40 UTC - in response to Message 1732522.  

Having a look round the tech specification sheets out of curiosity, I came across this curious quote from a review:

The only other variable I will touch on, is the advertised 5 drive limit on the Western Digital Red drive. There is alot of uncertainty when it comes to this 5 drive limit, but the consensus appears to be that since the WD Red drives lack rotational vibration sensors, vibration resonance can cause issues if you attempt to load up more than 5 drives in the same enclosure. Where we haven’t tested this ourselves, we have seen reports ranging from individuals running more than 5 of these drives without issue, and conversely some reporting being completely incapable of getting a 6th drive to configure. With that said, if you’re looking for a >5 drive deployment, you may want to take this limitation into consideration.

Make of it what you will. From http://www.techwarelabs.com/western-digital-4tb-drive-roundup/
ID: 1732523
Zombu2
Volunteer tester
Joined: 24 Feb 01
Posts: 1615
Credit: 49,315,423
RAC: 0
United States
Message 1732525 - Posted: 7 Oct 2015, 12:49:48 UTC

Heh, remember those crap Green drives?

Those drives would constantly fall out of RAID because of the time it took for them to run a self-test ... I still think that was done on purpose, to prevent people and companies from using them in RAIDs since they were so dirt cheap.

I try to prevent drive vibration being transmitted into the case by using rubber on the screws.
I came down with a bad case of i don't give a crap
ID: 1732525
arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 1732545 - Posted: 7 Oct 2015, 14:27:30 UTC - in response to Message 1732525.  

Heh, remember those crap Green drives?

Those drives would constantly fall out of RAID because of the time it took for them to run a self-test ... I still think that was done on purpose, to prevent people and companies from using them in RAIDs since they were so dirt cheap.

I try to prevent drive vibration being transmitted into the case by using rubber on the screws.


I have a single WD Green drive; it holds my games. I boot from an SSD.

ID: 1732545
JaundicedEye
Joined: 14 Mar 12
Posts: 5375
Credit: 30,870,693
RAC: 1
United States
Message 1732547 - Posted: 7 Oct 2015, 14:35:12 UTC
Last modified: 7 Oct 2015, 14:36:56 UTC

I have faithfully used WD Blue, Green, and Black for many years in lots of refurbished machines (usually HP workstations), both as extra storage and OS drives, without any failures. I don't do RAID but rather incremental backup. Systems have failed, but never due to a WD drive in my experience. I also now favor OS on an SSD with all program files on the 'D' drive.

"Sour Grapes make a bitter Whine." <(0)>
ID: 1732547
SciManStev (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Jun 99
Posts: 6652
Credit: 121,090,076
RAC: 0
United States
Message 1732567 - Posted: 7 Oct 2015, 16:43:53 UTC

I have used WD Black, Hitachi, and others in a RAID 1 configuration, and none of them lasted more than a couple of weeks. I couldn't afford to keep adding drives, so I stopped doing RAID. Before that I had a couple of WD Black drives in a RAID 0, but got scared that if one failed, I would lose everything. I had that configuration for about a month, and it was fast. I just didn't want to risk a total loss, even though the files were backed up elsewhere.

Steve
Warning, addicted to SETI crunching!
Crunching as a member of GPU Users Group.
GPUUG Website
ID: 1732567
uNi73
Joined: 28 Aug 99
Posts: 2
Credit: 31,795
RAC: 0
Germany
Message 1732580 - Posted: 7 Oct 2015, 17:08:56 UTC
Last modified: 7 Oct 2015, 17:12:05 UTC

After 11 days, I just had contact with the Server and got 2 new files *yay*

Edit: download started one second after the Rosetta upload was done. Random?
ID: 1732580
Zombu2
Volunteer tester
Joined: 24 Feb 01
Posts: 1615
Credit: 49,315,423
RAC: 0
United States
Message 1732595 - Posted: 7 Oct 2015, 17:48:12 UTC
Last modified: 7 Oct 2015, 17:49:00 UTC

I only put in enterprise drives nowadays; it pays off in the long run.

My video server runs a RAID 50, and if a drive should fail one of these days I just pull it out, replace it, and am done with it, without worrying about losing data.

Pretty hard to back up 57 TB worth of video and audio.

EDIT:
Well, it takes a long, long time ... I'm uploading all I've got to Amazon Cloud Drive atm since they were so nice to let me have unlimited storage hrhrhr
I came down with a bad case of i don't give a crap
ID: 1732595
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1732651 - Posted: 7 Oct 2015, 21:15:47 UTC

My NAS has two WDC WD40EZRX 4TB drives. Despite various bugs in the NAS software, firmware and problems with humidity the drives are still humming along quite nicely and quietly. My biggest problem is that I only have another 2928.33 GB free. ;-)
ID: 1732651
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1732660 - Posted: 7 Oct 2015, 21:35:24 UTC

I may have gotten a bit too eager to sign-off on the drive being fine. Maybe.

It took 26 minutes longer to rebuild the array than all the previous times before. Actually.. way back in 2007-9 when I was rebuilding somewhat frequently, it took 2h 17m to do it. November 2014--the last time I had a disk get punted--it took 2h 31m. Last night, it took 2h 57m.
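
The rebuild times above can be compared with a few lines of Python. This is only a sketch using the three durations quoted in the post; the helper names are made up for illustration.

```python
from datetime import timedelta

# Rebuild durations as quoted in the post; the labels are only for display.
REBUILDS = [
    ("2007-9", timedelta(hours=2, minutes=17)),
    ("November 2014", timedelta(hours=2, minutes=31)),
    ("last night", timedelta(hours=2, minutes=57)),
]


def extra_minutes(earlier: timedelta, later: timedelta) -> int:
    """Whole minutes by which `later` exceeds `earlier`."""
    return int((later - earlier).total_seconds() // 60)


def percent_slower(earlier: timedelta, later: timedelta) -> float:
    """Relative slowdown of `later` versus `earlier`, in percent."""
    return 100 * (later - earlier) / earlier


if __name__ == "__main__":
    baseline = REBUILDS[0][1]
    for label, duration in REBUILDS[1:]:
        print(label, extra_minutes(baseline, duration), "min over the 2007-9 time")
    print(extra_minutes(REBUILDS[1][1], REBUILDS[2][1]))  # 26
```

The 26-minute jump since November 2014 is nearly twice the 14-minute drift seen over the several years before that, which fits the suspicion that something changed with this disk.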

Once that was done, I began doing a back-up of the really important data to BD-RE DL (50 GB) by using ImgBurn to write files to an ISO image, because for whatever reason, when I do files->disc, at the start of the burn operation when the read buffer tries to allocate, I get a memAlloc() failure saying that the system doesn't have enough memory for the 512 MB read cache when there is 12 GB of memory free. So I do files->image, then image->disc.

Anyway, the array handled head-thrashing pretty well for reading 140,000 small files from all over the place in the filesystem for a good 15 minutes to write the 50 GB image. Burnt that, and when that was done, I made a more-inclusive backup to BD-RE XL (100 GB) using the same method.

When it got to about 95% of creating the image, the disk-reading hung for about 30 seconds and then the RAID alarm went off again and disk 1 was marked as "failed." The rest of the reading continued just fine. Before that disk was punted/failed again, I noticed during both of these heavy read events that it was pausing and hesitating somewhat frequently, and when it would come across a 1-7 GB file that it could read sequentially, instead of reading at 175 MB/sec, it was barely doing 40 MB/sec, so it was hesitating/stuttering.. and being generally slow.

I think even though a full surface scan yielded no problems.. the drive is actually failing. So... I'm shuffling data now. Relocating 460 GB of data from the spare 500 GB RE3 I've got to the free space on the 1 TB Blacks so I can put this RE3 into the array as disk 1, and then I can do some extensive testing on the questionable RE2.



I've only had two legitimate disk failures with WD drives since I started using them in 2002. Even when I get disks from a "bad batch" period of time, they still end up working properly for many years. I want to upgrade to bigger drives, but I have to say... the reviews on Newegg for anything >1 TB, regardless of brand, do not instill any confidence at all. The 2 and 3 TB Blacks don't have stellar reviews, but the 4 TB Black does. And I'm not going with Red or Purple. Blues are nice for what they are.. they are effectively cheaper Blacks, but they only go up to 1 TB. I really do like the RE drives, and the 5-year warranty is nice (I've had to use that warranty only once).

I want to upgrade to 4x 4 TB REs.. but... ow, that is expensive.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1732660


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.