Weirderer (Sep 07 2007)

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 635635 - Posted: 7 Sep 2007, 18:16:36 UTC

Last night the assimilators stopped inserting work into the science database. We discovered that one of the indexes on the result table was corrupt - whether this was caused by the recent drive failures, or had anything to do with the assimilator problem, was anybody's guess.

I started the result index checker last night, and soon after that a THIRD drive failed on thumper in as many days. This is getting ridiculous, especially as there are no apparent signs of why the drives are failing, and we're running low on spares.

This morning Bob started rebuilding the corrupt index, and once that is finished I'll start the assimilators (hopefully they will be happy) and catch up on the major backlog. Maybe then I'll start the splitters, but given how our science database might tank any second we might hold off on that. In short: there may be no new work until Monday.

- Matt
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 635635
Jon (nanoreid)
Joined: 16 Aug 07
Posts: 643
Credit: 583,870
RAC: 0
United States
Message 635639 - Posted: 7 Sep 2007, 18:19:38 UTC - in response to Message 635635.  

Sounds like some of my days. Every machine goes down at once.
Hopefully the cosmos is not trying to reverse the charges.
Moderation in all things.
ID: 635639
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 635659 - Posted: 7 Sep 2007, 18:30:55 UTC

Quick read......
The drives are not failing.......the controller is. Drives just do not fail that often. Look for the true source of the problem.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 635659
DJStarfox
Joined: 23 May 01
Posts: 1066
Credit: 1,226,053
RAC: 2
United States
Message 635663 - Posted: 7 Sep 2007, 18:35:40 UTC - in response to Message 635635.  

Yes, PLEASE replace that drive controller ASAP. It seems to be getting worse and the server is on borrowed time waiting to corrupt all the drives.
ID: 635663
Scarecrow
Joined: 15 Jul 00
Posts: 4520
Credit: 486,601
RAC: 0
United States
Message 635666 - Posted: 7 Sep 2007, 18:40:43 UTC - in response to Message 635659.  

Quick read......
The drives are not failing.......the controller is. Drives just do not fail that often. Look for the true source of the problem.

Or maybe even a power supply. We had several drives auger in in rapid succession due to a dying PS doing rude things with the voltages to the drives. Either way, I recommend drinking 3 beers and calling me in the morning.

Dr. Scarecrow
ID: 635666
RedmoonWHO
Joined: 11 Mar 02
Posts: 1
Credit: 314,327
RAC: 0
United States
Message 635704 - Posted: 7 Sep 2007, 19:14:37 UTC

It's funny, every time I see that I'm not getting any new work I always assume it's my computer. Hopefully you can bag this problem soon. You'll probably find that the drives that you thought were failing are perfectly fine once you find the true cause of the problem.
ID: 635704
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 635794 - Posted: 7 Sep 2007, 20:07:18 UTC
Last modified: 7 Sep 2007, 20:07:46 UTC

And I do not mean to be rude, guys. I have even had RAM problems make the OS tell me that my hard drive had failed, even though the drive was only a few weeks old. So please don't take my post as an insult. I truly find it hard to accept that 3 hard drives would fail in a couple of days' time.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 635794
RottenMutt
Joined: 15 Mar 01
Posts: 1011
Credit: 230,314,058
RAC: 0
United States
Message 635804 - Posted: 7 Sep 2007, 20:09:57 UTC

Put an oscilloscope on the power leads and look for transients. A simple digital voltmeter is not good enough.
Go beg the EEs for one.
ID: 635804
TerryG
Joined: 11 Mar 01
Posts: 16
Credit: 15,351,703
RAC: 37
United Kingdom
Message 635898 - Posted: 7 Sep 2007, 20:56:20 UTC

Strange how Seti went down with disk problems about the same time Rosetta@Home did! Spooky.
ID: 635898
Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 635927 - Posted: 7 Sep 2007, 21:16:59 UTC

Oh.. we are VERY aware that these drive failures may be spurious. Remember that the original thumper failed because of the main drive controller board.

- Matt
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 635927
speedimic
Volunteer tester
Joined: 28 Sep 02
Posts: 362
Credit: 16,590,653
RAC: 0
Germany
Message 635935 - Posted: 7 Sep 2007, 21:22:51 UTC


Funny - had a wrecked controller this week too...

Not really the "week of storage"

mic.


ID: 635935
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 635939 - Posted: 7 Sep 2007, 21:25:52 UTC - in response to Message 635927.  

Oh.. we are VERY aware that these drive failures may be spurious. Remember that the original thumper failed because of the main drive controller board.

- Matt


OK Matt, sorry to try to point out the obvious. It's just that your previous posts hadn't mentioned anything other than the drives had failed. Carry on, my friend.

"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 635939
Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 635953 - Posted: 7 Sep 2007, 21:45:09 UTC - in response to Message 635939.  

No problem - it's a constant battle to decide how much detail is too much or too little in every post I make. Too much: boring, redundant, confusing. Too little: unclear, vague, misleading.

- Matt

OK Matt, sorry to try to point out the obvious. It's just that your previous posts hadn't mentioned anything other than the drives had failed. Carry on, my friend.


-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 635953
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 635955 - Posted: 7 Sep 2007, 21:46:55 UTC - in response to Message 635953.  

No problem - it's a constant battle to decide how much detail is too much or too little in every post I make. Too much: boring, redundant, confusing. Too little: unclear, vague, misleading.

- Matt

OK Matt, sorry to try to point out the obvious. It's just that your previous posts hadn't mentioned anything other than the drives had failed. Carry on, my friend.



LOL...I kinda like the boring, redundant, confusing ones.

Regards,
Mark.

"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 635955
Eirik
Volunteer tester
Joined: 25 Mar 01
Posts: 45
Credit: 2,173,371
RAC: 0
Norway
Message 635958 - Posted: 7 Sep 2007, 21:49:39 UTC
Last modified: 7 Sep 2007, 21:49:50 UTC

Will there be any more WUs this weekend, or is it still too early to tell?
ID: 635958
speedimic
Volunteer tester
Joined: 28 Sep 02
Posts: 362
Credit: 16,590,653
RAC: 0
Germany
Message 635967 - Posted: 7 Sep 2007, 22:04:09 UTC

No problem - it's a constant battle to decide how much detail is too much or too little in every post I make. Too much: boring, redundant, confusing. Too little: unclear, vague, misleading.


The more info the better!
Gives us (at least me) the feeling of being part of it...


mic.


ID: 635967
Careface
Joined: 6 Jun 03
Posts: 128
Credit: 16,561,684
RAC: 0
New Zealand
Message 635975 - Posted: 7 Sep 2007, 22:21:44 UTC - in response to Message 635955.  



LOL...I kinda like the boring, redundant, confusing ones.




Lol, me too.. sometimes I even get bored enough to go back and read tech news from several months ago and go "wow, what utter crap the SETI crew have to put up with so often"

Careface*
ID: 635975
John Fluth
Joined: 6 Oct 99
Posts: 22
Credit: 164,030,648
RAC: 153
United States
Message 636078 - Posted: 8 Sep 2007, 1:37:01 UTC - in response to Message 635967.  

No problem - it's a constant battle to decide how much detail is too much or too little in every post I make. Too much: boring, redundant, confusing. Too little: unclear, vague, misleading.


Matt - Thanks for doing such a great job! Parts fail - things happen - I've been there - and appreciate your dedication.
ID: 636078
KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 3187
Credit: 57,163,290
RAC: 0
United States
Message 636131 - Posted: 8 Sep 2007, 2:30:46 UTC - in response to Message 635898.  
Last modified: 8 Sep 2007, 2:42:44 UTC

Strange how Seti went down with disk problems about the same time Rosetta@Home did! Spooky.


@Terry

No, Rosetta went down a day before there was any indication of a problem with SETI...probably just a coincidence, as (IIRC) Rosetta does not have any hardware models in common with SETI. (my two main projects at the moment...) Had to re-start Einstein because one 'puter (that was on S@H and R@H) ran out of work.

Hello, from Albany, CA!...
ID: 636131
KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 3187
Credit: 57,163,290
RAC: 0
United States
Message 636136 - Posted: 8 Sep 2007, 2:38:20 UTC - in response to Message 635639.  
Last modified: 8 Sep 2007, 2:44:10 UTC



Sounds like some of my days. Every machine goes down at once.


I hope you meant "at least once"! (insertion in bold) ;-)

There was a stretch when a computer in my care (I was a mainframe operator at the time...) died several times a day for more than two months, on a random schedule! (The tech staff/customer engineers finally tracked that bug to a bad wiring harness... we got a new [same model] computer on warranty...)

Hello, from Albany, CA!...
ID: 636136