Horizontal Hold (Jun 14 2007)

Message boards : Technical News : Horizontal Hold (Jun 14 2007)

Pepo
Volunteer tester
Joined: 5 Aug 99
Posts: 308
Credit: 418,019
RAC: 0
Slovakia
Message 588576 - Posted: 18 Jun 2007, 13:31:24 UTC - in response to Message 586947.  

One theory is the bad CPU screwed up the previous kernel, which might explain why it suddenly had problems when it was fine for weeks before that. Then again.. how does a bad CPU permanently screw up a kernel image?

Quite possible - some years ago our Windows server had one buggy (non-ECC) memory module. After a sudden crash, plenty of files in C:\WINDOWS\system32\ (and elsewhere too) got some erroneous characters (more exactly, a few flipped bits) in their names and the system refused to boot.

The solution was to locate and replace the bad memory module, install the same system on some other machine, compare the directory trees and rename the broken file names - everything was then running fine.
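
The "compare the directory trees and rename" step can be sketched roughly as below. This is only an illustration, not the procedure actually used on that server: the function names, the `max_bits` threshold and the sample file names are all assumptions. It pairs each name found only in the damaged tree with a missing reference name that differs from it by just a few bits.

```python
from typing import Dict, Iterable, Optional


def bit_distance(a: bytes, b: bytes) -> Optional[int]:
    """Count differing bits between two equal-length byte strings.

    Returns None when the lengths differ, since a flipped bit changes
    a character but never the length of the name.
    """
    if len(a) != len(b):
        return None
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def match_damaged_names(reference: Iterable[str],
                        damaged: Iterable[str],
                        max_bits: int = 2) -> Dict[str, str]:
    """Propose renames {damaged_name: reference_name}.

    `reference` is the listing from the healthy machine, `damaged` the
    listing from the broken one (hypothetical inputs, e.g. from os.listdir).
    """
    reference_set = set(reference)
    damaged_set = set(damaged)
    missing = reference_set - damaged_set   # expected names that are gone
    unknown = damaged_set - reference_set   # names that should not exist

    renames: Dict[str, str] = {}
    for bad in sorted(unknown):
        candidates = []
        for good in sorted(missing):
            d = bit_distance(bad.encode("utf-8", "surrogateescape"),
                             good.encode("utf-8", "surrogateescape"))
            if d is not None and 0 < d <= max_bits:
                candidates.append(good)
        # Only propose a rename when exactly one reference name fits.
        if len(candidates) == 1:
            renames[bad] = candidates[0]
    return renames


if __name__ == "__main__":
    # Made-up listings: 'l' -> 'm' and 'l' -> 'h' are single-bit flips.
    healthy = ["kernel32.dll", "ntdll.dll", "user32.dll"]
    broken = ["kernel32.dlm", "ntdlh.dll", "user32.dll"]
    for bad, good in match_damaged_names(healthy, broken).items():
        print(f"rename {bad!r} -> {good!r}")
```

Ambiguous matches are skipped on purpose, so anything the sketch cannot pair up uniquely would still need to be checked by hand.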

Peter
ID: 588576
Dr. C.E.T.I.
Joined: 29 Feb 00
Posts: 16019
Credit: 794,685
RAC: 0
United States
Message 588728 - Posted: 18 Jun 2007, 18:23:27 UTC - in response to Message 588576.  

One theory is the bad CPU screwed up the previous kernel, which might explain why it suddenly had problems when it was fine for weeks before that. Then again.. how does a bad CPU permanently screw up a kernel image?

Quite possible - some years ago our Windows server had one buggy (non-ECC) memory module. After a sudden crash, plenty of files in C:\WINDOWS\system32\ (and elsewhere too) got some erroneous characters (more exactly, a few flipped bits) in their names and the system refused to boot.

The solution was to locate and replace the bad memory module, install the same system on some other machine, compare the directory trees and rename the broken file names - everything was then running fine.

Peter


[off topic]
Welcome back to the Boards Peter . . . ;O
[on topic]
BOINC Wiki . . .

Science Status Page . . .
ID: 588728
Scarecrow
Joined: 15 Jul 00
Posts: 4520
Credit: 486,601
RAC: 0
United States
Message 588981 - Posted: 19 Jun 2007, 7:16:23 UTC - in response to Message 588573.  

Are Scarecrow's cowboys capable of handling both types of endians?


Tell the big ones to line up, and the little ones to bunch up.... we'll whoop 'em all.

How's it going, Peter?
ID: 588981
Pepo
Volunteer tester
Joined: 5 Aug 99
Posts: 308
Credit: 418,019
RAC: 0
Slovakia
Message 589004 - Posted: 19 Jun 2007, 8:51:45 UTC - in response to Message 588981.  

Are Scarecrow's cowboys capable of handling both types of endians?

Tell the big ones to line up, and the little ones to bunch up.... we'll whoop 'em all.

I see you have them under control.

How's it going, Peter?

Yippeeeeeee!
Less time recently for virtual life, just scratching some projects' boards. But still enough interest for the area.

Peter
ID: 589004
KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 3187
Credit: 57,163,290
RAC: 0
United States
Message 589071 - Posted: 19 Jun 2007, 13:14:01 UTC - in response to Message 586947.  

[snip]
Dell replaced the bad CPU in isaac, which fixed one problem, but we were still having unexplained crashes when using the latest xen kernel. However a new kernel came out and we upgraded to that this morning and so far so good. One theory is the bad CPU screwed up the previous kernel, which might explain why it suddenly had problems when it was fine for weeks before that. Then again.. how does a bad CPU permanently screw up a kernel image?

[snip]
- Matt


One possibility is that the "bad" CPU overwrote part of the kernel in memory, in a "swapping" module (one that comes into memory and gets rewritten to disk, repeatedly), and the error propagated (like, maybe, it was a disk drive driver routine...).

I've seen this happen in my own experience (in much older OSes: IBM's DOS/MS, OSMVT, etc.). Finding the cause was always frustrating, and quite slow...
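
One way to check whether a kernel image on disk has actually been altered is to hash it and compare the result with a digest recorded while the image was known to be good. The sketch below is a minimal illustration only: the function names are made up, and it assumes a trusted SHA-256 digest is available from somewhere (for example, from the package the kernel came from).

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def image_is_intact(path: str, expected_hex: str) -> bool:
    """True if the file's digest matches the known-good digest.

    `expected_hex` is assumed to come from a trusted source; whitespace
    and letter case are normalised before comparing.
    """
    return sha256_of(path) == expected_hex.strip().lower()
```

Even a single flipped bit changes the digest, so a mismatch would point at the image on disk, while a match would suggest the corruption only ever lived in memory.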
.

Hello, from Albany, CA!...
ID: 589071
Uioped1
Volunteer tester
Joined: 17 Sep 03
Posts: 50
Credit: 1,179,926
RAC: 0
United States
Message 589170 - Posted: 19 Jun 2007, 16:07:05 UTC - in response to Message 589071.  

[snip]
Dell replaced the bad CPU in isaac, which fixed one problem, but we were still having unexplained crashes when using the latest xen kernel. However a new kernel came out and we upgraded to that this morning and so far so good. One theory is the bad CPU screwed up the previous kernel, which might explain why it suddenly had problems when it was fine for weeks before that. Then again.. how does a bad CPU permanently screw up a kernel image?

[snip]
- Matt


One possibility is that the "bad" CPU overwrote part of the kernel in memory, in a "swapping" module (one that comes into memory and gets rewritten to disk, repeatedly), and the error propagated (like, maybe, it was a disk drive driver routine...).

I've seen this happen in my own experience (in much older OSes: IBM's DOS/MS, OSMVT, etc.). Finding the cause was always frustrating, and quite slow...


Also remember that software RAID is the norm there (a conscious decision).
ID: 589170
Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 589180 - Posted: 19 Jun 2007, 16:36:22 UTC - in response to Message 589170.  

Also remember that software RAID is the norm there (a conscious decision).


In this case, hardware RAID. There's no "norm" as much as what works best with what we got.

- Matt
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 589180


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.