Unrecoverable errors

Troubleshooting : Unix/Linux : Unrecoverable errors
Bryn

Joined: 2 Jun 01
Posts: 85
Credit: 925,923
Recent average credit: 26
United Kingdom
Message 589045 - Posted: 19 Jun 2007, 12:25:12 UTC

Debian probably has memtest installed or at least available from the distro media. It's often automatically installed and available from the initial startup boot menu.
Or, if the Debian CD/DVD will boot into a 'rescue' mode (usually just a basic shell prompt) then it should be available from there too.

To err is human; to moo, bovine.
ID: 589045
Crunch3r
Volunteer tester

Joined: 15 Apr 99
Posts: 1546
Credit: 3,438,823
Recent average credit: 0
Germany
Message 588128 - Posted: 17 Jun 2007, 14:43:44 UTC - in response to Message 588119.
Last modified: 17 Jun 2007, 14:43:59 UTC

52 °C doesn't seem too high to me.
For memory testing, have a look here ---> http://www.memtest.org/

Regarding the kernel, I don't think that it's the reason for the crashes.
Btw, you could try an optimized SETI application ---> http://lunatics.at just to check whether the problems occur with that one too. (And it's way faster than the stock application ;) )

Join BOINC United now!
ID: 588128
David Baron

Joined: 25 May 04
Posts: 2
Credit: 22,129
Recent average credit: 0
Israel
Message 588119 - Posted: 17 Jun 2007, 12:59:49 UTC - in response to Message 587907.

lm_sensors is running. One of the reasons I stopped doing SETI@home was that during the summer, things get hot. Right now, the CPU is at 52 °C. Not awful. I have run intensive FPU audio work during real summer heat at up to 56 °C with no problems. If you believe 52 °C is too high, I will lay off SETI@home until the fall.

As for failing RAM, could you suggest a test for this? Again, the system seems to run everything else without problems. No such errors appear in my logs.

One suggestion I have seen for sporadic disk errors that are not logged by SMART (click-clacks in WD disks with bad reads and system freeze-ups, for example) is to check the power supply. I got rid of the errors by changing cables and disconnecting one CD drive. However, my +5 V line is consistently low at 4.7 V. All others are close to spec. I should probably replace it anyway ($$).

The kernel I am running is a realtime-preemptive patched kernel. I could run SETI@home under an ID that does not have these privileges if RT is a problem (I cannot imagine it is).
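
For reference, the 4.7 V reading can be checked against the usual ATX tolerance. This is a minimal sketch, assuming the commonly cited ±5% limit for the +5 V rail (4.75 V to 5.25 V). The function name and the default tolerance are illustrative, and motherboard sensor readings are often imprecise, so a multimeter reading would be more trustworthy.

```python
def rail_deviation(nominal: float, measured: float, tolerance: float = 0.05):
    """Return (deviation as a fraction of nominal, whether it is within tolerance).

    tolerance=0.05 assumes the usual ATX +/-5% limit for the positive rails.
    """
    deviation = (measured - nominal) / nominal
    return deviation, abs(deviation) <= tolerance


# The reading reported in the post: +5 V rail measured at 4.7 V.
deviation, ok = rail_deviation(5.0, 4.7)
print(f"+5 V rail: {deviation:+.1%}, within tolerance: {ok}")
```

Under that assumption, 4.7 V is about 6% low, which is just outside the allowed band.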
ID: 588119
Crunch3r
Volunteer tester

Joined: 15 Apr 99
Posts: 1546
Credit: 3,438,823
Recent average credit: 0
Germany
Message 587907 - Posted: 16 Jun 2007, 22:20:03 UTC - in response to Message 587811.
Last modified: 16 Jun 2007, 23:00:36 UTC

Hi,

I've looked at your results and it seems that your PIII has some issues.

First: FPU failure in pulse_find
Second: St9bad_alloc

Those errors are unrelated, so it seems that you basically have two major problems.

First, the FPU failure could be related to an overheating CPU.

The second one looks like a memory issue (bad_alloc ---> memory allocation failed); it could be a failing RAM module.
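
As a side note on the second message: `St9bad_alloc` is the type name that GCC's C++ runtime reports for `std::bad_alloc`, the exception thrown when a memory allocation fails. In the Itanium C++ ABI name mangling that GCC uses, `St` stands for `std::` and the digits give the length of the identifier that follows. The hypothetical helper below decodes only this simple `St<length><name>` form; it is a sketch, not a general demangler.

```python
import re


def decode_std_name(mangled: str) -> str:
    """Decode a mangled name of the simple form St<length><identifier>.

    Anything else is returned unchanged; this is not a full demangler.
    """
    match = re.fullmatch(r"St(\d+)(\w+)", mangled)
    if match is None:
        return mangled
    length, identifier = int(match.group(1)), match.group(2)
    if len(identifier) != length:
        return mangled
    return "std::" + identifier


print(decode_std_name("St9bad_alloc"))  # prints: std::bad_alloc
```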

You should check your system for overheating (try lm_sensors to check the CPU temperature) and try running memtest to check whether your memory is OK.
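
A minimal sketch of the temperature check, assuming `sensors` prints lines such as `CPU Temp:  +52.0°C  (high = +60.0°C)`. Labels and layout vary by chip and driver, so the sample text, the `parse_temperatures` and `too_hot` helpers, and the 60 °C warning threshold are illustrative assumptions, not values taken from the machine discussed in this thread.

```python
import re

# Hypothetical sample; on a real system this text would come from the `sensors` command.
SAMPLE = """\
CPU Temp:    +52.0°C  (high = +60.0°C, hyst = +55.0°C)
M/B Temp:    +38.0°C  (high = +45.0°C, hyst = +40.0°C)
fan1:       3200 RPM  (min = 1500 RPM)
"""

# Matches "<label>: +52.0°C" (or "+52.0 C") at the start of a line.
TEMP_LINE = re.compile(r"^(?P<label>[^:]+):\s*(?P<value>[+-]?\d+(?:\.\d+)?)\s*°?C\b")


def parse_temperatures(text: str) -> dict:
    """Map each sensor label to its current temperature in degrees Celsius."""
    readings = {}
    for line in text.splitlines():
        match = TEMP_LINE.match(line)
        if match:
            readings[match.group("label").strip()] = float(match.group("value"))
    return readings


def too_hot(readings: dict, limit_c: float = 60.0) -> list:
    """Return the labels whose reading is at or above limit_c."""
    return [label for label, value in readings.items() if value >= limit_c]


readings = parse_temperatures(SAMPLE)
print(readings)
print(too_hot(readings))
```

On a live system the text could be obtained with `subprocess.run(["sensors"], capture_output=True, text=True).stdout` instead of the sample string.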







ID: 587907
David Baron

Joined: 25 May 04
Posts: 2
Credit: 22,129
Recent average credit: 0
Israel
Message 587811 - Posted: 16 Jun 2007, 19:59:48 UTC

I do not think I have completed one work unit since restarting SETI@home on my system. Sometimes, partway through, the work stops, an "unrecoverable error" message is logged, and existing results are uploaded (hopefully all those luscious Gaussians, pointed spikes and nicely born triplets have not been lost in the fray).

I am running BOINC 5.8.17 (from Debian "unstable" Sid) on my Debian Linux box, an older Pentium-III.
ID: 587811

 
©2020 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.