Unrecoverable errors

David Baron
Joined: 25 May 04
Posts: 2
Credit: 22,129
RAC: 0
Israel
Message 587811 - Posted: 16 Jun 2007, 19:59:48 UTC

I do not think I have completed one work unit since restarting seti@home on my system. Sometimes, part way through, the work stops, an "unrecoverable error" message is logged, and existing results are uploaded (hopefully all those luscious gaussians, pointed spikes and nicely born triplets have not been lost in the fray).

I am running BOINC 5.8.17 (from Debian "unstable", Sid) on my Debian Linux box, an older Pentium III.

Crunch3r
Volunteer tester
Joined: 15 Apr 99
Posts: 1546
Credit: 3,438,823
RAC: 0
Germany
Message 587907 - Posted: 16 Jun 2007, 22:20:03 UTC - in response to Message 587811.  
Last modified: 16 Jun 2007, 23:00:36 UTC

Hi,

I've looked at your results and it seems that your PIII has some issues.

first: FPU failure in pulse_find
second: St9bad_alloc

Those errors are unrelated, so it seems that you basically have two major problems.

First, the FPU failure could be related to an overheating CPU.

The second one looks like a memory issue (bad_alloc ---> memory allocation failed); it could be a failing RAM module.
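
For readers wondering where the odd name comes from: "St9bad_alloc" is how GCC spells the type name std::bad_alloc internally ("St" stands for std::, and "9" is the length of the identifier that follows). The sketch below decodes only that one simple pattern; the function name demangle_std_type is made up for illustration, and a real tool such as c++filt (with its --types option) should handle the general case.

```python
import re


def demangle_std_type(mangled):
    """Decode 'St<length><identifier>' into 'std::<identifier>'.

    Covers only this single simple form of a mangled type name;
    returns None for anything else.
    """
    match = re.fullmatch(r"St(\d+)(\w+)", mangled)
    if not match:
        return None
    length, name = int(match.group(1)), match.group(2)
    if length != len(name):
        return None  # length prefix does not match the identifier
    return "std::" + name


if __name__ == "__main__":
    print(demangle_std_type("St9bad_alloc"))
```

So the log line is the application reporting an uncaught std::bad_alloc, which is the exception C++ throws when a memory allocation request fails.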

You should check your system for overheating (try lm_sensors to check the CPU temperature) and try running a memtest to check if your memory is OK.
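
As a rough illustration of the temperature check, the sketch below reads the values that the kernel's hardware-monitoring drivers publish under sysfs. It assumes the sensors appear as hwmon*/temp*_input files holding millidegrees Celsius; the exact layout can differ on older kernels, and the function name read_hwmon_temps is only illustrative.

```python
from pathlib import Path


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return {sensor label: degrees Celsius} for each temp*_input file under root."""
    temps = {}
    for sensor in sorted(Path(root).glob("hwmon*/temp*_input")):
        try:
            millidegrees = int(sensor.read_text().strip())
        except (OSError, ValueError):
            continue  # sensor unreadable or not reporting a number
        label = f"{sensor.parent.name}/{sensor.name}"
        temps[label] = millidegrees / 1000.0
    return temps


if __name__ == "__main__":
    for label, celsius in read_hwmon_temps().items():
        print(f"{label}: {celsius:.1f} C")
```

Running it in a loop while a work unit is crunching would show whether the temperature climbs shortly before a failure.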

David Baron
Message 588119 - Posted: 17 Jun 2007, 12:59:49 UTC - in response to Message 587907.  

lm_sensors is running. One of the reasons I stopped doing seti@home was that during the summer, things get hot. Right now, the CPU is at 52 C. Not awful. I have run intensive FPU audio work during real summer heat at up to 56 C with no problems. If you believe 52 C is too high, I will lay off seti@home until the fall.

As for failing RAM, could you suggest a test for this? Again, the system seems to run everything else without problems. No such errors appear in my logs.

One suggestion I have seen for sporadic disk errors that SMART does not log (click-clacks in WD disks with bad reads and system freeze-ups, for example) is to suspect the power supply. I got rid of the errors by changing cables and disconnecting one CD drive. However, my +5 volt line is consistently low at 4.7 V. All others are close to spec. Probably should replace it anyway ($$).
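
To put the 4.7 V reading in context: the ATX specification allows the +5 V rail to vary by 5%, i.e. 4.75 V to 5.25 V. The small sketch below does that arithmetic; the function names are illustrative, and it assumes the sensor reading is accurate, which motherboard voltage sensors often are not.

```python
def rail_deviation(nominal_volts, measured_volts):
    """Return the relative deviation from nominal (negative means low)."""
    return (measured_volts - nominal_volts) / nominal_volts


def within_tolerance(nominal_volts, measured_volts, tolerance=0.05):
    """True if the measured voltage is within +/- tolerance of nominal."""
    return abs(rail_deviation(nominal_volts, measured_volts)) <= tolerance


if __name__ == "__main__":
    deviation = rail_deviation(5.0, 4.7)
    print(f"+5 V rail deviation: {deviation:+.1%}")
    print("within 5% tolerance:", within_tolerance(5.0, 4.7))
```

A reading of 4.7 V is about 6% low, so if the sensor is right the rail is slightly out of tolerance; a multimeter on a spare drive connector would confirm it.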

The kernel I am running is a realtime-preemptive patched kernel. I could run seti@home under an ID that does not have these privileges if RT is a problem (though I cannot imagine it is).

Crunch3r
Message 588128 - Posted: 17 Jun 2007, 14:43:44 UTC - in response to Message 588119.  
Last modified: 17 Jun 2007, 14:43:59 UTC

52 °C doesn't seem too high to me.
For memory testing, have a look here ---> http://www.memtest.org/

Regarding the kernel, I don't think that it's the reason for the crashes.
By the way, you could try an optimized SETI application ---> http://lunatics.at just to check if the problems occur with that one too (and it's way faster than the stock application ;) ).

Bryn
Joined: 2 Jun 01
Posts: 85
Credit: 925,923
RAC: 26
United Kingdom
Message 589045 - Posted: 19 Jun 2007, 12:25:12 UTC

Debian probably has memtest installed or at least available from the distro media. It's often automatically installed and available from the initial startup boot menu.
Or, if the Debian CD/DVD will boot into a 'rescue' mode (usually just a basic shell prompt) then it should be available from there too.

To err is human; to moo, bovine.