Kernel error on a Raspberry Pi Rev2 - what to do now

RonSC

Joined: 23 Dec 16
Posts: 4
Credit: 6,145
RAC: 0
Canada
Message 1842332 - Posted: 15 Jan 2017, 5:12:14 UTC

Hello,
After setting up boinc on a Raspberry Pi Rev2, I started its first SaH work unit. Soon after, I observed a kernel error, as follows:
Task:
name: 05mr09ac.13047.14796.11.38.100_1
WU name: 05mr09ac.13047.14796.11.38.100
...
Message from syslogd@RPi-Rev2 at Jan 12 17:56:14 ...
kernel: [ 3247.965774] {this is the line prefix of each following line}
Internal error: Oops - undefined instruction: 0 [#1] ARM
Processing setiathome_8.02 (pid: 894, stack limit = 0xd8804188)
Stack: (0xd8805e67 to 0xd8806000)
{followed by the stack dump}

I decided to increase the RAM allocation to 224MB, and restarted the RPi, but again the same kernel error occurred.
Lastly, I removed a Gertboard plugin card to reduce power demand. After booting up the RPi, the same kernel error happened again.
The 3 kernel errors show a pattern: each happened a few minutes after a new work unit was started, including after an RPi restart.
The RPi is running a freshly installed and upgraded Raspbian Jessie version & kernel.
Interestingly, this RPi-Rev2 has 512MB RAM and is having kernel errors, while my Raspbian-based RPi-Rev1 with 256MB RAM seems to be running error-free.
Oddly, the average CPU loads given by /proc/loadavg are 0.0 (1 min), 0.0 (5 min), 0.0 (15 min)!! while get_tasks shows the work unit as executing. With about 420,000 secs of CPU time left (4.8 days), I think this task should be killed.
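The "task shows as executing but the load average is zero" check can be scripted. This is only a rough sketch: load_below is a hypothetical helper (not part of BOINC), the threshold is an arbitrary assumption, and a low load average by itself does not prove the task is hung.

```shell
# load_below THRESHOLD [LOADAVG_FILE]
# Succeeds (exit 0) when the 15-minute load average is below THRESHOLD.
# /proc/loadavg holds the 1, 5 and 15 minute averages in fields 1 to 3.
load_below() {
    awk -v limit="$1" '{ exit !($3 + 0 < limit + 0) }' "${2:-/proc/loadavg}"
}
```

For example, `load_below 0.1 && echo "CPU looks idle"` could be run next to `boinccmd --get_tasks` to compare the two views.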
My inclination is to do a --project URL detach (I log in to my headless RPi systems via ssh), but that means the problem remains unsolved :(. Do you need more information?
I'll leave it running for now until I get your advice.
Cheers, Ron
klammerj
Volunteer tester

Joined: 3 Jan 08
Posts: 4
Credit: 500,895
RAC: 0
Austria
Message 1842341 - Posted: 15 Jan 2017, 7:21:43 UTC - in response to Message 1842332.  

It said undefined instruction... maybe they got the binaries mixed up.. or wrong compilation flags...
It's unlikely to be the amount of ram installed...
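One way to look into the mixed-up-binaries idea is to see which instruction-set features the CPU actually reports. A minimal sketch, assuming an ARM Linux /proc/cpuinfo that contains a "Features" line; cpu_has_feature is a hypothetical helper name, and the feature to look for (vfp, neon, ...) depends on how the application was built, which this thread does not state.

```shell
# cpu_has_feature FEATURE [CPUINFO_FILE]
# Succeeds (exit 0) when FEATURE appears as a whole word in the first
# "Features" line of the cpuinfo file (ARM kernels use that label).
cpu_has_feature() {
    awk -v want="$1" '
        /^Features/ {
            sub(/^[^:]*:/, "")          # drop the "Features :" label
            n = split($0, f, " ")       # split the rest on whitespace
            for (i = 1; i <= n; i++) if (f[i] == want) found = 1
            exit                        # only the first line is needed
        }
        END { exit found ? 0 : 1 }
    ' "${2:-/proc/cpuinfo}"
}
```

For example, `cpu_has_feature neon || echo "no NEON on this CPU"`. Running `readelf -A` on the application binary lists its ARM build attributes (such as Tag_CPU_arch), which can then be compared against what the CPU offers.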
RonSC

Joined: 23 Dec 16
Posts: 4
Credit: 6,145
RAC: 0
Canada
Message 1842477 - Posted: 15 Jan 2017, 20:59:11 UTC - in response to Message 1842341.  

Thanks for the diagnosis klammerj.

ITEM 1:
I see that the "current cpu time" is stalled, so I assume that it will just sit there until the task deadline. I wish to somehow kill the offending task and start another. Is the following approach best?:

$ boinccmd --host RPi-Rev2 --project URL reset

The Boinccmd Tool Manual describes reset as "delete current work and get more".

ITEM 2:
Would I be correct to say that if I do a standard system upgrade (apt-get), which might update boinc itself, this would cause problems for the currently executing task?

ITEM 3:
If I need to do some system maintenance I would like to pause project work. Which of the following methods would work best?
$ boinccmd --host RPi-Rev2 --project URL suspend
$ boinccmd --host RPi-Rev2 --project URL resume

or should I do:
$ boinccmd --host RPi-Rev2 --project URL nomorework
$ boinccmd --host RPi-Rev2 --project URL allowmorework

and what is the difference? - the Manual doesn't explain.
Thanks, Ron
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1842482 - Posted: 15 Jan 2017, 21:19:53 UTC - in response to Message 1842477.  

Suspend and Resume just pause and resume the project; all of its tasks are affected.
Nomorework and Allowmorework tell BOINC not to download any more tasks for the project, and to start downloading new work again.
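The four project operations can be wrapped so that a maintenance script only has to name the one it wants. This is a sketch, not an official tool: project_op and the BOINCCMD override variable are hypothetical names, and the host and URL are placeholders as in the commands quoted above.

```shell
# project_op HOST URL OP
# Runs one of the four project-level operations discussed in this thread.
# BOINCCMD may be set to another program (e.g. echo) for a dry run.
project_op() {
    case "$3" in
        suspend|resume|nomorework|allowmorework) ;;
        *) echo "unsupported op: $3" >&2; return 2 ;;
    esac
    "${BOINCCMD:-boinccmd}" --host "$1" --project "$2" "$3"
}
```

For system maintenance, `project_op RPi-Rev2 URL suspend` before and `project_op RPi-Rev2 URL resume` after would pause and restart the crunching, while nomorework only stops new downloads.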

But I have checked your computers, the Raspberry Pi v1 is the single core, I think? It hasn't finished any tasks yet.
The Raspberry V2 is the quad core? It seems to run the work fine, there are no errors. Work is being returned and validated. So there's no need for rash decisions here. And certainly not for resetting the project. All that that does is throw away the work in cache. It may reset the application to crunch with, but then will download the same one afterwards.
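Since a reset throws away everything in the cache, a narrower option for one stuck task is boinccmd's per-task abort operation. A minimal sketch: abort_task and the BOINCCMD override are hypothetical names, and the host, URL and task name are placeholders to be taken from the get_tasks output.

```shell
# abort_task HOST URL TASK_NAME
# Aborts a single task and leaves the rest of the project's work alone.
# BOINCCMD may be set to another program (e.g. echo) for a dry run.
abort_task() {
    "${BOINCCMD:-boinccmd}" --host "$1" --task "$2" "$3" abort
}
```

For example, `abort_task RPi-Rev2 URL 05mr09ac.13047.14796.11.38.100_1` would drop only that task.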
RonSC

Joined: 23 Dec 16
Posts: 4
Credit: 6,145
RAC: 0
Canada
Message 1845521 - Posted: 30 Jan 2017, 21:41:48 UTC

Hello,
I don't know if the info below is useful, in general, for the project; please let me know if it is not useful.

My id is RonSC, user id is 10419846, computer name is RPi-Rev1 (single core), computer id is 8179761. The task (name is: 13fe09aa.27929.4162.10.37.195_1) is hung - ie. current CPU time is not incrementing, and cpu load is 0.02 (15 min avg).
To avoid wasting time waiting until the deadline, I plan to reset the task and resume processing as soon as I have posted this note.
Hopefully, this info might help to resolve the problem with this task.
Cheers, Ron
RonSC

Joined: 23 Dec 16
Posts: 4
Credit: 6,145
RAC: 0
Canada
Message 1849423 - Posted: 18 Feb 2017, 1:52:50 UTC

Since my last post, my computer RPi-Rev2 has had 2 more kernel crashes of the same type on consecutive tasks: undefined instruction: 0 [#1] ARM.
klammerj thought the first such error was probably caused by improper compilation options. That was a month ago, and it's plain the problem has not been fixed.
I've come to the conclusion that I might as well retire RPi-Rev2 from Seti work, especially since it is a very slow processor. Maybe the Seti project shares this view, i.e. that it is not worth the effort required to support it.
Cheers, Ron
Karl Roos
Joined: 19 Mar 01
Posts: 36
Credit: 206,258,788
RAC: 0
United States
Message 1850669 - Posted: 23 Feb 2017, 0:28:37 UTC

I'm running boinc & seti@home on a Raspberry Pi. Works great but I can only get 2 of the 4 cores processing WU's. On my Android tablet there was an option in the app to run 2 cores or 4, I don't see the equivalent in the Pi interface. I have computing preferences set to not limit cores or processor %, but it still only runs 2 wu's on 2 of the cores.