Problem with mac clients?
Author | Message |
---|---|
Joined: 15 Apr 99 Posts: 12 Credit: 883,485 RAC: 0 |
I've been seeing errors similar to the following on my PowerMac (currently running 10.4.2):

Mon Nov 28 08:32:31 2005||request_reschedule_cpus: files downloaded
Mon Nov 28 08:32:31 2005|SETI@home|Can't create shared memory: system shmget
Mon Nov 28 08:32:31 2005|SETI@home|Unrecoverable error for result 18ja04ab.24696.17985.261062.239_0 (Couldn't start or resume: -144)
Mon Nov 28 08:32:31 2005||request_reschedule_cpus: start failed
Mon Nov 28 08:32:31 2005|SETI@home|resume_or_start(): unexpected process state 7
Mon Nov 28 08:32:32 2005|SETI@home|Computation for result 18ja04ab.24696.17985.261062.239_0 finished

In addition I think that particular client has been returning bogus results and is now restricted in the number that Seti will accept from it. Is anyone aware of this error and do you know how to fix it? I just updated the client today (Mac version 5.2.8). |
Joined: 4 Dec 03 Posts: 1122 Credit: 13,376,822 RAC: 44 |
This is a strange error, and it's coming "up" from the OS level...

"In addition I think that particular client has been returning bogus results and is now restricted in the number that Seti will accept from it."

With your computers hidden, I really can't tell anything at all. Can't look at the results, the error messages, nothing. The WIKI lists the error, but doesn't have any additional information on it. If you can show your computers (and point me at the right host #), or post the 4 stdout and stderr files from Library/Application Support/BOINC Data somewhere that I can get to them, I'd be happy to help investigate. Otherwise, the best I could tell you would be to run Software Update to make sure your OS is current, and to download BOINC again and let it install over the top of what's there, or maybe even trash the entire thing and reinstall from scratch.

See you on the reef! ;-) |
Dotsch Joined: 9 Jun 99 Posts: 2422 Credit: 919,393 RAC: 0 |
Your system has a problem creating shared memory. It could be a general problem with creating/allocating shared memory, or your shared memory parameters are too low. Can you try setting the shared memory parameters in the kernel higher? By the way, as Bill has written, the output of stderr.txt will be really helpful. |
Joined: 15 Apr 99 Posts: 12 Credit: 883,485 RAC: 0 |
Okay - the computers are now showing; the computer in question is cerebus.lib.usf.edu (the Mac one). If it makes a difference for anyone paying attention, the computer is running the server version of OS X. Any idea how I would change the shared memory parameters in the kernel?

last few lines of stderrdae.txt:
2005-11-29 04:14:31 [SETI@home] Unrecoverable error for result 17ja04aa.28517.32096.684642.174_2 (Couldn't start or resume: -144)
2005-11-29 04:14:31 [SETI@home] resume_or_start(): unexpected process state 7
2005-11-29 04:24:39 [SETI@home] Can't create shared memory: system shmget
2005-11-29 04:24:39 [SETI@home] Unrecoverable error for result 18ja04ab.24696.26961.736090.234_3 (Couldn't start or resume: -144)
2005-11-29 04:24:39 [SETI@home] Can't create shared memory: system shmget
2005-11-29 04:24:39 [SETI@home] Unrecoverable error for result 17mr05ab.17159.32081.804812.121_2 (Couldn't start or resume: -144)
2005-11-29 04:24:39 [SETI@home] resume_or_start(): unexpected process state 7
2005-11-29 04:24:39 [SETI@home] resume_or_start(): unexpected process state 7
2005-11-29 04:24:42 [SETI@home] Can't create shared memory: system shmget
2005-11-29 04:24:42 [SETI@home] Unrecoverable error for result 17mr05ab.17159.32081.804812.117_1 (Couldn't start or resume: -144)
2005-11-29 04:24:42 [SETI@home] Can't create shared memory: system shmget
2005-11-29 04:24:42 [SETI@home] Unrecoverable error for result 17mr05ab.17159.32081.804812.124_0 (Couldn't start or resume: -144)
2005-11-29 04:24:42 [SETI@home] resume_or_start(): unexpected process state 7
2005-11-29 04:24:42 [SETI@home] resume_or_start(): unexpected process state 7
2005-11-29 04:34:50 [SETI@home] Message from server: No work sent
2005-11-29 04:34:50 [SETI@home] Message from server: (reached daily quota of 16 results)
2005-11-29 04:34:50 [SETI@home] No work from project

last few lines of stdoutdae.txt:
2005-11-29 04:24:40 [SETI@home] Finished download of 17mr05ab.17159.32081.804812.124
2005-11-29 04:24:40 [SETI@home] Throughput 418173 bytes/sec
2005-11-29 04:24:40 [SETI@home] Computation for result 18ja04ab.24696.26961.736090.234_3 finished
2005-11-29 04:24:41 [---] request_reschedule_cpus: files downloaded
2005-11-29 04:24:41 [---] request_reschedule_cpus: files downloaded
2005-11-29 04:24:41 [SETI@home] Computation for result 17mr05ab.17159.32081.804812.121_2 finished
2005-11-29 04:24:42 [SETI@home] Can't create shared memory: system shmget
2005-11-29 04:24:42 [SETI@home] Unrecoverable error for result 17mr05ab.17159.32081.804812.117_1 (Couldn't start or resume: -144)
2005-11-29 04:24:42 [---] request_reschedule_cpus: start failed
2005-11-29 04:24:42 [SETI@home] Can't create shared memory: system shmget
2005-11-29 04:24:42 [SETI@home] Unrecoverable error for result 17mr05ab.17159.32081.804812.124_0 (Couldn't start or resume: -144)
2005-11-29 04:24:42 [---] request_reschedule_cpus: start failed
2005-11-29 04:24:42 [SETI@home] resume_or_start(): unexpected process state 7
2005-11-29 04:24:42 [SETI@home] resume_or_start(): unexpected process state 7
2005-11-29 04:24:43 [SETI@home] Computation for result 17mr05ab.17159.32081.804812.117_1 finished
2005-11-29 04:24:44 [SETI@home] Computation for result 17mr05ab.17159.32081.804812.124_0 finished
2005-11-29 04:34:45 [SETI@home] Sending scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi
2005-11-29 04:34:45 [SETI@home] Reason: To fetch work
2005-11-29 04:34:45 [SETI@home] Requesting 172800 seconds of new work, and reporting 4 results
2005-11-29 04:34:50 [SETI@home] Scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi succeeded
2005-11-29 04:34:50 [SETI@home] Message from server: No work sent
2005-11-29 04:34:50 [SETI@home] Message from server: (reached daily quota of 16 results)
2005-11-29 04:34:50 [SETI@home] No work from project
2005-11-29 05:43:23 [---] Suspending computation and network activity - user is active

I'm guessing there's an unforeseen issue with the fact that this is the server version of 10.4.2 ... |
Joined: 4 Dec 03 Posts: 1122 Credit: 13,376,822 RAC: 44 |
"Okay - the computers are now showing; the computer in question is cerebus.lib.usf.edu (the Mac one)."

We can't see the names, but "the Mac one" helps. :-)

"I'm guessing there's an unforeseen issue with the fact that this is the server version of 10.4.2 ..."

That is my only guess as well. OS X Server is a (slightly) different animal. A bit more "unix". Nothing is even getting started here. I don't know how to increase shared memory parameters in the kernel, but it should be the same or almost the same as on Linux - someone? |
Joined: 15 Apr 99 Posts: 12 Credit: 883,485 RAC: 0 |
Well I found out how (and it took a while - where these things are in Tiger server is different from apparently every other version of OS X - it's in /etc/rc) and increased my memory. I don't see the error message (I attached and detached to get some work units to work on) but now when the system goes to pause a work unit (for whatever reason) I see:

Tue Nov 29 06:57:08 2005|SETI@home|Restarting result 21mr05aa.20593.497.286082.137_2 using setiathome version 418
Tue Nov 29 06:58:49 2005||Suspending computation and network activity - user is active
Tue Nov 29 06:58:49 2005|SETI@home|Pausing result 21mr05aa.20593.497.286082.137_2 (removed from memory)
Tue Nov 29 06:58:50 2005||Couldn't destroy shared memory: system shmctl
Tue Nov 29 06:58:50 2005||request_reschedule_cpus: process exited

And this worries me. During this time period, the err files reported:

stderrdae.txt:
2005-11-29 06:53:38 [---] Missing account key
2005-11-29 06:56:07 [---] Couldn't destroy shared memory: system shmctl
2005-11-29 06:58:50 [---] Couldn't destroy shared memory: system shmctl

stdoutdae.txt:
2005-11-29 06:56:06 [SETI@home] Pausing result 21mr05aa.20593.497.286082.137_2 (removed from memory)
2005-11-29 06:56:07 [---] Couldn't destroy shared memory: system shmctl
2005-11-29 06:56:07 [---] request_reschedule_cpus: process exited
2005-11-29 06:56:08 [---] Running CPU benchmarks
2005-11-29 06:57:07 [---] Benchmark results:
2005-11-29 06:57:07 [---] Number of CPUs: 2
2005-11-29 06:57:07 [---] 1512 double precision MIPS (Whetstone) per CPU
2005-11-29 06:57:07 [---] 4664 integer MIPS (Dhrystone) per CPU
2005-11-29 06:57:07 [---] Finished CPU benchmarks
2005-11-29 06:57:08 [---] Resuming computation and network activity
2005-11-29 06:57:08 [---] request_reschedule_cpus: Resuming activities
2005-11-29 06:57:08 [SETI@home] Restarting result 21mr05aa.20593.497.286082.137_2 using setiathome version 418
2005-11-29 06:58:49 [---] Suspending computation and network activity - user is active
2005-11-29 06:58:49 [SETI@home] Pausing result 21mr05aa.20593.497.286082.137_2 (removed from memory)
2005-11-29 06:58:50 [---] Couldn't destroy shared memory: system shmctl
2005-11-29 06:58:50 [---] request_reschedule_cpus: process exited

Hopefully someone will be able to tell me either that this isn't something to worry about (but I'm not that young of a computer geek, so I know all about the evils of memory allocation - and the failures to let go), or that a programmer for BOINC/SETI has seen this and it *will* get fixed in a future release.

It's probably important to note that both the BOINC client and the SETI process run as me, not root. I think this is by BOINC's design, but this may also explain why sometimes memory is not being released (or "destroyed", as the preferred terminology seems to indicate).

To narrow the field more for the curious, the computer in question is: 1821738 Power Macintosh PowerMac7,3 Darwin 8.3.0

Ignore the PowerBook - it hasn't been powered on in a while (long story). |
Dotsch Joined: 9 Jun 99 Posts: 2422 Credit: 919,393 RAC: 0 |
"Well I found out how (and it took a while - where these things are in Tiger server is different from apparently every other version of OS X - it's in /etc/rc) and increased my memory. I don't see the error message (I attached and detached to get some work units to work on) but now when the system goes to pause a work unit (for whatever reason) I see:"

Where exactly did you configure the shared memory? Can you please describe it? It could be of interest to other Mac OS users... Can you please post a list of your kernel parameters?
Looks strange. Could it be that the shared memory functions differ between Mac OS X and Mac OS X Server?
I will mail your problem to the boinc_dev mailing list tomorrow.
There should be no difference in shared memory allocation/destruction between root and a normal user. Which boinc_client and version do you use? The official Berkeley version? |
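For readers following along: the two log messages name the System V calls `shmget` (create a segment) and `shmctl` (control or remove one). The sketch below is not BOINC's code. It is a minimal Python illustration, through `ctypes`, of the create/remove life cycle, assuming a Unix-like system whose C library exposes both calls. The helper names `create_segment` and `destroy_segment` are made up for this example, and the constant values are the ones commonly found in `<sys/ipc.h>` on Linux and macOS.

```python
import ctypes
import os

# Assumed values from <sys/ipc.h>; check your platform's headers if in doubt.
IPC_PRIVATE = 0    # key meaning "give me a new, unnamed segment"
IPC_CREAT = 0o1000 # create the segment if it does not exist
IPC_RMID = 0       # shmctl command: mark the segment for removal

# Load the C library already linked into the running Python process.
libc = ctypes.CDLL(None, use_errno=True)
libc.shmget.argtypes = [ctypes.c_int, ctypes.c_size_t, ctypes.c_int]
libc.shmget.restype = ctypes.c_int
libc.shmctl.argtypes = [ctypes.c_int, ctypes.c_int, ctypes.c_void_p]
libc.shmctl.restype = ctypes.c_int


def create_segment(size_bytes):
    """Create a private segment and return its id; raise OSError on failure."""
    shmid = libc.shmget(IPC_PRIVATE, size_bytes, IPC_CREAT | 0o600)
    if shmid < 0:
        err = ctypes.get_errno()
        raise OSError(err, "shmget failed: " + os.strerror(err))
    return shmid


def destroy_segment(shmid):
    """Ask the kernel to remove the segment; raise OSError on failure."""
    if libc.shmctl(shmid, IPC_RMID, None) != 0:
        err = ctypes.get_errno()
        raise OSError(err, "shmctl failed: " + os.strerror(err))
```

Calling `create_segment(4096)` followed by `destroy_segment(...)` on the returned id is the whole cycle. A System V segment that is created but never removed stays allocated after its process exits and keeps counting against the kernel limits, which is one way repeated destroy failures could eventually lead to create failures.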
Joined: 15 Apr 99 Posts: 12 Credit: 883,485 RAC: 0 |
There's a file called /etc/rc - I found via Google where some folks using PostgreSQL were configuring their systems and copied their configuration:

sysctl -w kern.sysv.shmmax=335544320 kern.sysv.shmmin=1 kern.sysv.shmmni=32 kern.sysv.shmseg=8 kern.sysv.shmall=327680
# sysctl -w kern.sysv.shmmax=4194304 kern.sysv.shmmin=1 kern.sysv.shmmni=32 kern.sysv.shmseg=8 kern.sysv.shmall=1024

Look for something like the line that's commented out (it's the default). The line above it is what the PostgreSQL folks were using. I wasn't 100% sure (as I did not investigate further - I just wanted to get it running) how shmall and shmmax were related. I got the impression that there might be some math involved there but, as I said, I haven't looked into it. Realize that you need to edit the file as root (or find some other way to edit it that gets you "root" or admin privs). |
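On the question of how shmall and shmmax relate, here is a rough sketch of the arithmetic. It assumes that shmmax is a size in bytes (the largest single segment) and that shmall is a count of 4 KiB pages (the total across all segments), which is how the PostgreSQL documentation describes these settings on macOS; that unit is an assumption here, not something stated in this thread. The function names `parse_sysctl_args` and `shm_totals` are hypothetical.

```python
PAGE_SIZE = 4096  # assumption: shmall is counted in 4 KiB pages

# The two lines quoted from /etc/rc in the post above.
DEFAULT_LINE = ("sysctl -w kern.sysv.shmmax=4194304 kern.sysv.shmmin=1 "
                "kern.sysv.shmmni=32 kern.sysv.shmseg=8 kern.sysv.shmall=1024")
RAISED_LINE = ("sysctl -w kern.sysv.shmmax=335544320 kern.sysv.shmmin=1 "
               "kern.sysv.shmmni=32 kern.sysv.shmseg=8 kern.sysv.shmall=327680")


def parse_sysctl_args(line):
    """Turn 'sysctl -w a=1 b=2' into {'a': 1, 'b': 2}."""
    settings = {}
    for token in line.split():
        if "=" in token:
            key, value = token.split("=", 1)
            settings[key] = int(value)
    return settings


def shm_totals(settings):
    """Return (largest single segment, total for all segments), both in bytes."""
    shmmax_bytes = settings["kern.sysv.shmmax"]
    shmall_bytes = settings["kern.sysv.shmall"] * PAGE_SIZE
    return shmmax_bytes, shmall_bytes


for name, line in (("default", DEFAULT_LINE), ("raised", RAISED_LINE)):
    shmmax_bytes, shmall_bytes = shm_totals(parse_sysctl_args(line))
    print(f"{name}: shmmax = {shmmax_bytes // 2**20} MiB, "
          f"shmall = {shmall_bytes // 2**20} MiB")
```

Under that assumption the defaults line up exactly (4 MiB each), while the raised line allows a single 320 MiB segment within a 1280 MiB total.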
Joined: 19 Jul 00 Posts: 3898 Credit: 1,158,042 RAC: 0 |
Cerebus, can you send me a zipped copy of the TXT files in the BOINC directory? When I get working again, I can use your log as an example. Also, could you add a copy of your solution to the e-mail? p.d.buck@comcast.net

For most messages, if you look in the wiki you can find a non-technical explanation ... |
Joined: 4 Dec 03 Posts: 1122 Credit: 13,376,822 RAC: 44 |
"sysctl -w kern.sysv.shmmax=335544320 kern.sysv.shmmin=1 kern.sysv.shmmni=32 kern.sysv.shmseg=8 kern.sysv.shmall=327680"

I'm definitely not a Unix expert, but to me it appears that you increased these values by a lot more than I would feel comfortable with. Originals of 4M and 1K, you went to 335M and 327K... that last one scares me the most, I have to wonder if it shouldn't be 32K instead, dropping that trailing zero. A "max" value changing a lot - okay. But a "min" value generally shouldn't need to be changed by a factor of over 300! |
Dotsch Joined: 9 Jun 99 Posts: 2422 Credit: 919,393 RAC: 0 |
Do you mean the computer http://setiathome.berkeley.edu/show_host_detail.php?hostid=1821738 ? From what I have seen, you are now using 5.2.13. Do you still get the errors?

Regards, Lars |
Joined: 15 Apr 99 Posts: 12 Credit: 883,485 RAC: 0 |
The system in question has 4 GB of RAM - going up to 335M is nothing. And I increased the "all" value, not the min - the min didn't change. I'm not sure how the "all" value figures in; you'd assume it would be equal to or more than a max but then again, I haven't seen any doco about these settings yet, nor have I bothered to do research.

As far as errors, since increasing the ram I haven't gotten the "create" errors and SETI does packets normally. I still get the:

Fri Dec 2 03:27:50 2005||Couldn't destroy shared memory: system shmctl

errors though whenever a packet is interrupted/finished for whatever reason. |
Joined: 4 Dec 03 Posts: 1122 Credit: 13,376,822 RAC: 44 |
These are the values from both of my readily available Macs, a Mini with 512MB and an iBook with 384MB...

kern.sysv.shmmax: 4194304
kern.sysv.shmmin: 1
kern.sysv.shmmni: 32
kern.sysv.shmseg: 8
kern.sysv.shmall: 1024

I don't know why these would be the same on all Macs regardless of RAM, that doesn't make much sense. I would think with 4GB, your defaults would be bigger.

The only thing that would worry me about the "couldn't destroy" messages would be if this was flagging a memory leak. I would keep a close eye on it to see if performance degrades, VM file size grows, disk fills, etc... |
Joined: 15 Apr 99 Posts: 12 Credit: 883,485 RAC: 0 |
"These are the values from both of my readily available Mac, a Mini with 512MB and an iBook with 384MB..."

snipped - those are the defaults on mine as well.

"I don't know why these would be the same on all Macs regardless of RAM, that doesn't make much sense. I would think with 4GB, your defaults would be bigger."

Yeah but that would require the installer to do math and we all know how hard that is (/sarcasm)

"The only thing that would worry me about the "couldn't destroy" messages would be if this was flagging a memory leak. I would keep a close eye on it to see if performance degrades, VM file size grows, disk fills, etc..."

That's my concern too, and I'm watching it - Again, I'm wondering if this is a "permissions on OS X server" issue. Or if there's just something wonky with my install. (I haven't messed with it that much - honest! It's practically straight out of the box with updates!) |
Joined: 15 Apr 99 Posts: 12 Credit: 883,485 RAC: 0 |
"The only thing that would worry me about the "couldn't destroy" messages would be if this was flagging a memory leak. I would keep a close eye on it to see if performance degrades, VM file size grows, disk fills, etc..."

Anecdotally this does appear to result in a memory leak - there came a point where the machine wasn't running packets because it couldn't create shared memory. Killing and restarting the boinc manager didn't fix the problem; only a reboot did. |
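For anyone wanting to keep an eye on this, one low-effort approach is to count the two shared memory messages in the BOINC log files. The sketch below is only an illustration: `count_shm_errors` is a hypothetical helper, and it simply matches the message text that appears in the logs quoted in this thread, in either timestamp format.

```python
# Message fragments exactly as they appear in the logs quoted in this thread.
SHM_MESSAGES = {
    "create_failed": "Can't create shared memory",
    "destroy_failed": "Couldn't destroy shared memory",
}


def count_shm_errors(lines):
    """Count create and destroy failures in an iterable of log lines."""
    counts = {name: 0 for name in SHM_MESSAGES}
    for line in lines:
        for name, text in SHM_MESSAGES.items():
            if text in line:
                counts[name] += 1
    return counts


# A few lines copied from the posts above, in both log formats.
SAMPLE = [
    "2005-11-29 04:24:39 [SETI@home] Can't create shared memory: system shmget",
    "2005-11-29 06:56:07 [---] Couldn't destroy shared memory: system shmctl",
    "2005-11-29 06:58:50 [---] Couldn't destroy shared memory: system shmctl",
    "Fri Dec 2 03:27:50 2005||Couldn't destroy shared memory: system shmctl",
    "2005-11-29 06:57:08 [---] Resuming computation and network activity",
]

print(count_shm_errors(SAMPLE))
```

The function accepts any iterable of strings, so an open file handle on stderrdae.txt can be passed in directly. A destroy count that keeps climbing before create failures start would fit the leak theory, though it would not prove it.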
Joined: 4 Dec 03 Posts: 1122 Credit: 13,376,822 RAC: 44 |
Suddenly have one more person with this same error - BOINC message thread. I'm hoping perhaps if the two of you compare notes... |
Nathan Herring Joined: 15 Mar 03 Posts: 2 Credit: 606,879 RAC: 0 |
I have seen this problem as well (I'm the other poster who was just mentioned), even when it is the Einstein project running rather than SETI@home. I'm running 10.4.3, non-server, on a 2x2GHz G5 with 1GB RAM. Apparently, other folk with that same setup aren't having problems, so I'm wondering if there's an application that is either leaking shared memory (so BOINC can't allocate more) or frustrating BOINC's ability to free shared memory (causing a leak, which will eventually make BOINC unable to create shared memory). I am worried that it might be Microsoft Office's implementation of shared memory -- are you running any Office application or Messenger on your Mac? |
Joined: 15 Apr 99 Posts: 12 Credit: 883,485 RAC: 0 |
"I am worried that it might be Microsoft Office's implementation of shared memory -- are you running any Office application or Messenger on your Mac?"

Not actively ... I'll start digging through my login items to see if it's one of them ... |
Joined: 15 Apr 99 Posts: 12 Credit: 883,485 RAC: 0 |
"I am worried that it might be Microsoft Office's implementation of shared memory -- are you running any Office application or Messenger on your Mac?"

As Office is installed, the database daemon and the auto update daemon from MS are running. The only truly unusual thing that's running would be the LCCDaemon for my Logitech MX1000 mouse ... Everything else in my login items is a standard item. |
Joined: 19 Jul 00 Posts: 3898 Credit: 1,158,042 RAC: 0 |
"As office is installed, the database daemon and the auto update daemon from MS are running. The only truly unusual thing that's running would be the LCCDaemon for my Logitech MX1000 mouse ... Everything else in my login items are standard items."

I don't use Office on my PowerMac, and do have "Tiger" installed. As for a mouse, I am using a spare USB Microsoft mouse of all things ... :)

You could try to not start up the Office background processes (or shoot them after booting) to see if that changes things ... |