Continuously Restarting Tasks
Author | Message |
---|---|
Thanar Send message Joined: 14 May 99 Posts: 50 Credit: 3,223,553 RAC: 0 |
Hello, I'm on an iBook G4, running the BOINC Manager 5.10.10, with the processor percentage set to 80%. I have noticed that, unless the processor is set to 100%, the task under computation restarts every now and then, which is no problem most of the time. However, once in a while, the restart leads to lost work, since the % complete drops a lot, although the CPU time doesn't. I must note that this hasn't yet happened when running the rosetta worker, so I guess it's something to do with the seti worker. Testing on a MacBook right now set to 10% CPU usage, just to know if this bug (?) affects the intel build as well... This has been happening for a while now, since I am trying to cut down fan activity. I have tried different BOINC Managers as well as worker versions, with no luck. Any ideas? |
C Send message Joined: 3 Apr 99 Posts: 240 Credit: 7,716,977 RAC: 0 |
One way to reduce your fan activity is to lift the laptop off the table so air can circulate under it and reduce the heat buildup. I keep my iBook G4 sitting on a couple of pieces of scrap wood 1 inch x 1/4 inch by 10 inches long. My fans come on every so often, but are not on continuously. My MacBookPro also sits up on a pair of sticks, but the fans stay on all the time. Without the sticks, the MBP overheats and shuts down, if left sitting flat on the table. I've heard others mention using a wire cake cooling rack to set their laptop on. C Join Team MacNN |
Alinator Send message Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
In addition to C's suggestions for physical means of dealing with heat buildup in your iBook and MacBook, here are a couple of other ideas about the software environment. You may be able to mitigate the lost CPU time on restarts by setting BOINC to leave the apps in memory when suspended. Regarding behaviour when you have BOINC set to throttle down the CPU: this works by pausing the app for a period of time such that the effective duty cycle of the app is whatever percentage you set for the preference. IOW, say you had the preference set to 80%, the app will run flat out for 8 seconds, then pause for 2 seconds, and so forth. On Windows I have never noticed this to have an adverse effect by causing the app to have to restart from a checkpoint, but YMMV on a Mac. Another possibility which does cause a restart from a checkpoint is if the app starts missing heartbeats from the CC. Normally, this is only a problem when other normal or high priority tasks are running which preempt the CC so it can't get the heartbeat out to the app in time. There is an issue where the CC I/O can get blocked waiting for other events to happen, which leads to the same problem of lost heartbeats and app restarts, so you should be aware of that as well. So I was thinking that it might be possible the Mac's built-in thermal control is throttling the machine itself in some way to address the temperature issue and that's leading to lost heartbeats between app and CC. That should help give you some ideas on troubleshooting and working around the problem. HTH, Alinator |
criton Send message Joined: 28 Feb 00 Posts: 131 Credit: 13,351,000 RAC: 2 |
i have 2 laptops running 24/7 with no overheating probs at all, and one is a core2duo at 100% cpu on both cores. i have a notebook cooler pad under both laptops and it does the job fantastically; they don't make a lot of noise either. they only cost me £10 each including the post and packing from ebay in the uk, and they were worth every penny. but you will be able to get them in the usa if that's where you are. greenfinger |
Alinator Send message Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
Agreed, there are good quality ones now which are inexpensive enough to make getting one for any modern Desktop Replacement Notebook a no brainer when you're going to push them right to the limits by running BOINC or other 'high loading' apps on them long term. I've even seen a little auxiliary fan that plugs into the PCCard slot to supplement the built-in one(s). With late model notebooks, which typically have all connectivity needs built in, the slot is usually free, so you could use the fan to add safety margin when crunching on the road without having to carry the base around with you. Alinator |
Thanar Send message Joined: 14 May 99 Posts: 50 Credit: 3,223,553 RAC: 0 |
Apart from all the ways to face the fan issue, this should STILL be a bug! Alinator's comments are the most interesting, although 80% does not translate to 8 seconds of work, 2 seconds of pause; it looks like a full cycle is around a second, so 80% means 0.8 seconds of work and 0.2 seconds of pause. However, I doubt that the client is pausing in the same way as when you suspend computation. I will try the "leave application in memory while suspended" and see what happens. By the way: the above issue doesn't happen on my new MacBook 2.17, although it's been running for a few hours on 50% single CPU. Regarding overheating: I am in Greece, and it's around 30C over here. CPU temp. depends a LOT on ambient temperature. A difference in ambient temperature from 28 to 30 degrees C could mean a rise in CPU temperature from 50 to 58 degrees. Nevertheless, this still looks like a bug to me... |
Alinator Send message Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
Interesting about the way throttling seems to be working on your Mac. If that's the case, then as you say this would be a bug for the Mac version of BOINC. OTOH, I don't recall seeing complaints about it from other Mac users. I guess you need to get some feedback from other Mac'ers on that front to be sure, since I don't follow their problems as closely as for WinBoxes. However the way I described it is the way it works on Windows, which since you have a couple you can test for yourself. Alinator |
Thanar Send message Joined: 14 May 99 Posts: 50 Credit: 3,223,553 RAC: 0 |
"OTOH, I don't recall seeing complaints about it from other Mac users. I guess you need to get some feedback from other Mac'ers on that front to be sure, since I don't follow their problems as closely as for WinBoxes." Indeed, I have not seen anything similar, either. But then again, who's running seti at less than 100% of their idle CPU cycles? Anyway, since I enabled the client to stay in memory when paused, I have seen no such messages, so maybe that's a solution to my issue. I'll let you know if anything comes along... |
Alinator Send message Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
Okie Dokie, glad it's helping at this point. :-) FWIW, I don't find leaving them in memory has much impact on performance when I'm doing other things in Windows, except for hard core gaming and A/V editing. I would think the Mac should be even better in that regard, and you can always exit BOINC when you need all the Mac's HP for making a 'happy buck or two'. ;-) Alinator |
SMR Send message Joined: 29 Jun 99 Posts: 13 Credit: 2,648,825 RAC: 0 |
I am on a mac mini (1.25GHz G4 w/ 1gb memory), running boinc 5.10.7, osx 10.3.9, and with v5.18 it was restarting on me, dropping back to less than 2% from around 40-50%; now running an optimized app based on v5.13 it's running fine and much faster. I had my preferences set to run cpu 80% (or I set it down to 50% when it was 95 degrees F here) to keep the cpu cooler and cpu fan at lower speeds. This doesn't happen on rosetta or einstein, and didn't happen on seti until it autoupdated to v5.18, so I suspect it's some kind of bug. I do have mine set to keep apps in memory when suspended. I did manage to save (in a text file) info for one of the work units that I aborted, which says the seti 5.18 crashed. (the optimized app doesn't call graphics, so maybe this is related) Thanks everybody that keeps seti working! SMR |
Alinator Send message Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
Ah... I see you can post now. :-) Yes, it is not uncommon to notice the fans run faster and/or more often on one project than another. This is due to the fact that all the projects tax the hardware differently, and some of them don't result in as much heat being developed when they run, so most late model machines respond by slowing down/shutting off the fans. FWIW, I've seen 'strange' crashes like you recently observed on your Mini on normally stable machines, but that's been mostly with high performance laptops where heat dissipation and removal from the case can be a real problem. Given the small form factor of the Mini I suppose that might be what happened. Although I find that kind of surprising since Apple usually goes to extremes to ensure their machines don't have that kind of problem no matter what you are doing on them as long as you stay within the specified normal operating ambient temperature range. Alinator |
Thanar Send message Joined: 14 May 99 Posts: 50 Credit: 3,223,553 RAC: 0 |
"...now running an optimized app based on v5.13 it's running fine and much faster. I had my preferences set to run cpu 80% (or I set it down to 50% when it was 95 degrees F here) to keep the cpu cooler and cpu fan at lower speeds." I noticed the behavior noted above running the optimized v5.13, so I guess the bug is in there from the past... Keeping the worker in memory for two days now, and it hasn't happened again... |
Thanar Send message Joined: 14 May 99 Posts: 50 Credit: 3,223,553 RAC: 0 |
Ooopss... It did it again! Seems like it took some time this time, because I have the CPU% set to 90%... Well, a few minutes ago, while the WU% was at around 50%, it dropped to 14%, so I canceled the WU to read some stderr in the result, just like SMR did on a previous post. Guess what; "Crashed executable name: seti_enhanced-ppc-v7.1-g4-nographics" is in there! The full result can be found here. Alex, please fix the "MacOS Error -5000 occured in /Users/alexkan/seti/boinc/api/mac_icon.C line 107" bug... |
Thanar Send message Joined: 14 May 99 Posts: 50 Credit: 3,223,553 RAC: 0 |
In a related post here, it is noted that the crash of the science apps is not exclusive to SETI, so I guess this all looks like a BOINC Manager bug. |
Alinator Send message Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
Hmmm... I am having some difficulties on one of my hosts, which I'm attributing to a thermal problem, that are very similar to yours, but I haven't tried throttling it through BOINC yet to see what happens. What's happening in my case is that on a 98SE host I get a Windows fault dialog box from the Coop's 2.2B app, but when I look at what's running in Process Explorer, both the app and the CC seem to be running normally. However, BOINCView (and I'm going to assume BOINC Manager as well, but will verify that when it happens again) loses its RPC remote connection. IOW, it looks to the remote GUI that BOINC has exited completely. When this happens, if I clear the fault by OK'ing the Windows error dialog, it will lead to an unrecoverable error from the app, but if I force the CC to exit and let the app time out on a lost heartbeat, then restart BOINC, it will proceed normally with no apparent loss of crunch time (other than a normal checkpoint restart). IOW, even though it has the appearance that everything has stopped from the outside, the app and CC are really still chugging along but apparently are not 'talking' to anybody about it. Don't know for sure how this relates to what you're seeing, but I just thought there were some interesting similarities here, since it would seem that we are both looking at an issue which occurs at the thermal limits for the host. One other thing here, I wish some of the other Mac folks would take a look and verify your observation about the BOINC throttling mechanism on the Mac. At least we would know if that was a factor in your case or not. ;-) Alinator |
Odysseus Send message Joined: 26 Jul 99 Posts: 1808 Credit: 6,701,347 RAC: 6 |
Alex, please fix the "MacOS Error -5000 occured in /Users/alexkan/seti/boinc/api/mac_icon.C line 107" bug... I’m not sure that’s entirely under his control; the stock app has the same bug under BOINC v5.8.x, but it manifests as “MacOS Error -5000 occured in /Volumes/Sage/BOINC_Mac/boinc/api/mac_icon.C line 107”. I haven’t seen any of those messages from BOINC v5.4.9 or earlier; something to do with the permissions “sandbox”, I imagine, the executable not being allowed to change its icon from the generic terminal-window to the green dish. |
Josef W. Segur Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
The new 5.21 app is now official here for PPC Macs. Does it help with the restart problem? Joe |
frederick corse Send message Joined: 25 Mar 01 Posts: 3 Credit: 1,100,016 RAC: 0 |
I have two 5.2.1, 13my00ab14824.20034.667534.3.191 and 13 my00ab14824.66734.3.194. At startup they constantly call for a restart. There is no state file in the slots location. With predictor at home a state file is required; there is one in the 5.1.8 version. |
petrusbroder Send message Joined: 2 Dec 01 Posts: 9 Credit: 55,701,905 RAC: 1 |
I have 4 Macs running. All of them G4 - two with dual processors, two with singles. All OS X 10.4.10. Three of the Macs have BOINC v. 5.8.17, one has 5.8.9. I have crunched seti using the seti@home enhanced 5.18 application without problems on all these 4 Macs. Then it switched to the seti@home enhanced 5.21 application, and on all my Macs the applications either crashed or restarted many times. I have aborted all WUs. I got two sets of error messages: On three of the Macs: Client error: The stderr out was: <core_client_version>5.8.17</core_client_version> on another Mac the stderr out was: <core_client_version>5.4.9</core_client_version> Is this a problem with my comps or some problem with the application? I have looked around, and some other crunchers have reported similar problems before, but I did not quite understand what to do to avoid the problems ... |
Thanar Send message Joined: 14 May 99 Posts: 50 Credit: 3,223,553 RAC: 0 |
"The new 5.21 app is now official here for PPC Macs. Does it help with the restart problem? Joe" Actually, looks like the stock 5.23 client works fine, no restarting tasks for over a week now with processor usage % set to 50. |
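
For anyone comparing the two throttling descriptions in this thread (Alinator's 8 seconds on / 2 seconds off versus Thanar's observed 0.8 seconds on / 0.2 seconds off), the arithmetic can be sketched roughly as below. This is only an illustration of duty-cycle throttling in general, not code from BOINC: the function names `duty_cycle` and `pauses_per_hour` are hypothetical, and the 10 s and 1 s periods are simply the figures the two posters mentioned.

```python
def duty_cycle(cpu_percent, period_s):
    """Split one throttle period into (run_seconds, pause_seconds).

    Hypothetical helper: cpu_percent is the "use at most X% of CPU time"
    preference, period_s is the assumed length of one run/pause cycle.
    """
    if not 0 < cpu_percent <= 100:
        raise ValueError("cpu_percent must be in (0, 100]")
    run = period_s * cpu_percent / 100.0
    return run, period_s - run


def pauses_per_hour(cpu_percent, period_s):
    """Number of pause/resume transitions in one hour of wall-clock time."""
    if cpu_percent >= 100:
        return 0.0  # no throttling, so the app is never paused
    return 3600.0 / period_s


if __name__ == "__main__":
    # Period lengths are assumptions taken from the two posts, not from BOINC.
    for label, period in (("10 s cycle", 10.0), ("1 s cycle", 1.0)):
        run, pause = duty_cycle(80, period)
        print(f"{label}: run {run:.1f} s, pause {pause:.1f} s, "
              f"{pauses_per_hour(80, period):.0f} pauses per hour")
```

Both cycle lengths give the same 80% average load, but the shorter cycle pauses the app ten times as often. If each pause carried even a small chance of forcing a restart from the last checkpoint, a short cycle would expose the app to that chance far more frequently, which would fit with restarts showing up only when the CPU setting is below 100%. Whether that is what actually happens on the Mac client is not confirmed anywhere in this thread.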