Continuously Restarting Tasks

Message boards : Number crunching : Continuously Restarting Tasks

Thanar
Joined: 14 May 99
Posts: 50
Credit: 3,223,553
RAC: 0
Greece
Message 602769 - Posted: 13 Jul 2007, 12:17:00 UTC
Last modified: 13 Jul 2007, 12:32:47 UTC

Hello,

I'm on an iBook G4, running BOINC Manager 5.10.10, with the processor percentage set to 80%. I have noticed that, unless the processor is set to 100%, the task being computed restarts every now and then. Most of the time that is no problem. However, once in a while the restart leads to lost work: the % complete drops a lot, although the CPU time doesn't.

I must note that this hasn't yet happened when running the Rosetta worker, so I guess it has something to do with the SETI worker. I'm testing on a MacBook right now, set to 10% CPU usage, to see whether this bug (?) affects the Intel build as well...

This has been happening for a while now, since I am trying to cut down fan activity. I have tried different BOINC Manager versions as well as worker versions, with no luck. Any ideas?
ID: 602769
C
Joined: 3 Apr 99
Posts: 240
Credit: 7,716,977
RAC: 0
United States
Message 602786 - Posted: 13 Jul 2007, 13:53:14 UTC

One way to reduce your fan activity is to lift the laptop off the table so air can circulate under it and reduce the heat buildup. I keep my iBook G4 sitting on a couple of pieces of scrap wood 1 inch x 1/4 inch by 10 inches long. My fans come on every so often, but are not on continuously. My MacBookPro also sits up on a pair of sticks, but the fans stay on all the time. Without the sticks, the MBP overheats and shuts down, if left sitting flat on the table. I've heard others mention using a wire cake cooling rack to set their laptop on.

C

Join Team MacNN
ID: 602786
Alinator
Volunteer tester
Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 602818 - Posted: 13 Jul 2007, 16:37:20 UTC
Last modified: 13 Jul 2007, 16:38:44 UTC

In addition to C's suggestions for physical means of dealing with heat buildup in your iBook and MacBook, here are a couple of other ideas about the software environment.

You may be able to mitigate the lost CPU time on restarts by setting BOINC to leave the apps in memory when suspended.

Regarding behaviour when you have BOINC set to throttle down the CPU: this works by pausing the app for a period of time such that the effective duty cycle of the app is whatever percentage you set for the preference. IOW, say you had the preference set to 80%: the app will run flat out for 8 seconds, then pause for 2 seconds, and so forth.

On Windows I have never noticed this to have an adverse effect by causing the app to have to restart from a checkpoint, but YMMV on a Mac.
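To make the duty-cycle arithmetic above concrete, here is a rough sketch. It is only an illustration of the idea, not BOINC's actual throttling code; the function name `duty_cycle` and the `period_s` parameter (the length of one full run/pause cycle) are assumptions made for the example.

```python
def duty_cycle(cpu_percent, period_s=10.0):
    """Split one throttle period into (run_seconds, pause_seconds).

    cpu_percent is the "use at most X% of CPU time" preference.
    period_s is a hypothetical cycle length, not a documented BOINC value.
    """
    if not 0 < cpu_percent <= 100:
        raise ValueError("cpu_percent must be in the range (0, 100]")
    run_s = period_s * cpu_percent / 100.0
    pause_s = period_s - run_s
    return run_s, pause_s


# 80% over a 10-second cycle: run for 8 seconds, pause for 2.
print(duty_cycle(80))  # (8.0, 2.0)
```

Whatever the real cycle length is on a given platform, the ratio of run time to total time stays the same; only how often the app is paused and resumed changes.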

Another possibility which does cause a restart from a checkpoint is if the app starts missing heartbeats from the CC (core client). Normally, this is only a problem when other normal or high priority tasks are running which preempt the CC, so it can't get the heartbeat out to the app in time. There is also an issue where the CC's I/O can get blocked waiting for other events to happen, which leads to the same problem of lost heartbeats and app restarts, so you should be aware of that as well.

So I was thinking it might be possible that the Mac's built-in thermal control is throttling the machine itself in some way to address the temperature issue, and that's leading to lost heartbeats between app and CC.
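The heartbeat idea can be sketched roughly as follows. This is a simplified illustration, not the real BOINC API: the class name `HeartbeatWatchdog` and the 30-second timeout are assumptions for the example, and timestamps are passed in explicitly so the behaviour is easy to follow.

```python
class HeartbeatWatchdog:
    """Tracks the last heartbeat seen by the app (hypothetical helper)."""

    def __init__(self, timeout_s=30.0, now=0.0):
        # timeout_s is an assumed value chosen for illustration.
        self.timeout_s = timeout_s
        self.last_beat = now

    def beat(self, now):
        """Record that a heartbeat arrived from the core client."""
        self.last_beat = now

    def expired(self, now):
        """True if no heartbeat has arrived within the timeout."""
        return now - self.last_beat > self.timeout_s


watchdog = HeartbeatWatchdog(timeout_s=30.0, now=0.0)
watchdog.beat(now=10.0)
print(watchdog.expired(now=35.0))  # False: only 25 s since the last beat
print(watchdog.expired(now=41.0))  # True: 31 s of silence, the app would exit
```

If anything delays the sender for longer than the timeout (a preempted or blocked CC, or a machine that is being slowed down), the app sees silence and gives up, and the next run resumes from the last checkpoint.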

That should help give you some ideas on troubleshooting and working around the problem.

HTH,

Alinator
ID: 602818
criton
Joined: 28 Feb 00
Posts: 131
Credit: 13,351,000
RAC: 2
United Kingdom
Message 602831 - Posted: 13 Jul 2007, 16:59:27 UTC

I have 2 laptops running 24/7 with no overheating problems at all, and one is a Core 2 Duo at 100% CPU on both cores. I have a notebook cooler pad under both laptops and it does the job fantastically; they don't make a lot of noise either. They only cost me £10 each from eBay in the UK, including postage and packing, and they were worth every penny. You should be able to get them in the USA too, if that's where you are.

greenfinger
ID: 602831
Alinator
Volunteer tester
Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 602836 - Posted: 13 Jul 2007, 17:16:54 UTC
Last modified: 13 Jul 2007, 17:17:57 UTC

Agreed, there are good quality ones now which are inexpensive enough to make getting one for any modern Desktop Replacement Notebook a no-brainer when you're going to push them right to the limits by running BOINC or other 'high loading' apps on them long term.

I've even seen a little auxiliary fan that plugs into the PCCard slot to supplement the built-in one(s). With late model notebooks, which typically have all connectivity needs built in, the slot is usually free, so you could use the fan to add safety margin when crunching on the road without having to carry the base around with you.

Alinator
ID: 602836
Thanar
Joined: 14 May 99
Posts: 50
Credit: 3,223,553
RAC: 0
Greece
Message 602853 - Posted: 13 Jul 2007, 18:25:03 UTC

Apart from all the ways to deal with the fan issue, this should STILL be a bug! Alinator's comments are the most interesting, although 80% does not translate to 8 seconds of work and 2 seconds of pause; it looks like a full cycle is around a second, so 80% means 0.8 seconds of work and 0.2 seconds of pause. However, I doubt that the client is pausing in the same way as when you suspend computation. I will try the "leave application in memory while suspended" setting and see what happens.

By the way: the above issue doesn't happen on my new MacBook 2.17, although it's been running for a few hours on 50% single CPU.

Regarding overheating: I am in Greece, and it's around 30C over here. CPU temperature depends a LOT on ambient temperature. A difference in ambient temperature from 28 to 30 degrees C could mean a rise in CPU temperature from 50 to 58 degrees.

Nevertheless, this still looks like a bug to me...
ID: 602853
Alinator
Volunteer tester
Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 602871 - Posted: 13 Jul 2007, 18:58:40 UTC
Last modified: 13 Jul 2007, 18:59:14 UTC

Interesting about the way throttling seems to be working on your Mac. If that's the case, then as you say this would be a bug for the Mac version of BOINC. OTOH, I don't recall seeing complaints about it from other Mac users. I guess you need to get some feedback from other Mac'ers on that front to be sure, since I don't follow their problems as closely as for WinBoxes.

However, the way I described it is the way it works on Windows; since you have a couple, you can test that for yourself.

Alinator
ID: 602871
Thanar
Joined: 14 May 99
Posts: 50
Credit: 3,223,553
RAC: 0
Greece
Message 603022 - Posted: 13 Jul 2007, 23:10:04 UTC - in response to Message 602871.  
Last modified: 13 Jul 2007, 23:10:51 UTC

OTOH, I don't recall seeing complaints about it from other Mac users. I guess you need to get some feedback from other Mac'ers on that front to be sure, since I don't follow their problems as closely as for WinBoxes.


Indeed, I have not seen anything similar, either. But then again, who's running SETI at less than 100% of their idle CPU cycles?

Anyway, since I enabled the client to stay in memory when paused, I have seen no such messages, so maybe that's a solution to my issue. I'll let you know if anything comes along...
ID: 603022
Alinator
Volunteer tester
Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 603100 - Posted: 14 Jul 2007, 1:15:47 UTC

Okie Dokie, glad it's helping at this point. :-)

FWIW, I don't find leaving them in memory has much impact on performance when I'm doing other things in Windows, except for hard core gaming and A/V editing. I would think the Mac should be even better in that regard, and you can always exit BOINC when you need all the Mac's HP for making a 'happy buck or two'. ;-)

Alinator
ID: 603100
SMR
Volunteer tester
Joined: 29 Jun 99
Posts: 13
Credit: 2,648,825
RAC: 0
United States
Message 603989 - Posted: 15 Jul 2007, 13:31:53 UTC

I am on a Mac mini (1.25GHz G4 w/ 1GB memory), running BOINC 5.10.7 on OS X 10.3.9. With v5.18 it was restarting on me, dropping back to less than 2% from around 40-50%; now running an optimized app based on v5.13 it's running fine and much faster. I had my preferences set to run cpu 80% (or I set it down to 50% when it was 95 degrees F here) to keep the cpu cooler and cpu fan at lower speeds.
This doesn't happen on rosetta or einstein, and didn't happen on seti until it autoupdated to v5.18, so I suspect it's some kind of bug. I do have mine set to keep apps in memory when suspended.
I did manage to save (in a text file) info for one of the work units that I aborted, that says the seti 5.18 crashed. (the optimized app doesn't call graphics, so maybe this is related)

Thanks everybody that keeps seti working! SMR
ID: 603989
Alinator
Volunteer tester
Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 604035 - Posted: 15 Jul 2007, 15:04:38 UTC
Last modified: 15 Jul 2007, 15:05:23 UTC

Ah...

I see you can post now. :-)

Yes, it is not uncommon to notice the fans run faster and/or more often on one project than another. This is due to the fact that all the projects tax the hardware differently, and some of them don't result in as much heat being developed when they run, so most late model machines respond by slowing down/shutting off the fans.

FWIW, I've seen 'strange' crashes like you recently observed on your Mini on normally stable machines, but that's been mostly with high performance laptops where heat dissipation and removal from the case can be a real problem. Given the small form factor of the Mini I suppose that might be what happened. Although I find that kind of surprising since Apple usually goes to extremes to ensure their machines don't have that kind of problem no matter what you are doing on them as long as you stay within the specified normal operating ambient temperature range.

Alinator
ID: 604035
Thanar
Joined: 14 May 99
Posts: 50
Credit: 3,223,553
RAC: 0
Greece
Message 604092 - Posted: 15 Jul 2007, 17:16:47 UTC - in response to Message 603989.  

...now running an optimized app based on v5.13 it's running fine and much faster. I had my preferences set to run cpu 80% (or I set it down to 50% when it was 95 degrees F here) to keep the cpu cooler and cpu fan at lower speeds.
This doesn't happen on rosetta or einstein, and didn't happen on seti until it autoupdated to v5.18, so I suspect it's some kind of bug. I do have mine set to keep apps in memory when suspended....


I noticed the behavior noted above running the optimized v5.13, so I guess the bug is in there from the past... Keeping the worker in memory for two days now, and it hasn't happened again...
ID: 604092
Thanar
Joined: 14 May 99
Posts: 50
Credit: 3,223,553
RAC: 0
Greece
Message 604441 - Posted: 16 Jul 2007, 8:16:48 UTC

Ooopss... It did it again! Seems like it took longer this time because I have the CPU% set to 90%... Well, a few minutes ago, while the WU was at around 50%, it dropped to 14%, so I canceled the WU to read the stderr in the result, just like SMR did in a previous post. Guess what: "Crashed executable name: seti_enhanced-ppc-v7.1-g4-nographics" is in there!

The full result can be found here.

Alex, please fix the "MacOS Error -5000 occured in /Users/alexkan/seti/boinc/api/mac_icon.C line 107" bug...
ID: 604441
Thanar
Joined: 14 May 99
Posts: 50
Credit: 3,223,553
RAC: 0
Greece
Message 604577 - Posted: 16 Jul 2007, 15:14:37 UTC - in response to Message 604441.  

In a related post here, there is some indication that the crash of the science apps is not exclusive to SETI, so I guess this all looks like a BOINC Manager bug.
ID: 604577
Alinator
Volunteer tester
Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 604667 - Posted: 16 Jul 2007, 19:26:34 UTC
Last modified: 16 Jul 2007, 19:28:47 UTC

Hmmm...

I am having some difficulties on one of my hosts which are very similar to yours and which I'm attributing to a thermal problem, but I haven't tried throttling it through BOINC yet to see what happens.

What's happening in my case is that on a 98SE host I get a Windows fault dialog box from the Coop's 2.2B app, but when I look at what's running in Process Explorer, both the app and the CC seem to be running normally. However, BOINCView (and I'm going to assume BOINC Manager as well, but will verify that when it happens again) loses its RPC remote connection. IOW, it looks to the remote GUI as if BOINC has exited completely.

When this happens, if I clear the fault by OK'ing the Windows error dialog, it will lead to an unrecoverable error from the app, but if I force the CC to exit and let the app time out on a lost heartbeat, then restart BOINC, it will proceed normally with no apparent loss of crunch time (other than a normal checkpoint restart). IOW, even though it has the appearance from the outside that everything has stopped, the app and CC are really still chugging along but apparently are not 'talking' to anybody about it.

I don't know for sure how this relates to what you're seeing, but I thought there were some interesting similarities here, since it would seem we are both looking at an issue which occurs at the thermal limits for the host.

One other thing here, I wish some of the other Mac folks would take a look and verify your observation about the BOINC throttling mechanism on the Mac. At least we would know if that was a factor in your case or not. ;-)

Alinator
ID: 604667
Odysseus
Volunteer tester
Joined: 26 Jul 99
Posts: 1808
Credit: 6,701,347
RAC: 6
Canada
Message 604912 - Posted: 17 Jul 2007, 5:51:47 UTC - in response to Message 604441.  
Last modified: 17 Jul 2007, 5:52:01 UTC

Alex, please fix the "MacOS Error -5000 occured in /Users/alexkan/seti/boinc/api/mac_icon.C line 107" bug...

I’m not sure that’s entirely under his control; the stock app has the same bug under BOINC v5.8.x, but it manifests as “MacOS Error -5000 occured in /Volumes/Sage/BOINC_Mac/boinc/api/mac_icon.C line 107”. I haven’t seen any of those messages from BOINC v5.4.9 or earlier; something to do with the permissions “sandbox”, I imagine, the executable not being allowed to change its icon from the generic terminal-window to the green dish.

ID: 604912
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 605216 - Posted: 17 Jul 2007, 23:03:07 UTC

The new 5.21 app is now official here for PPC Macs. Does it help with the restart problem?
                                                                 Joe
ID: 605216
frederick corse
Joined: 25 Mar 01
Posts: 3
Credit: 1,100,016
RAC: 0
United States
Message 605303 - Posted: 18 Jul 2007, 3:59:55 UTC

I have two 5.2.1 tasks, 13my00ab14824.20034.667534.3.191 and 13 my00ab14824.66734.3.194. At startup they constantly call for a restart. There is no state file in the slots location. With Predictor@home a state file is required; there is one in the 5.1.8 version.
ID: 605303
petrusbroder
Joined: 2 Dec 01
Posts: 9
Credit: 55,701,905
RAC: 1
Sweden
Message 605644 - Posted: 18 Jul 2007, 22:00:01 UTC
Last modified: 18 Jul 2007, 22:07:45 UTC

I have 4 Macs running, all of them G4s: two with dual processors, two with singles. All run OS X 10.4.10. Three of the Macs have BOINC v. 5.8.17, one has 5.8.9. I have crunched SETI using the SETI@home enhanced 5.18 application without problems on all 4 Macs.
Then it switched to the SETI@home enhanced 5.21 application, and on all my Macs the application either crashed or restarted many times. I have aborted all WUs.

I got two sets of error messages:

On three of the Macs:
Client error:
Client state: Compute error
Exit status: -177 (0xffffff4f)


The stderr out was:
<core_client_version>5.8.17</core_client_version>
<![CDATA[
<message>
Maximum disk usage exceeded
</message>
]]>


On another Mac the stderr out was:

<core_client_version>5.4.9</core_client_version>
<message>
process exited with code 2 (0x2)
</message>
<stderr_txt>
2007-07-18 17:31:50 [SETI@home] Process creation (../../projects/setiathome.berkeley.edu/setiathome-5.21_AUTHORS) failed: Error -1
execv: No such file or directory

</stderr_txt>


Is this a problem with my computers or some problem with the application? I have looked around, and some other crunchers have reported similar problems before, but I did not quite understand what to do to avoid the problems...
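As a side note on reading the first error report above: the exit status is shown as both -177 and 0xffffff4f, which are the same 32-bit value written in signed decimal and in unsigned hexadecimal. A small sketch of the conversion follows (the helper names `to_unsigned32` and `to_signed32` are made up for this example):

```python
def to_unsigned32(value):
    """Reinterpret a (possibly negative) integer as an unsigned 32-bit value."""
    return value & 0xFFFFFFFF


def to_signed32(value):
    """Reinterpret an unsigned 32-bit value as a two's-complement signed integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


print(format(to_unsigned32(-177), "#010x"))  # 0xffffff4f
print(to_signed32(0xFFFFFF4F))               # -177
```

So the two numbers in the report are one error code, not two different ones.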
ID: 605644
Thanar
Joined: 14 May 99
Posts: 50
Credit: 3,223,553
RAC: 0
Greece
Message 624465 - Posted: 22 Aug 2007, 20:24:26 UTC - in response to Message 605216.  
Last modified: 22 Aug 2007, 20:24:46 UTC

The new 5.21 app is now official here for PPC Macs. Does it help with the restart problem?
                                                                 Joe

Actually, it looks like the stock 5.23 client works fine: no restarting tasks for over a week now with processor usage % set to 50.
ID: 624465


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.