problem = exited with zero status but no 'finished' file


Questions and Answers : Macintosh : problem = exited with zero status but no 'finished' file

Author Message
jeff stanger / jeffstanger
Joined: 27 Nov 01
Posts: 3
Credit: 248,578
RAC: 0
Australia
Message 34290 - Posted: 9 Oct 2004, 0:30:04 UTC

Hi All,
It appears that I am not the only one with some issues on this app.
I have managed to get it to run after a bit of stuffing around, but when I get it running I get the following message:
Result 14mr04aa.17223.17522.304840.32_0 exited with zero status but no 'finished' file

and then

If this happens repeatedly you may need to reset the project.

Then it restarts, and this goes on indefinitely. I have reset the project several times with no success. If someone can shed any light on my problem I would be most grateful. I do know a bit about UNIX, so I feel comfortable in the terminal, but I just fail to see how the instructions I can find are supposed to get this to run properly.

Thanks
Jeff

Martin P.
Joined: 19 May 99
Posts: 294
Credit: 14,623,397
RAC: 10,673
Austria
Message 76774 - Posted: 5 Feb 2005, 10:17:14 UTC

This problem has been well known for months and is still unsolved. On my dual-processor Macs almost every fourth WU ends with this message (always on the same processor!). Nobody seems to be interested in this topic, although it costs a lot of credit. One example is here (my machine is the one that claims 16.46): http://setiweb.ssl.berkeley.edu/workunit.php?wuid=9083091.

This WU was at 12,000 seconds when the WU on the other processor finished. I received the message "...exited with zero status but no 'finished' file" and the WU started counting from 0 seconds again. The percentage done and the scientific output are not affected by this, just the total time. So, instead of 19,777 seconds, my computer reported only 7,777 seconds, and therefore the claimed credit is less than half of what it should be.

According to the rules for credits, the second-lowest claimed credit will be given to all users who returned a valid result. In this case all users will lose approx. 25 credit points due to this error.
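The figures above can be checked with a few lines of Python. This is only a rough sketch: it assumes that claimed credit scales linearly with the reported CPU time, and the function name is made up for illustration. The 16.46 claim and the two time values are the ones quoted in this post.

```python
def claim_with_full_time(claimed, reported_seconds, actual_seconds):
    """Scale a claimed credit by the ratio of actual to reported CPU time.

    Assumption: claimed credit is proportional to reported CPU time.
    """
    return claimed * actual_seconds / reported_seconds


claimed = 16.46                      # credit claimed for the affected result
actual = 19_777                      # seconds the WU really took
reported = actual - 12_000           # seconds counted after the timer reset

expected = claim_with_full_time(claimed, reported, actual)
lost = expected - claimed

print(f"reported time: {reported} s")
print(f"claim with full time: {expected:.2f}")
print(f"shortfall: {lost:.2f}")
```

Under that assumption the shortfall comes out at roughly 25 credit points, which matches the estimate above.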

Walt Gribben
Volunteer tester
Joined: 16 May 99
Posts: 353
Credit: 304,016
RAC: 0
United States
Message 76858 - Posted: 5 Feb 2005, 18:55:58 UTC - in response to Message 76774.
Last modified: 5 Feb 2005, 18:56:25 UTC

> This problem has been well known for months and is still unsolved. On my
> dual-processor Macs almost every fourth WU ends with this message (always on
> the same processor!). Nobody seems to be interested in this topic, although
> it costs a lot of credit. One example is here (my machine is the one that
> claims 16.46): http://setiweb.ssl.berkeley.edu/workunit.php?wuid=9083091.
>
> This WU was at 12,000 seconds when the WU on the other processor finished. I
> received the message "...exited with zero status but no 'finished' file" and
> the WU started counting from 0 seconds again. The percentage done and the
> scientific output are not affected by this, just the total time. So, instead
> of 19,777 seconds, my computer reported only 7,777 seconds, and therefore the
> claimed credit is less than half of what it should be.
>
> According to the rules for credits, the second-lowest claimed credit will be
> given to all users who returned a valid result. In this case all users will
> lose approx. 25 credit points due to this error.

Two problems here - why the SETI app exits early, and why BOINC resets the CPU timer.

The first one - exited with zero status but no 'finished' file - means the SETI application stopped crunching for some reason. So you have to find out why. It's in the error log in the SETI directory.

Windows and Linux machines put the science application in numbered folders inside the "slots" folder, which is in the BOINC folder. I don't think Macs are that different, so look there for errors - stderr.txt or something like that.

From what I've seen, the SETI app stops with the message "No heartbeat from core client - exiting". Looking at your results, I see that message in this one. So it's really because something stopped BOINC from sending heartbeat messages, or something stopped the SETI app from receiving them.
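If you'd rather not open every slot folder by hand, something like the following Python sketch could do the search. It assumes the layout described above (a "slots" folder containing numbered folders, each with a stderr.txt); the function name is made up, and the location of the BOINC folder on your machine is something you have to supply yourself.

```python
from pathlib import Path

HEARTBEAT_MSG = "No heartbeat from core client"


def find_heartbeat_errors(boinc_dir):
    """Return (log path, line) pairs for heartbeat errors in slots/*/stderr.txt.

    Assumption: each numbered slot folder keeps its log in a file
    named stderr.txt, as described in the post above.
    """
    hits = []
    for log in sorted(Path(boinc_dir).glob("slots/*/stderr.txt")):
        # errors="replace" keeps the scan going if a log has odd bytes in it
        for line in log.read_text(errors="replace").splitlines():
            if HEARTBEAT_MSG in line:
                hits.append((log, line.strip()))
    return hits
```

Calling `find_heartbeat_errors(".")` from inside the BOINC folder should list every slot whose log contains the message; an empty list means none was found in the current logs.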

Since you say you got that when the other SETI app finished, I'd say BOINC was tied up trying to upload the completed result. Normally that won't stop BOINC from handling other stuff, except for one thing: DNS lookups. In all the operating systems I've seen, DNS calls block the calling application until the call completes, so if your Mac isn't talking to your DNS server then BOINC won't be sending heartbeat messages either. DNS timeouts on my Linux systems are more than 30 seconds, so when my DSL line gets disconnected, one WU finishing always causes the other one to "exit with no 'finished' file".
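To see how long a lookup actually holds up the caller on a given machine, a small timing sketch in Python might help. It is not how BOINC does it; the helper names are made up, and the 30-second threshold is only an assumption taken from the timeout figure mentioned above.

```python
import socket
import time

# Assumption: a stall longer than this is enough to lose the heartbeat.
ASSUMED_HEARTBEAT_TIMEOUT = 30.0  # seconds


def time_blocking_call(fn, *args):
    """Run fn(*args) and return how many seconds the caller was blocked."""
    start = time.monotonic()
    try:
        fn(*args)
    except OSError:
        # A failed lookup (socket.gaierror is an OSError) still blocked
        # the caller for the time measured here.
        pass
    return time.monotonic() - start


def would_miss_heartbeat(blocked_seconds, timeout=ASSUMED_HEARTBEAT_TIMEOUT):
    """True if a stall of this length exceeds the assumed timeout."""
    return blocked_seconds > timeout


# Example (needs a network, so it is left commented out):
# blocked = time_blocking_call(socket.getaddrinfo, "setiweb.ssl.berkeley.edu", 80)
# print(blocked, would_miss_heartbeat(blocked))
```

With a working DNS server the lookup should return in well under a second; with the line down it may sit there until the resolver gives up.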



Walt Gribben
Volunteer tester
Joined: 16 May 99
Posts: 353
Credit: 304,016
RAC: 0
United States
Message 76933 - Posted: 5 Feb 2005, 23:36:47 UTC - in response to Message 76858.
Last modified: 5 Feb 2005, 23:48:13 UTC

Tried to edit my last message, but missed the 60-minute limit.

Checking the other messages here shows it's not a DNS problem; none of the logs show anything like that.

But they mostly show a 30-second delay between one WU finishing and the other one that "exited with zero status but no 'finished' file".

Three things happen when a WU completes - it "finishes", its result gets uploaded and a new one starts.

Uploading the result might cause the problem, but then it's likely you'd see the same thing with downloads, and no one said anything about downloads, so... It's "easy" to test: just disable network access and see if the problem disappears.

The other one has to do with one WU finishing and the next one starting. I've seen a lot of changes in earlier versions of BOINC that dealt with shared memory. And looking at the change notes shows some cleanup and startup problems were "dealt" with, but not necessarily fixed. I just tested it on a Windows machine and I think there's a problem cleaning up a completed WU and starting a new one.

Specifically, if it has a problem setting up the new WU, it "sleeps" for an interval longer than the heartbeat timeout period. The end result is that the SETI app working on the other WU thinks BOINC "disappeared" and exits, after putting a "no heartbeat" message into the stderr.txt file.
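The effect can be sketched with a toy model in Python. This is not BOINC code: the function name, the one-heartbeat-per-second rate and the 35-second stall are all made-up illustration values, and the 30-second timeout is an assumption based on the delay seen in the logs.

```python
def first_missed_heartbeat(heartbeat_times, timeout, end_time):
    """Return the time at which the app gives up, or None if it never does.

    heartbeat_times: sorted times (in seconds) at which the client sent
    a heartbeat. The app gives up once `timeout` seconds pass without one.
    """
    last = 0.0
    for t in heartbeat_times:
        if t - last > timeout:
            return last + timeout
        last = t
    if end_time - last > timeout:
        return last + timeout
    return None


# Client sends a heartbeat every second for a minute.
steady = [float(t) for t in range(1, 61)]

# Client stalls for 35 seconds after t=10 while setting up a new WU.
stalled = [float(t) for t in range(1, 11)] + [float(t) for t in range(45, 61)]

print(first_missed_heartbeat(steady, 30.0, 60.0))   # None
print(first_missed_heartbeat(stalled, 30.0, 60.0))  # 40.0
```

In the stalled case the app exits 30 seconds after the last heartbeat it saw, even though the client comes back 5 seconds later, which would fit the roughly 30-second gap in the logs.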

