finish file present too long

Message boards : Number crunching : finish file present too long


David Anderson (not *that* DA) Project Donor
Joined: 5 Dec 09
Posts: 213
Credit: 70,350,059
RAC: 35,202
United States
Message 1437572 - Posted: 4 Nov 2013, 0:05:55 UTC

Apologies, I managed to cause 8 or more "finish file present too long"
errors from host 5766757 at around 17:48 UTC (9:48 AM PST).
Since they were reported as errors, the tasks will go to other hosts,
which will return good results, and all will be well.

I was having a spot of difficulty here,
which triggered this error (new in January 2013). Apparently a task
had completed writing its results but had not finished exiting,
and a manual reboot (one of several around that time)
introduced too long a delay. Taking greater care to suspend
tasks before rebooting Linux might have avoided the problem.

Rebooting Linux was a result of minor issues in upgrading from Ubuntu 13.04
to Ubuntu 13.10.

Thread 72778 (http://setiathome.berkeley.edu/forum_thread.php?id=72778)
was also, more or less, about this specific error message.
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 13295
Credit: 165,492,265
RAC: 212,097
United Kingdom
Message 1437741 - Posted: 4 Nov 2013, 10:46:31 UTC - in response to Message 1437572.  

Well, it seems to be a deliberate design feature of BOINC - I found this in app_control.cpp:

    // Check for finish files every 10 sec.
    // If we already found a finish file, abort the app;
    // it must be hung somewhere in boinc_finish();
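
Read literally, those comments describe a two-strike rule: the first poll that sees a finish file just remembers it, and the next poll aborts the app if its process is still there. The C++ sketch below simulates that literal reading on a timeline. The function name, the whole-second timeline and the assumption that polls happen exactly every `interval` seconds are illustrative, not taken from BOINC's actual code, and the real client may apply a different timeout.

```cpp
#include <cassert>

// Hypothetical simulation of the quoted app_control.cpp comments, taken at
// face value; the names and the whole-second timeline are illustrative only.
// The client polls at t = 0, interval, 2*interval, ... while the app's
// process exists. The app writes its finish file at `file_at`, and its process
// is completely gone (exit finished, OS cleanup done) at `file_at + exit_takes`.
// Returns the poll time at which the client would abort the app, or -1 if the
// process disappears first and the task completes normally.
int literal_abort_time(int file_at, int exit_takes, int interval) {
    assert(file_at >= 0 && exit_takes >= 0 && interval > 0);
    const int gone_at = file_at + exit_takes;
    bool already_found = false;  // remembered between polls
    for (int t = 0;; t += interval) {
        if (t >= gone_at) return -1;  // process already gone: normal exit
        if (t < file_at) continue;    // no finish file yet
        if (already_found) return t;  // "it must be hung somewhere in boinc_finish()"
        already_found = true;         // first sighting: just remember it
    }
}
```

Under this reading, an app gets somewhere between 10 and 20 seconds after writing its finish file to disappear completely. For example, `literal_abort_time(5, 30, 10)` returns 20 (an abort 15 s after the file appeared), whereas an exit that takes 12 s, `literal_abort_time(5, 12, 10)`, returns -1 and completes normally.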

That doesn't, of course, say anything about why your particular machine 'hung in boinc_finish', but taking care when rebooting can obviously help (though it shouldn't be necessary).

Other versions of the app (usually Astropulse, unlike yours) seem to have sporadic problems with app_finish. I imagine it's on somebody's 'ToDo' list, but the problem is probably in the BOINC API code, which is one of those orphaned Cinderella areas that nobody really wants to take ownership of.
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1437964 - Posted: 4 Nov 2013, 21:13:11 UTC - in response to Message 1437741.  
Last modified: 4 Nov 2013, 21:53:00 UTC

The general gist is that the assumptions in those app_control comments are naive, in fact incorrect. Having already found a finish file means nothing more than that the app created the finish file in the process of exiting.

The time it takes for an application to exit completely AND for the OS to clean up slots etc. (garbage collection), especially at idle/reduced priority, would 'normally' be a fraction of a second. However, a combination of OS desktop optimisations and high contention (inherent in the nature of BOINC: application post-processing, scheduler updates and starting new tasks) can stretch that to the order of minutes on a perfectly normally functioning host.

It's a pattern dotted throughout the BOINC client and boincapi code: assuming control over factors where it has none. Instead, it should monitor, negotiate and be patient first, bringing out the axe only as an absolute last resort. The typically visible symptoms of these 'control issues' range from mildly irritating malfunctions to severe waste in the case of spontaneous aborts.
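
As a rough illustration of "monitor, negotiate & be patient first", here is a hypothetical C++ escalation policy: keep watching for a while after the finish file appears, then politely ask the app to exit, and only then reach for the axe. The `Step` enum, the `Patience` thresholds and the idea of a separate negotiation window are assumptions made for this sketch, not BOINC settings.

```cpp
#include <cassert>

// Hypothetical escalation policy; names and thresholds are illustrative only.
enum class Step { Done, KeepWaiting, AskToExit, ForceKill };

struct Patience {
    double wait_s;       // assumed: time to just watch after the finish file appears
    double negotiate_s;  // assumed: extra time allowed after a polite exit request
};

// secs: time since the finish file was first seen.
// running: whether the OS still reports the app's process as alive.
Step next_step(double secs, bool running, const Patience& p) {
    assert(secs >= 0.0 && p.wait_s >= 0.0 && p.negotiate_s >= 0.0);
    if (!running) return Step::Done;                              // exited on its own
    if (secs < p.wait_s) return Step::KeepWaiting;                // monitor
    if (secs < p.wait_s + p.negotiate_s) return Step::AskToExit;  // negotiate
    return Step::ForceKill;                                       // the axe, last resort
}
```

With `Patience p{300.0, 60.0}`, a process still running 15 s after its finish file appeared is simply watched, at 310 s it is asked to exit, and only from 360 s onward is it killed; a process the OS reports as gone is `Done` regardless of timing. How the exit request is delivered (a signal, a message to the app) is deliberately left open.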

[Edit:] There are, of course, OS-specific 'best practices' for querying the state of a process, instead of guessing.
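
On POSIX systems (Linux, macOS), a parent process can simply ask the OS whether a child is really gone: `waitpid()` with `WNOHANG` returns the child's PID once it has terminated and 0 while it is still around. This sketch polls that way up to a deadline and only then falls back to `SIGKILL`. It assumes the monitored app is a direct child of the caller (true for a process started with `fork()`); the `Outcome` enum, deadline and poll interval are illustrative. Windows has its own equivalent, e.g. waiting on the process handle with `WaitForSingleObject`.

```cpp
#include <cassert>
#include <cerrno>
#include <chrono>
#include <thread>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum class Outcome { Reaped, Killed, Error };

// Ask the OS whether a child process has really gone, rather than inferring
// it from a finish file. Polls with waitpid(WNOHANG) until `deadline`
// expires; only then falls back to SIGKILL.
Outcome wait_for_exit(pid_t pid, std::chrono::milliseconds deadline,
                      std::chrono::milliseconds poll_every) {
    const auto give_up_at = std::chrono::steady_clock::now() + deadline;
    int status = 0;
    for (;;) {
        const pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid) return Outcome::Reaped;  // child terminated and was reaped
        if (r == -1) {
            if (errno == EINTR) continue;      // interrupted by a signal: retry
            return Outcome::Error;             // e.g. ECHILD: not our child
        }
        // r == 0: still running, or still being torn down by the OS
        if (std::chrono::steady_clock::now() >= give_up_at) break;
        std::this_thread::sleep_for(poll_every);
    }
    kill(pid, SIGKILL);        // last resort
    waitpid(pid, &status, 0);  // reap it so no zombie is left behind
    return Outcome::Killed;
}
```

For instance, a child that calls `_exit(0)` right away comes back as `Reaped` well inside a 2-second deadline, while one blocked in `pause()` is killed once a 100 ms deadline passes. A real client would choose a much longer deadline.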
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.



 
©2019 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.