finish file present too long
Author | Message |
---|---|
David Anderson (not *that* DA) Joined: 5 Dec 09 Posts: 215 Credit: 74,008,558 RAC: 74 |
Apologies: I managed to cause eight or more "finish file present too long" errors from host 5766757 at around 17:48 UTC (9:48 AM PST). Since they were actual errors, others will get good results and all will be well. I was having a spot of difficulty here, which triggered this error (new in Jan 2013). Apparently the tasks had finished writing their results but had not yet exited, and a manual reboot (one of a few around that time) introduced too long a delay. Taking greater care to suspend tasks before rebooting Linux might have avoided the problem. The reboots were the result of minor issues in upgrading from Ubuntu 13.04 to Ubuntu 13.10. Thread 72778 (http://setiathome.berkeley.edu/forum_thread.php?id=72778) was sort of about this specific message too. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
Well, it seems to be a deliberate design feature of BOINC - I found this in app_control.cpp: // Check for finish files every 10 sec. // If we already found a finish file, abort the app; // it must be hung somewhere in boinc_finish(); That doesn't, of course, say anything about why your particular machine 'hung in boinc_finish', but taking care when rebooting can obviously help (though it shouldn't be necessary). Other versions of the app (usually Astropulse, notably, unlike yours) seem to have sporadic problems with app_finish. I imagine it's on somebody's 'ToDo' list, but it's probably in the BOINC API code, which is one of those orphaned Cinderella areas that nobody really wants to take ownership of. |
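The rule described in that quoted comment can be sketched roughly as follows. This is a minimal Python illustration built only from the three comment lines above, not BOINC's actual C++ code; `TaskState`, `poll_finish_file` and `CHECK_PERIOD_SEC` are hypothetical names invented for the sketch.

```python
from dataclasses import dataclass
from typing import Optional

CHECK_PERIOD_SEC = 10.0  # "every 10 sec" in the quoted comment


@dataclass
class TaskState:
    """Hypothetical per-task bookkeeping; not BOINC's actual structure."""
    name: str
    finish_file_seen_at: Optional[float] = None  # time of first sighting
    aborted: bool = False


def poll_finish_file(task: TaskState, finish_file_exists: bool, now: float) -> str:
    """Run one polling pass for a task whose process is still alive."""
    if task.finish_file_seen_at is not None:
        # The finish file was already found on an earlier pass and the
        # process is still here, so the rule assumes a hang and aborts.
        task.aborted = True
        return "abort: finish file present too long"
    if finish_file_exists:
        # First sighting: remember the time and allow one more period.
        task.finish_file_seen_at = now
        return "finish file seen"
    return "running"


if __name__ == "__main__":
    task = TaskState("example_task")
    print(poll_finish_file(task, False, 0 * CHECK_PERIOD_SEC))  # running
    print(poll_finish_file(task, True, 1 * CHECK_PERIOD_SEC))   # finish file seen
    print(poll_finish_file(task, True, 2 * CHECK_PERIOD_SEC))   # abort: finish file present too long
```

Under this reading, anything that keeps the process alive across two consecutive checks after the finish file appears (a slow exit, or a reboot at the wrong moment) ends in an abort, whether or not the app is really hung.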
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
"Well, it seems to be a deliberate design feature of BOINC - I found this in app_control.cpp:" The general gist is that the assumptions in those app control comments are in fact naive/incorrect. If we already found a finish file, it means nothing more than that the app made a finish file in the process of exiting. The time it takes for an application to exit completely AND for the OS to clear slots etc. (garbage collection), especially at idle/reduced priority, would 'normally' be fractions of a second, though a combination of OS desktop optimisations and high contention (by the nature of BOINC applications: post-processing, schedule updates and starting new tasks) can push it to the order of minutes on a perfectly normally functioning host. It's a pattern dotted throughout the BOINC client and boincapi code: assuming it has control over factors where it has none. Instead it should monitor, negotiate & be patient first (before bringing out the Axe as an absolute last resort). The typically visible symptoms of these 'control issues' include a wide range of induced malfunctions, from mildly irritating to severely wasteful in the case of spontaneous aborts. [Edit:] There are of course OS-specific 'Best Practices' for querying the states of processes, instead of guessing. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to Live By: The Computer Science of Human Decisions. |
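The "monitor, be patient, Axe as a last resort" idea above might look something like the following. This is a hedged Python sketch using the standard `subprocess` module to ask the OS whether a child has exited; it is not how the BOINC client works, and `reap_patiently` with its `grace` and `term_wait` values is a hypothetical illustration (the default grace of several minutes is an assumption loosely based on the "order of minutes" figure in the post).

```python
import subprocess
import sys


def reap_patiently(proc: subprocess.Popen, grace: float = 300.0,
                   term_wait: float = 10.0) -> str:
    """Wait for proc to exit on its own; escalate only after grace runs out."""
    # Monitor: ask the OS for the process state instead of guessing.
    if proc.poll() is not None:
        return "exited"
    try:
        # Be patient: a normal exit under heavy contention can be slow.
        proc.wait(timeout=grace)
        return "exited"
    except subprocess.TimeoutExpired:
        pass
    # Negotiate: request termination politely first.
    proc.terminate()
    try:
        proc.wait(timeout=term_wait)
        return "terminated"
    except subprocess.TimeoutExpired:
        # Last resort: forcibly kill, then collect the exit status.
        proc.kill()
        proc.wait()
        return "killed"


if __name__ == "__main__":
    quick = subprocess.Popen([sys.executable, "-c", "pass"])
    print(reap_patiently(quick, grace=30.0))
```

The design choice is that a finished-but-slow process is never treated as an error: only a process that outlives a generous grace period is escalated, and even then in two steps.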
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.