finish file present too long

Message boards : Number crunching : finish file present too long

jason_gee
Volunteer developer
Volunteer tester

Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
Recent average credit: 0
Australia
Message 1437964 - Posted: 4 Nov 2013, 21:13:11 UTC - in response to Message 1437741.
Last modified: 4 Nov 2013, 21:53:00 UTC

Well, it seems to be a deliberate design feature of BOINC - I found this in app_control.cpp:

    // Check for finish files every 10 sec.
    // If we already found a finish file, abort the app;
    // it must be hung somewhere in boinc_finish();

That doesn't, of course, say anything about why your particular machine 'hung in boinc_finish', but taking care when rebooting can obviously help (but shouldn't be necessary).

Other versions of the app (notably, unlike yours, usually Astropulse) seem to have sporadic problems with app_finish: I imagine it's on somebody's 'ToDo' list, but it's probably in the BOINC API code, which is one of those orphaned Cinderella areas that nobody really wants to take ownership of.


The general gist is that the assumptions in those app control comments are in fact naive and incorrect. Having already found a finish file means nothing more than that the app wrote a finish file in the process of exiting.

The time it takes for an application to exit completely AND for the OS to clear slots etc. (garbage collection), especially at idle or reduced priority, would 'normally' be fractions of a second. However, a combination of OS desktop optimisations and high contention (inherent to BOINC applications: post-processing, schedule updates and starting new tasks) can stretch it to the order of minutes on a perfectly normally functioning host.

It's a pattern dotted throughout the BOINC client and BOINC API code: assuming it has control over factors where it has none. Instead it should monitor, negotiate and be patient first, before bringing out the Axe as an absolute last resort. The typically visible symptoms of these 'control issues' include a wide range of induced malfunctions, from mildly irritating to severely wasteful in the case of spontaneous aborts.
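
To make the "monitor and be patient first" idea concrete, here is a minimal C++ sketch of such a decision rule. It is only an illustration under stated assumptions: the names `Verdict` and `decide` and the grace-period parameter are hypothetical and are not taken from the BOINC client or API code.

```cpp
#include <chrono>

// Illustrative sketch only, NOT BOINC source code; all names are hypothetical.
enum class Verdict { NoFinishFileYet, ExitedNormally, KeepWaiting, AbortAsLastResort };

// grace is an assumed tunable; the post suggests exits can take minutes,
// so a value in minutes rather than seconds would fit that estimate.
Verdict decide(bool finish_file_present,
               bool process_alive,
               std::chrono::seconds since_finish_file_seen,
               std::chrono::seconds grace) {
    if (!finish_file_present) {
        return Verdict::NoFinishFileYet;    // nothing to decide yet
    }
    if (!process_alive) {
        return Verdict::ExitedNormally;     // observed, not guessed
    }
    if (since_finish_file_seen < grace) {
        return Verdict::KeepWaiting;        // be patient first
    }
    return Verdict::AbortAsLastResort;      // only after the grace period
}
```

The point of the design is that the abort branch is reached only after the process has actually been observed to be alive for the whole grace period, rather than inferred from the finish file alone.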

[Edit:] There are of course OS-specific 'Best Practices' for querying the state of processes, instead of guessing.
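
As one example of such a practice, on POSIX systems a parent can ask whether a child it started is still running with a non-blocking `waitpid` call using `WNOHANG`. The sketch below assumes the process is a direct child of the caller; `ChildState` and `query_child` are hypothetical names, and nothing here is taken from BOINC. On Windows, a comparable check is `WaitForSingleObject` on the process handle with a zero timeout.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Illustrative sketch only, NOT BOINC source code; all names are hypothetical.
enum class ChildState { Running, Exited, Unknown };

// Non-blocking query of a child process that the caller started itself.
// If the child has terminated, it is reaped and its raw wait status is
// stored in *wait_status (when the pointer is non-null).
ChildState query_child(pid_t pid, int* wait_status = nullptr) {
    int status = 0;
    pid_t r = waitpid(pid, &status, WNOHANG);
    if (r == 0) {
        return ChildState::Running;   // child exists and has not changed state
    }
    if (r == pid) {
        if (wait_status) *wait_status = status;
        return ChildState::Exited;    // child terminated and was reaped
    }
    return ChildState::Unknown;       // -1: not our child, already reaped, or error
}
```

Because a successful query reaps the child, a second query for the same pid reports `Unknown`, so a caller would normally record the first `Exited` result.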
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1437964
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14141
Credit: 200,643,578
Recent average credit: 874
United Kingdom
Message 1437741 - Posted: 4 Nov 2013, 10:46:31 UTC - in response to Message 1437572.

Well, it seems to be a deliberate design feature of BOINC - I found this in app_control.cpp:

    // Check for finish files every 10 sec.
    // If we already found a finish file, abort the app;
    // it must be hung somewhere in boinc_finish();
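
Read literally, that comment describes a rule along the following lines. This is a minimal C++ sketch for illustration only: `FinishFileCheck`, `poll` and `Action` are hypothetical names, and the real app_control.cpp may well differ, for example in how long it waits before aborting.

```cpp
// Illustrative sketch only, NOT BOINC source code; all names are hypothetical.
enum class Action { KeepWaiting, AbortApp };

struct FinishFileCheck {
    bool seen_on_earlier_poll = false;  // finish file noticed on a previous poll?

    // Meant to be called once per polling interval (the comment says 10 sec).
    Action poll(bool finish_file_present, bool app_still_running) {
        if (!finish_file_present || !app_still_running) {
            return Action::KeepWaiting;  // nothing to judge yet, or the app is gone
        }
        if (seen_on_earlier_poll) {
            return Action::AbortApp;     // "it must be hung somewhere in boinc_finish()"
        }
        seen_on_earlier_poll = true;     // first sighting: remember it and wait
        return Action::KeepWaiting;
    }
};
```

Under this reading, an app that is merely slow to exit looks the same as one that is genuinely hung.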

That doesn't, of course, say anything about why your particular machine 'hung in boinc_finish', but taking care when rebooting can obviously help (but shouldn't be necessary).

Other versions of the app (notably, unlike yours, usually Astropulse) seem to have sporadic problems with app_finish: I imagine it's on somebody's 'ToDo' list, but it's probably in the BOINC API code, which is one of those orphaned Cinderella areas that nobody really wants to take ownership of.
ID: 1437741
David Anderson (not *that* DA) Project Donor

Joined: 5 Dec 09
Posts: 215
Credit: 74,008,558
Recent average credit: 74
United States
Message 1437572 - Posted: 4 Nov 2013, 0:05:55 UTC

Apologies, I managed to cause 8 or more "finish file present too long"
errors from host 5766757 at around 17:48 UTC (9:48 AM PST).
Since they were actual errors, others will get good results
and all will be well.

I was having a spot of difficulty here, which triggered this error
(new in Jan 2013). Apparently the tasks had completed writing their
results but had not yet finished, and a manual reboot (one of a few
around that time) introduced a too-long delay. Greater care about
suspending tasks before rebooting Linux might have avoided the problem.

The Linux reboots were the result of minor issues in upgrading from
Ubuntu 13.04 to Ubuntu 13.10.

Thread 72778 (http://setiathome.berkeley.edu/forum_thread.php?id=72778)
was sort of about this specific message too.
ID: 1437572
