Message boards :
Number crunching :
finish file present too long
Author | Message |
---|---|
Bill G Joined: 1 Jun 01 Posts: 1282 Credit: 187,688,550 RAC: 182 |
What does this error mean? One wingmate has finished this WU and his results were the same as mine, except that I got this error in my stderr file. The WU is: http://setiathome.berkeley.edu/workunit.php?wuid=1963564702 I'm mostly just curious whether there is something I am doing wrong. SETI@home classic workunits 4,019 SETI@home classic CPU time 34,348 hours |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874 |
It probably means that your computer is (at least marginally) over-committed - trying to do too many things at once, and having to juggle resources around to meet all the demands on it. One known problem is that while the AMD FX(tm)-8320 Eight-Core Processor has eight true CPU cores, it only has four floating-point arithmetic units - so while simple applications run at full speed, complicated mathematical apps like SETI have to wait and share. As for the specific error message: BOINC is fussy about the length of time an application takes to clean up all the housekeeping, release memory, etc. after it finishes. BOINC wants to get busy working on the next task, and if the previous one hangs around and refuses to leave home (like an unwanted teenager....), BOINC just boots it out - no The good news: a forthcoming BOINC update is expected to extend the housekeeping limits, but don't upgrade yet - the current test version (v7.6.15) also fiddles with the working priority of the BOINC client program, and this afternoon I reported three 'finish file present too long' errors on my test machine since loading v7.6.15 four days ago. I don't think that's a coincidence. I also reported a fifty-fold increase in the rate of "11-Nov-2015 22:07:06 [SETI@home] Task 18my11ab.5995.476.8.12.207_0 exited with zero status but no 'finished' file" warnings since the upgrade. If BOINC runs at too low a (process/thread) priority, and is hard-pressed on resources anyway, it's more likely that BOINC will fail to notice that its attention is needed to service a heartbeat check in an application, or to do a task cleanup. If this is the first time you've seen the error, it's probably just random bad luck (did two or three tasks all need cleaning up at almost the same time?). If it's repeated, see if the machine is showing signs of stress, like constant hard-disk activity, and see what you can do to lighten the load. |
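To make the housekeeping limit described above concrete, here is a minimal Python sketch of the kind of check a client could run on a finished task. It is an illustration only, not BOINC's actual code: the function name `classify_task`, the state names, and the 10-second default are all hypothetical, and the only thing taken from the thread is the idea that the application leaves a finish file and is then given a fixed time to exit.

```python
import os
import time

# Hypothetical limit in seconds; the real client has its own hard-wired value.
FINISH_FILE_TIMEOUT = 10.0


def classify_task(finish_file, process_alive, now=None, timeout=FINISH_FILE_TIMEOUT):
    """Classify a task from its finish file and whether its process still exists.

    Returns "running", "exited_without_finish_file", "done",
    "cleaning_up" or "timed_out".
    """
    if now is None:
        now = time.time()
    if not os.path.exists(finish_file):
        # No finish file yet: either still crunching, or the process vanished
        # without writing one (compare the "no 'finished' file" warning).
        return "running" if process_alive else "exited_without_finish_file"
    if not process_alive:
        # Finish file written and the process has gone: a clean exit.
        return "done"
    # Finish file exists but the process is still around: it is cleaning up.
    age = now - os.path.getmtime(finish_file)
    return "timed_out" if age > timeout else "cleaning_up"
```

On an over-committed machine the process can stay in the "cleaning_up" state longer than the fixed limit through no fault of its own, which is the situation the 'finish file present too long' error reports.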
Bill G Joined: 1 Jun 01 Posts: 1282 Credit: 187,688,550 RAC: 182 |
Thanks, Richard, for the explanation. I will wait and see about the next BOINC. This computer is just a cruncher, but it does run the latest Windows 10 beta, if that means anything. I have not noticed this error before that I am aware of; I just happened to be checking in on an errored WU. I have always run betas on this computer, but it did have a video card upgrade not too long ago. There is no overclocking, and with the temperature in the computer room now running around 4-10C I do not think cooling had anything to do with it. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874 |
Win10 probably does have something to do with it - I haven't tried it myself, but from what I read, Win10 itself can be enough to over-commit the machine, what with all the I/O and disk access going on, especially at startup. That was why BOINC was asked to reduce its process priority, so that Windows startup didn't get impacted by BOINC getting in the way. But I think they've gone too far, and not allowed BOINC to retain enough resources to do what it needs to do. We'll see. |
Bill G Joined: 1 Jun 01 Posts: 1282 Credit: 187,688,550 RAC: 182 |
Personally, I wish there was a way to delay the start of programs that are waiting to auto-start when Windows starts. However, this computer runs 24/7, so startup should not have affected it. (It did an upgrade yesterday, which means I had to reinstall the video drivers before I ran SETI.) |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874 |
The request came from someone with a laptop who does extreme things like run multiple VMs at startup under Windows 10 - and I guess startup happens more often with a laptop. I've re-quoted "Hard cases make bad law" back at them. |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
For CPU-only tasks, in a *slightly* better implementation for a complex situation, the client might sense the radical overcommit in a similar way to how we do, then take some evasive action. For me, manually, that would be by noticing that the CPU time on a task is only a tiny fraction of elapsed, and snoozing the client until things have settled. A periodic probe would only take a few seconds at most. The same method wouldn't trigger for GPU tasks running alone (or possibly other kinds to come), and something optional/user-configurable as opposed to one-size-fits-all seems appropriate, as touched on. I've already mentioned to the devs that the use of fixed magic numbers for file timeouts is problematic. I'm not sure it registered exactly what I was talking about amidst the noise, but I've been gaining a lot of 'faith' in Murphy lately, in that less whining and more patience seems to see the natural order assert itself. [The natural order being that time-sensitive programming breaks on a non-realtime OS, which is why a lot has moved to callbacks/IO-completion ports, away from event/interrupt driven and synchronous code.] "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
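The probe described in the post above could look something like the following Python sketch. It is not anything BOINC implements: `TaskSample`, `cpu_fraction`, `should_snooze` and the 0.1 threshold are hypothetical names and values chosen for illustration. The only idea taken from the post is comparing a task's CPU time with its elapsed time, and leaving GPU tasks out of the decision.

```python
from dataclasses import dataclass


@dataclass
class TaskSample:
    """CPU and wall-clock time a task accumulated since the last probe."""
    cpu_seconds: float
    elapsed_seconds: float
    uses_gpu: bool = False


def cpu_fraction(sample):
    """Fraction of elapsed time the task actually spent on the CPU."""
    if sample.elapsed_seconds <= 0:
        return 1.0  # nothing measured yet, so assume the task is healthy
    return sample.cpu_seconds / sample.elapsed_seconds


def should_snooze(samples, threshold=0.1):
    """Suggest snoozing when every CPU-only task is starved of CPU time.

    GPU tasks are ignored, since low CPU use is normal for them.
    The threshold is an assumed, user-configurable value.
    """
    cpu_only = [s for s in samples if not s.uses_gpu]
    if not cpu_only:
        return False
    return all(cpu_fraction(s) < threshold for s in cpu_only)
```

A task that got half a second of CPU in a minute of wall-clock time would trip the check, while the same numbers on a GPU task would not.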
William Joined: 14 Feb 13 Posts: 2037 Credit: 17,689,662 RAC: 0 |
I vaguely remember a setting that tells BOINC how long to wait before starting? RTFM... "<start_delay>nseconds</start_delay>: Specify a number of seconds to delay running applications after client startup." It goes in the options portion of cc_config. It helps especially if there's a lot loading at startup on not-so-fast systems. A person who won't read has no advantage over one who can't read. (Mark Twain) |
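For anyone who wants to see where that setting sits, the Python sketch below builds and reads back a minimal cc_config document with `<start_delay>` inside the options section. The helper names `build_cc_config` and `read_start_delay` are made up for this example, the 300-second value is only an illustration, and treating a missing element as "no delay" is an assumption of the sketch.

```python
import xml.etree.ElementTree as ET


def build_cc_config(start_delay_seconds):
    """Return a minimal cc_config document that sets only <start_delay>."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    delay = ET.SubElement(options, "start_delay")
    delay.text = str(start_delay_seconds)
    return ET.tostring(root, encoding="unicode")


def read_start_delay(xml_text):
    """Read <start_delay> from the options section; 0.0 if it is absent."""
    root = ET.fromstring(xml_text)
    node = root.find("./options/start_delay")
    if node is None or not node.text:
        return 0.0
    return float(node.text)


config_text = build_cc_config(300)
print(config_text)
```

A real cc_config.xml usually carries other options as well, so in practice the element would be added to the existing options section rather than written from scratch.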
Richard Haselgrove Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874 |
Apologies in advance, because some of this will only be accessible to those who are following the related "Slow to startup, slow to start running?" boinc_alpha email thread I was alluding to. Sekerob's post at Nov 11 at 9:56 PM is relevant: "Does ol <start_delay>300</start_delay> and the windows service delay" There are issues concerning (both, but separately) BOINC's own startup, and the startup of the science applications under BOINC's control after it has itself started. <start_delay> applies to the second phase only. There are indications (specifically, a 'red dot' on the tray icon, and 'reconnecting to client' in the Manager status bar) that the BOINC client's own initial startup is fighting for resources with Windows - especially Windows 10. My concern is that lowering the client thread priority will actually make this problem worse, and then leave lingering collateral damage throughout the ongoing client session until the next startup (from hours for laptops, to weeks for workstations). Sekerob's "Automatic (Delayed Start)" is ideal (and I've used it myself), but it only applies to service mode, and thus draws blank stares from the portion of the SETI message board readership who use GPUs. 95%? We perhaps need to import tricks from the Linux side of the community, like startup scripts where delays or sequence directives can be included to ensure BOINC starts after the X-server, allowing GPU detection. But that's a feature enhancement, and - as I've already posted this morning - brings into play terms like "systems analysis", "engineering", and even "theory" - none of which fit comfortably with publishing a rapid-reaction bugfix. |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
"... brings into play terms like 'systems analysis', 'engineering', and even 'theory' - none of which fit comfortably with publishing a rapid-reaction bugfix." *looks out of cave briefly while recovering from a cold*: And possibly throw in a healthy dose of ripping off bandaids to fit the engineered solution(s) in place. Not painless or cost-free. |
William Joined: 14 Feb 13 Posts: 2037 Credit: 17,689,662 RAC: 0 |
"... brings into play terms like 'systems analysis', 'engineering', and even 'theory' - none of which fit comfortably with publishing a rapid-reaction bugfix." In the case of BOINC, taking the bandaids off might have as devastating an effect as taking the wrappings off a mummy. Edit: so the crux is that Windows (10) doesn't play nice (pun intended)? |
Bill G Joined: 1 Jun 01 Posts: 1282 Credit: 187,688,550 RAC: 182 |
"There are indications (specifically, a 'red dot' on the tray icon, and 'reconnecting to client' in the Manager status bar) that the BOINC client's own initial startup is fighting for resources with Windows - especially Windows 10." On a personal note: I see the 'red dot' and the 'reconnecting to client' every time I restart BOINC. This has happened since Windows 7 and is not new to Windows 10 (for me). It may be more evident in Windows 10, but I cannot say that, as it has always been there for me. It shows when I restart BOINC and when I restart Windows 7 or 10. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874 |
"There are indications (specifically, a 'red dot' on the tray icon, and 'reconnecting to client' in the Manager status bar) that the BOINC client's own initial startup is fighting for resources with Windows - especially Windows 10." Yes, it always starts that way - that is to say that BOINC Manager always starts that way, which is the usual startup routine for "user mode" installations. Service mode starts differently, and doesn't necessarily involve starting the Manager at all. The question which has come up for discussion on the mailing lists is how long it takes for the red dot to disappear (i.e. for the Manager to establish connection with a running client), whether this is too long in general, and whether it is longer for certain computers, and/or certain versions of Windows, than it needs to be - and whether anything can be done to speed up or disguise the process. |
rob smith Joined: 7 Mar 03 Posts: 22227 Credit: 416,307,556 RAC: 380 |
On three of my four the red dot is there for a second or two; on the fourth it is there for maybe thirty seconds on most restarts of BOINC. All four are running Windows 7 and have oodles of RAM. Two of the three "quick starts" are running W7 Pro, the third is running W7 Home, and the slowcoach is running W7 Pro. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Jord Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
"My concern is that lowering the client thread priority will actually make this problem worse, and then leave lingering collateral damage throughout the ongoing client session until the next startup (from hours for laptops, to weeks for workstations)." I've built 7.6.17 from source; it's back to running the client and manager at Normal priority, even though it doesn't say that in the change log messages. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874 |
"My concern is that lowering the client thread priority will actually make this problem worse, and then leave lingering collateral damage throughout the ongoing client session until the next startup (from hours for laptops, to weeks for workstations)." I've just been comparing 7.6.9, .15, and .16 for Windows7/64 (haven't got hold of a .17 yet). The Manager runs at normal priority in all three versions - I think we can discount that. .15 shows the client running at Low priority (according to Task Manager), and .16 at Normal priority - but there's a difference at the thread level. Process Explorer says (values listed as .9 / .15 / .16): App Priority: Normal / Low / Normal; Base Priority: 8 / 4 / 4; Dynamic Priority: 10 / 6 / 6; I/O Priority: Normal / Very Low / Very Low; Memory Priority: 5 / 1 / 1. That's for the worker thread - the one with millions of Cycles Delta. The other threads are in accordance with the main app priorities. |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Yes, I haven't been following that BOINC dev discussion 100% myself either. For Windows, you have first the process priority class, then second the thread priority class. The first one (process) is absolute, while the thread priority class is relative to the process ('normal' thread priority being effectively the same as the process priority). Lower would be a 'more idle' thread, and higher a small bump over the regular process one; it is used for managing multiple threads within the same app. For this particular client and GUI, lowering any priorities (process, thread, IO or memory) isn't likely to work well, since there is hardwired time-sensitive code pretty much everywhere. People queuing up for a major Apple store release, after-Christmas clearance sales, or a blockbuster movie premiere have the idea: bring lots of coffee, patience, maybe a chair and a book. The hardwired timeouts in this code are problematic for some normal cases (under contention). Lowering anything will indeed just expose the coded-in limitations even more. Writing low-priority services, and the like, is harder than just changing the assigned priorities. |
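The process/thread relationship described above can be shown with a small lookup. The numbers in this Python sketch are the base priorities from Microsoft's published scheduling priorities table for the non-realtime classes, reproduced here from memory, so treat them as an assumption to check against the official documentation. The function name `base_priority` and the shortened level names are made up for this example.

```python
# Thread priority levels, from most idle to most urgent.
THREAD_LEVELS = (
    "IDLE", "LOWEST", "BELOW_NORMAL", "NORMAL",
    "ABOVE_NORMAL", "HIGHEST", "TIME_CRITICAL",
)

# Base priority per process priority class, one entry per thread level above.
BASE_PRIORITY = {
    "IDLE_PRIORITY_CLASS": (1, 2, 3, 4, 5, 6, 15),
    "BELOW_NORMAL_PRIORITY_CLASS": (1, 4, 5, 6, 7, 8, 15),
    "NORMAL_PRIORITY_CLASS": (1, 6, 7, 8, 9, 10, 15),
    "ABOVE_NORMAL_PRIORITY_CLASS": (1, 8, 9, 10, 11, 12, 15),
    "HIGH_PRIORITY_CLASS": (1, 11, 12, 13, 14, 15, 15),
}


def base_priority(process_class, thread_level="NORMAL"):
    """Look up the base priority for a process class and thread level."""
    return BASE_PRIORITY[process_class][THREAD_LEVELS.index(thread_level)]
```

With this table, a normal thread in a Normal-class process gets 8 and a normal thread in an Idle-class process (shown as "Low" in Task Manager) gets 4, which lines up with the base priorities of 8 and 4 that Richard reports for v7.6.9 and v7.6.15.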
Richard Haselgrove Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874 |
If anyone can bear to look, the key checkin is 21417071146cb7fdc11aa3c0ef9414d7ad94c23d. That goes further than the user request in #1392, by (a) including process priority, not just memory and IO, and (b) changing the client priority itself, rather than just the science apps. There were a few bug-fixes, but that's the basis of what came out in v7.6.15. Then we switched from "SetPriorityClass" to "SetThreadPriority" with 24f62d01906da762ccda774380666889a19af511 - I think that's the change which is visible in v7.6.16. I haven't seen any changes after that - which is why I drilled down more deeply into Jord's report about his self-built v7.6.17 - but several people have requested a re-think. |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
"Then we switched from 'SetPriorityClass' to 'SetThreadPriority' with" Yeah, setting the thread priority will only have meaning within the process's threads, so not sure what they're aiming for there. Will have a look and see if I can work out what they are trying to achieve. [Edit:] OK, I understand what was attempted from the original request. Some key points to consider, which indicate that priorities aren't the way to address this, are: - The state of system overcommit described will occur no matter what is fiddled, since the total amount of work stays the same; - what needs reducing in the example scenario is the amount/type of work. There are ways to detect/manage workload in a simply configurable way, but maybe out of scope for the BOINC project (?); - VMs in particular have a lot of overhead related to virtualising their memory space. I don't know if the VMs in question use a lot of hardware virtualisation extensions (I'm guessing minimal), but those are a potential bottleneck (limited resource) with multiple VMs. That's why someone running lots of VMs might have a multiprocessor system with bucketloads of memory (8Gb is not much in workstation/enterprise class systems); - reducing priorities of anything will expose the designed-in limitations I already described, but is not likely to help the user's particular described condition. [Edit2:] Unintuitively, raising certain priorities could help get some things out of the way, and so reduce contention, but that is very system-specific. [Edit3:] One idealised model would be some higher-priority processes that do very little but monitor and manage as efficiently as possible, then go to sleep most of the time .... driving low-idle processes with minimal time sensitivity. There are other ways to do it, but that's one coherent model. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874 |
Yes. I've just been checking the thread priorities of a bog-standard CPU-only application (NumberFields 'GetDecics_2.07', as it happens). Under v7.6.15, that has IO priority "Very Low", and memory priority 1, but under both v7.6.9 and v7.6.16 it has IO "Normal", memory 5. So v7.6.16 doesn't even achieve what was requested in the original GitHub 'issue' - though I haven't yet checked the special case handlers for VBox and GPUs, which were added somewhere along the way - I'll go find the checkin. Edit - I was probably thinking about 517cc53c67bfb651caa5600b140ae1315c5f0410, which added user interface overrides, rather than the special cases themselves. I'll keep looking. Edit2 - remembered that I was using the v7.6.16 machine to test the user override for client priority - so the client was running at normal priority. But I've cleared that, and the observation still stands after restart - application memory and IO are not controlled as originally requested. |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.