finish file present too long

Bill G (Special Project $75 donor)
Joined: 1 Jun 01
Posts: 1282
Credit: 187,688,550
RAC: 182
United States
Message 1742344 - Posted: 15 Nov 2015, 20:28:55 UTC

What does this error mean? One wingmate has finished this WU and his results were the same as mine, except I got this error in my stderr file.
The WU is: http://setiathome.berkeley.edu/workunit.php?wuid=1963564702

Mostly just curious if there is something I am doing wrong.

SETI@home classic workunits 4,019
SETI@home classic CPU time 34,348 hours

Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1742351 - Posted: 15 Nov 2015, 21:02:40 UTC - in response to Message 1742344.  

It probably means that your computer is (at least marginally) over-committed - trying to do too many things at once, and having to juggle resources around to meet all the demands on it.

One known problem is that while the AMD FX(tm)-8320 Eight-Core Processor has eight true CPU cores, it only has four floating-point arithmetic units - so while simple applications run at full speed, complicated mathematical apps like SETI have to wait and share.

The specific error message - BOINC is fussy about the length of time an application takes to clean up all the housekeeping, release memory, etc. etc. after it finishes. BOINC wants to get busy and working on the next task, and if the previous one hangs around and refuses to leave home (like an unwanted teenager....), BOINC just boots it out - no supper credits for you, my son.

The good news: a forthcoming BOINC update is expected to extend the housekeeping limits, but don't upgrade yet - the current test version (v7.6.15) also fiddles with the working priority of the BOINC client program, and this afternoon I reported three 'finish file present too long' errors on my test machine since loading v7.6.15 four days ago. I don't think that's a coincidence. I also reported a fifty-fold increase in the rate of

11-Nov-2015 22:07:06 [SETI@home] Task 18my11ab.5995.476.8.12.207_0 exited with zero status but no 'finished' file
11-Nov-2015 22:07:06 [SETI@home] If this happens repeatedly you may need to reset the project.

warnings since the upgrade. If BOINC runs at too low a (process/thread) priority, and is hard-pressed on resources anyway, it's more likely that BOINC will fail to notice that its attention is needed to service a heartbeat check in an application, or do a task cleanup.

If this is the first time you've seen the error - probably just random bad luck (did two or three tasks all need cleaning up at almost the same time?). If it's repeated - see if the machine is showing signs of stress, like constant hard-disk activity. See what you can do to lighten the load.
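
As a rough illustration of the housekeeping check described in this post (a sketch only, not the BOINC client's actual code): the application leaves a marker file when it finishes, and if the process is still hanging around after a grace period, the task is treated as an error. The function name `check_finish_file`, the status strings, the marker file name used in any call, and the 10-second grace period are all assumptions made for this example.

```python
import os
from typing import Optional, Tuple

GRACE_SECONDS = 10.0  # assumed grace period, not taken from the BOINC source


def check_finish_file(marker_path: str, process_alive: bool,
                      first_seen: Optional[float], now: float,
                      grace: float = GRACE_SECONDS) -> Tuple[str, Optional[float]]:
    """Run one polling pass over a task and return (status, first_seen).

    status is one of "running", "exited_without_marker", "finished",
    "waiting" or "too_long". first_seen is the time the marker file was
    first noticed, carried from one poll to the next.
    """
    if not os.path.exists(marker_path):
        # No marker yet: still crunching, or the app exited without one.
        return ("running" if process_alive else "exited_without_marker", None)
    if not process_alive:
        # Marker written and process gone: normal completion.
        return ("finished", first_seen)
    if first_seen is None:
        # First poll that sees the marker: start the clock.
        return ("waiting", now)
    if now - first_seen > grace:
        # Marker present for too long while the app lingers.
        return ("too_long", first_seen)
    return ("waiting", first_seen)
```

The point of the sketch is that the outcome depends on how often, and how promptly, the poll itself gets to run: if the poller is starved of CPU or disk, a task that cleaned up in good time can still be seen as lingering.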

Bill G
Message 1742362 - Posted: 15 Nov 2015, 21:46:45 UTC - in response to Message 1742351.  

Thanks Richard for the explanation. I will wait and see about the next BOINC.

This computer is just a cruncher, but it does run the latest Windows 10 beta if that means anything. I have not noticed this error before, as far as I am aware; I just happened to be checking in on an errored WU. I have always run betas on this computer, but it did have a video card upgrade not too long ago. There is no overclocking, and with the temperature in the computer room now running around 4-10C I do not think cooling had anything to do with it.

Richard Haselgrove
Message 1742368 - Posted: 15 Nov 2015, 22:05:26 UTC - in response to Message 1742362.  

Win10 probably does have something to do with it - I haven't tried it myself, but from what I read, Win10 itself can be enough to over-commit the machine, what with all the i/o and disk access going on, especially at startup.

That was why BOINC was asked to reduce its process priority, so that Windows startup didn't get impacted by BOINC getting in the way. But I think they've gone too far, and not allowed BOINC to retain enough resources to do what it needs to do. We'll see.

Bill G
Message 1742385 - Posted: 15 Nov 2015, 23:33:09 UTC - in response to Message 1742368.  
Last modified: 15 Nov 2015, 23:48:35 UTC

Personally I wish there was a way to delay the start of programs that auto-start with Windows.

However this computer runs 24/7 so startup should not have affected it. (It did an upgrade yesterday which means I had to reinstall the video drivers before I ran SETI)

Richard Haselgrove
Message 1742389 - Posted: 15 Nov 2015, 23:48:42 UTC - in response to Message 1742385.  

The request came from someone with a laptop who does extreme things like run multiple VMs at startup under Windows 10 - and I guess startup happens more often with a laptop. I've re-quoted "Hard cases make bad law" back at them.

jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1742400 - Posted: 16 Nov 2015, 0:48:58 UTC - in response to Message 1742389.  
Last modified: 16 Nov 2015, 1:16:27 UTC

For CPU-only tasks, in a *slightly* better implementation for a complex situation, the client might sense the radical overcommit in a similar way to how we do, then take some evasive action.

For me manually that would be by noticing the CPU time on a task is only a tiny fraction of elapsed, and snoozing the client until settled. A periodic probe would only take a few seconds at most. The same method wouldn't trigger for GPU tasks running alone (or possibly other kinds to come), and something optional/user-configurable as opposed to one-size-fits-all seems appropriate as touched on.
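
The manual check described above could be sketched roughly as follows. This is only an illustration of the idea, not anything in the BOINC client: the function name `should_snooze`, the 10% ratio threshold, and the 60-second minimum are all assumptions, and a real version would make them user-configurable.

```python
def should_snooze(cpu_seconds: float, elapsed_seconds: float,
                  is_cpu_only: bool = True,
                  min_ratio: float = 0.1,
                  min_elapsed: float = 60.0) -> bool:
    """Return True when a CPU-only task is getting only a tiny share of
    wall-clock time, which suggests the machine is badly over-committed."""
    if not is_cpu_only:
        # GPU tasks normally use little CPU time, so the ratio says nothing.
        return False
    if elapsed_seconds < min_elapsed:
        # Too early to judge; startup noise would dominate the ratio.
        return False
    return (cpu_seconds / elapsed_seconds) < min_ratio
```

A periodic probe would call this with each running task's CPU and elapsed times and suspend work for a while whenever it returns True.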

I've mentioned to the devs before that the use of fixed magic numbers for file timeouts is problematic. I'm not sure it registered exactly what I was talking about amidst the noise, but I've been gaining a lot of 'faith' in Murphy lately, in that less whining and more patience seems to see the natural order assert itself. [The natural order being that time-sensitive programming breaks on a non-realtime OS, which is why a lot has moved to callbacks/IO-completion ports, away from event/interrupt-driven and synchronous code.]
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.

William
Volunteer tester
Joined: 14 Feb 13
Posts: 2037
Credit: 17,689,662
RAC: 0
Message 1742511 - Posted: 16 Nov 2015, 9:55:03 UTC

I vaguely remember a setting that tells BOINC how long to wait before starting?
RTFM...

<start_delay>nseconds</start_delay>
Specify a number of seconds to delay running applications after client startup.

This belongs in the <options> section of cc_config.xml.

Helps especially if there's a lot loading at startup on not so fast systems.
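
For anyone who would rather generate the file than hand-edit it, here is a minimal sketch that builds a cc_config.xml containing only this one setting. The helper name `build_cc_config` is made up for the example, and 300 seconds is just an illustrative value; an existing cc_config.xml with other options should be edited rather than overwritten.

```python
import xml.etree.ElementTree as ET


def build_cc_config(start_delay_seconds: int) -> str:
    """Return cc_config.xml text with <start_delay> in the <options> section."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    ET.SubElement(options, "start_delay").text = str(start_delay_seconds)
    return ET.tostring(root, encoding="unicode")
```

Calling `build_cc_config(300)` returns `<cc_config><options><start_delay>300</start_delay></options></cc_config>`, which would be saved as cc_config.xml in the BOINC data directory.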
A person who won't read has no advantage over one who can't read. (Mark Twain)

Richard Haselgrove
Message 1742528 - Posted: 16 Nov 2015, 12:02:23 UTC - in response to Message 1742511.  

Apologies in advance, because some of this will only be accessible to those who are following the related "Slow to startup, slow to start running?" boinc_alpha email thread I was alluding to.

Sekerob's post at Nov 11 at 9:56 PM is relevant:

> Does ol <start_delay>300</start_delay> and the windows service delay
> function [if installed as service] "Automatic (Delayed Start)" not fit
> the bill to take care of sluggish starting? (It does for me!).

There are issues concerning (both, but separately) BOINC's own startup, and the startup of the science applications under BOINC's control after it has itself started. <start_delay> applies to the second phase only.

There are indications (specifically, a 'red dot' on the tray icon, and 'reconnecting to client' in the Manager status bar) that the BOINC client's own initial startup is fighting for resources with Windows - especially Windows 10. My concern is that lowering the client thread priority will actually make this problem worse, and then leave lingering collateral damage throughout the ongoing client session until the next startup (from hours for laptops, to weeks for workstations).

Sekerob's "Automatic (Delayed Start)" is ideal (and I've used it myself), but it only applies to service mode, and thus draws blank stares from the portion of the SETI message board readership who use GPUs. 95%?

We perhaps need to import tricks from the Linux side of the community, like startup scripts where delays or sequence directives can be included to ensure BOINC starts after the X-server, allowing GPU detection. But that's a feature enhancement, and - as I've already posted this morning - brings into play terms like "systems analysis", "engineering", and even "theory" - none of which fit comfortably with publishing a rapid-reaction bugfix.

jason_gee
Message 1742534 - Posted: 16 Nov 2015, 13:33:57 UTC - in response to Message 1742528.  

> ... brings into play terms like "systems analysis", "engineering", and even "theory" - none of which fit comfortably with publishing a rapid-reaction bugfix.


*looks out of cave briefly while recovering from a cold*:
And possibly throw in a healthy dose of ripping off bandaids to fit the engineered solution(s) in place. Not painless or cost free.

William
Message 1742536 - Posted: 16 Nov 2015, 13:42:38 UTC - in response to Message 1742534.  
Last modified: 16 Nov 2015, 13:43:36 UTC

> > ... brings into play terms like "systems analysis", "engineering", and even "theory" - none of which fit comfortably with publishing a rapid-reaction bugfix.
>
> *looks out of cave briefly while recovering from a cold*:
> And possibly throw in a healthy dose of ripping off bandaids to fit the engineered solution(s) in place. Not painless or cost free.

In the case of BOINC, taking the bandaids off might have as devastating an effect as taking the wrappings off a mummy.

edit: so the crux is Windows (10) doesn't play nice (pun intended)?

Bill G
Message 1742538 - Posted: 16 Nov 2015, 14:08:27 UTC - in response to Message 1742528.  

> There are indications (specifically, a 'red dot' on the tray icon, and 'reconnecting to client' in the Manager status bar) that the BOINC client's own initial startup is fighting for resources with Windows - especially Windows 10.

On a personal note: I see the 'red dot' and the 'reconnecting to client' every time I restart BOINC. This has happened since Windows 7 and is not new to Windows 10 (for me). It may be more evident in Windows 10 but I can not say that as it has always been there for me. It shows when I restart BOINC and when I restart Windows 7 or 10.

Richard Haselgrove
Message 1742567 - Posted: 16 Nov 2015, 15:40:35 UTC - in response to Message 1742538.  

> > There are indications (specifically, a 'red dot' on the tray icon, and 'reconnecting to client' in the Manager status bar) that the BOINC client's own initial startup is fighting for resources with Windows - especially Windows 10.
>
> On a personal note: I see the 'red dot' and the 'reconnecting to client' every time I restart BOINC. This has happened since Windows 7 and is not new to Windows 10 (for me). It may be more evident in Windows 10 but I can not say that as it has always been there for me. It shows when I restart BOINC and when I restart Windows 7 or 10.

Yes, it always starts that way - that is to say that BOINC Manager always starts that way, which is the usual startup routine for "user mode" installations. Service mode starts differently, and doesn't necessarily involve starting the Manager at all.

The question which has come up for discussion on the mailing lists is how long does it take for the red dot to disappear (i.e. for the Manager to establish connection with a running client), whether this is too long in general, and whether it is longer for certain computers, and/or certain versions of Windows, than it needs to be - and whether anything can be done to speed up or disguise the process.

rob smith (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22158
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1742603 - Posted: 16 Nov 2015, 17:47:04 UTC

On three of my four the red dot is there for a second or two; on the fourth it is there for maybe thirty seconds on most restarts of BOINC. All four are running Windows 7 and have oodles of RAM. Two of the three "quick starts" are running W7 Pro, the third is running W7 Home, and the slow coach is running W7 Pro.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1744385 - Posted: 23 Nov 2015, 21:00:59 UTC - in response to Message 1742528.  

> My concern is that lowering the client thread priority will actually make this problem worse, and then leave lingering collateral damage throughout the ongoing client session until the next startup (from hours for laptops, to weeks for workstations).

I've built 7.6.17 from source, it's back to running the client and manager at Normal priority, even though it doesn't say that in the change log messages.

Richard Haselgrove
Message 1744489 - Posted: 24 Nov 2015, 11:18:02 UTC - in response to Message 1744385.  

> > My concern is that lowering the client thread priority will actually make this problem worse, and then leave lingering collateral damage throughout the ongoing client session until the next startup (from hours for laptops, to weeks for workstations).
>
> I've built 7.6.17 from source, it's back to running the client and manager at Normal priority, even though it doesn't say that in the change log messages.

I've just been comparing 7.6.9, .15, and .16 for Windows7/64 (haven't got hold of a .17 yet)

The Manager runs at normal priority in all three versions - I think we can discount that.

.15 shows the client running at Low priority (according to Task Manager), and .16 at Normal priority - but there's a difference at the thread level. Process Explorer says:

                    v7.6.9    v7.6.15    v7.6.16
                    ------    -------    -------
App Priority        Normal    Low        Normal
Base Priority       8         4          4
Dynamic Priority    10        6          6
I/O Priority        Normal    Very Low   Very Low
Memory Priority     5         1          1

That's for the worker thread - the one with millions of Cycles Delta. The other threads are in accordance with the main app priorities.

jason_gee
Message 1744504 - Posted: 24 Nov 2015, 13:20:12 UTC - in response to Message 1744489.  
Last modified: 24 Nov 2015, 13:23:31 UTC

Yes haven't been following that boinc dev discussion 100% myself either.

For Windows, you have first the process priority class, then second the thread priority level. The first one (process) is absolute, while the thread priority is relative to the process ('normal' thread priority being effectively the same as the process priority). Lower would be a 'more idle' thread, and higher a small bump over the regular process one; this is used for managing multiple threads within the same app.
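
To make the process/thread relationship concrete, here is a small model of how Windows combines the two into a thread's base priority. It is a simplified sketch: the dictionary and function names are invented for the example, the numbers follow Microsoft's published scheduling-priority table, and dynamic boosts and the separate background modes are ignored.

```python
# Base priority of each process priority class at 'normal' thread priority.
CLASS_BASE = {"idle": 4, "below_normal": 6, "normal": 8,
              "above_normal": 10, "high": 13, "realtime": 24}

# Relative thread priority levels, as offsets from the class base.
THREAD_OFFSET = {"lowest": -2, "below_normal": -1, "normal": 0,
                 "above_normal": 1, "highest": 2}


def base_priority(process_class: str, thread_level: str) -> int:
    """Return the base priority for a thread in the given process class."""
    realtime = process_class == "realtime"
    # The two extreme thread levels saturate instead of using an offset.
    if thread_level == "idle":
        return 16 if realtime else 1
    if thread_level == "time_critical":
        return 31 if realtime else 15
    return CLASS_BASE[process_class] + THREAD_OFFSET[thread_level]
```

So a 'normal' thread in a Normal-class process sits at 8, while the same thread level in an Idle-class process sits at 4: changing the process class moves every thread in the process, whereas changing a thread's level only nudges it within its own process.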

For this particular client and GUI, lowering any priorities (process, thread, IO or memory) isn't likely to work well, since there is hardwired time-sensitive code pretty much everywhere.

People queuing up for a major Apple store release, after-Christmas clearance sales, or a blockbuster movie premiere have the idea: bring lots of coffee, patience, maybe a chair and a book.

The hardwired timeouts in this code are problematic for some normal cases (under contention). Lowering anything will indeed just expose the coded-in limitations even more.

Writing low priority services, and the like, is harder than just changing the assigned priorities.

Richard Haselgrove
Message 1744514 - Posted: 24 Nov 2015, 14:29:59 UTC

If anyone can bear to look, the key checkin is 21417071146cb7fdc11aa3c0ef9414d7ad94c23d. That goes further than the user request in #1392, by (a) including process priority, not just memory and IO, and (b) by changing the client priority itself, rather than just the science apps.

There were a few bug-fixes, but that's the basis of what came out in v7.6.15

Then we switched from "SetPriorityClass" to "SetThreadPriority" with 24f62d01906da762ccda774380666889a19af511 - I think that's the change which is visible in v7.6.16. I haven't seen any changes after that - which is why I drilled down more deeply into Jord's report about his self-built v7.6.17 - but several people have requested a re-think.

jason_gee
Message 1744517 - Posted: 24 Nov 2015, 14:33:33 UTC - in response to Message 1744514.  
Last modified: 24 Nov 2015, 14:59:47 UTC

> Then we switched from "SetPriorityClass" to "SetThreadPriority" with
Yeah, setting the thread priority will only have meaning within the process's threads, so not sure what they're aiming for there. Will have a look and see if I can work out what they are trying to achieve.

[Edit:] OK, I understand what was attempted from the original request. Some key points to consider, which indicate that priorities aren't the way to address this:
- The state of system overcommit described will occur no matter what is fiddled with, since the total amount of work stays the same,
- what needs reducing in the example scenario is the amount/type of work. There are ways to detect/manage workload in a simply configurable way, but maybe out of scope for the BOINC project (?)
- VMs in particular have a lot of overhead related to virtualising their memory space. I don't know if the VMs in question use a lot of hardware virtualisation extensions (I'm guessing minimal), but those are a potential bottleneck (limited resource) with multiple VMs. That's why someone running lots of VMs might have a multiprocessor system with bucketloads of memory (8 GB is not much in workstation/enterprise class systems)
- reducing priorities of anything will expose the designed-in limitations I already described, but is not likely to help the user's particular described condition.

[Edit2:] unintuitively, raising certain priorities could help get some things out of the way, and so reduce contention, but very system specific

[Edit3:] One idealised model would be some higher priority processes that do very little, but monitor, manage as efficiently as possible then go to sleep most of the time .... driving low-idle processes with minimal time sensitivity. Other ways to do it, but that's one coherent model.

Richard Haselgrove
Message 1744527 - Posted: 24 Nov 2015, 15:00:29 UTC - in response to Message 1744517.  
Last modified: 24 Nov 2015, 15:16:38 UTC

Yes. I've just been checking the thread priorities of a bog-standard CPU-only application (NumberFields 'GetDecics_2.07', as it happens). Under v7.6.15, that has IO priority "Very Low", and memory priority 1, but under both v7.6.9 and v7.6.16 it has IO "Normal", memory 5.

So v7.6.16 doesn't even achieve what was requested in the original GitHub 'issue' - though I haven't yet checked the special case handlers for VBox and GPUs, which were added somewhere along the way - I'll go find the checkin.

Edit - I was probably thinking about 517cc53c67bfb651caa5600b140ae1315c5f0410, which added user interface overrides, rather than the special cases themselves. I'll keep looking.

Edit2 - remembered that I was using the v7.6.16 machine to test the user override for client priority - so the client was running at normal priority. But I've cleared that, and the observation still stands after restart - application memory and IO are not controlled as originally requested.
 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.