Strange Message


Message boards : Number crunching : Strange Message

[B^S] madmac
Volunteer tester
Joined: 9 Feb 04
Posts: 1150
Credit: 3,833,807
RAC: 2,565
United Kingdom
Message 1242099 - Posted: 6 Jun 2012, 8:19:59 UTC

WU 1001902298 took over 16 hrs to complete, and when I checked after it reported, I saw that it had shut itself down and restarted itself. I wonder why it did that. Any ideas?
____________

jravin
Joined: 25 Mar 02
Posts: 941
Credit: 102,394,590
RAC: 87,954
United States
Message 1242109 - Posted: 6 Jun 2012, 10:34:37 UTC

What was the "Strange Message" you referred to?
____________

[B^S] madmac
Volunteer tester
Joined: 9 Feb 04
Posts: 1150
Credit: 3,833,807
RAC: 2,565
United Kingdom
Message 1242121 - Posted: 6 Jun 2012, 11:24:33 UTC

The message was this: "requesting safe worker shutdown ->". It was the first time that this has happened, and I happened to be away when it occurred.
____________

LadyL
Volunteer tester
Joined: 14 Sep 11
Posts: 1679
Credit: 5,230,097
RAC: 0
Message 1242150 - Posted: 6 Jun 2012, 13:11:13 UTC

task link

That's just to tell you that boinc told the app to exit.
What's more interesting is that you have a restart without a previous exit.
CPU time looks normal compared with your other tasks.
The runtime doesn't - something else hogged the CPU in that time or the task got stuck somehow.
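
As a rough illustration of that comparison, here is a small Python sketch. The numbers are made up (only the 16-hour figure echoes the first post), and the function name `stalled_tasks` and the 0.5 cut-off are assumptions chosen for the example, not anything BOINC itself provides. It flags tasks whose CPU time is only a small share of their elapsed run time.

```python
def stalled_tasks(tasks, min_ratio=0.5):
    """Return names of tasks whose CPU time is a small share of elapsed time."""
    return [name for name, (cpu_s, elapsed_s) in tasks.items()
            if cpu_s / elapsed_s < min_ratio]


# Made-up numbers in seconds: (CPU time, elapsed run time).
tasks = {
    "typical task": (14_000, 14_500),   # CPU time close to elapsed time
    "16-hour task": (14_200, 57_600),   # normal CPU time, very long elapsed time
}

print(stalled_tasks(tasks))  # prints: ['16-hour task']
```

A low ratio only says the task spent most of its run time not computing; it does not say whether it was starved of CPU or stuck.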

You could check the log file to see if boinc noted anything interesting while that task ran.

boinc runs benchmarks now and again, and this may have caused one restart. Otherwise, if you are running more than one project, it might have been swapped out.
____________
I'm not the Pope. I don't speak Ex Cathedra!

[B^S] madmac
Volunteer tester
Joined: 9 Feb 04
Posts: 1150
Credit: 3,833,807
RAC: 2,565
United Kingdom
Message 1242305 - Posted: 6 Jun 2012, 16:22:34 UTC

Checked the log file and found nothing, so it must have got stuck, as my other machine worked well.
____________

Sirius B
Volunteer tester
Joined: 26 Dec 00
Posts: 11509
Credit: 1,725,867
RAC: 1,723
Israel
Message 1242785 - Posted: 7 Jun 2012, 14:46:02 UTC
Last modified: 7 Jun 2012, 14:48:55 UTC

Just found several inconclusives on mine, and on checking the WUs I find that I got the same message.

2463078999

This one has an extra message in it..

2463079004
____________

LadyL
Volunteer tester
Avatar
Send message
Joined: 14 Sep 11
Posts: 1679
Credit: 5,230,097
RAC: 0
Message 1242799 - Posted: 7 Jun 2012, 15:06:40 UTC

It's just the standard exit message. [for Jason's boinc API anyway]
He could probably tell you what that 'exiting anyway' is all about.

The tasks have gone inconclusive because of differences in the spike count.
That's almost always to do with differences in rounding error between CPU and GPU apps and spikes just at the detection threshold. Joe could tell you all about that one :)

Mostly the results are similar enough to validate in the end.
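
To illustrate the threshold point, here is a minimal Python sketch. It is not the actual SETI@home detection code: the threshold value, the power values, and the function name `count_spikes` are all made up for illustration. It shows how a tiny rounding difference on one borderline value changes the reported spike count when a fixed cut-off is used.

```python
def count_spikes(powers, threshold):
    """Count values strictly above a fixed (absolute) threshold."""
    return sum(1 for p in powers if p > threshold)


THRESHOLD = 24.0  # hypothetical detection threshold

# The same three hypothetical signals as computed by two different apps.
# Only the middle value differs, and only in the fourth decimal place.
cpu_like = [10.2, 23.9999, 30.5]
gpu_like = [10.2, 24.0001, 30.5]

print(count_spikes(cpu_like, THRESHOLD), count_spikes(gpu_like, THRESHOLD))
# prints: 1 2
```

The two result sets are nearly identical, yet the spike counts disagree, which is the kind of mismatch that sends a task to inconclusive.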
____________
I'm not the Pope. I don't speak Ex Cathedra!

Sirius B
Volunteer tester
Joined: 26 Dec 00
Posts: 11509
Credit: 1,725,867
RAC: 1,723
Israel
Message 1242809 - Posted: 7 Jun 2012, 15:26:28 UTC - in response to Message 1242799.

Thanks LadyL. fwiw, all my rigs are cpu crunchers only.
____________

Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8631
Credit: 51,508,005
RAC: 48,043
United Kingdom
Message 1242810 - Posted: 7 Jun 2012, 15:27:24 UTC - in response to Message 1242799.

I think

boinc_exit(): worker didn't respond to exit request within 2 seconds, exiting anyway.

may be being counted as an extra message.

jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 5051
Credit: 73,835,577
RAC: 12,155
Australia
Message 1242816 - Posted: 7 Jun 2012, 15:42:04 UTC
Last modified: 7 Jun 2012, 16:13:16 UTC

3 things in one post :)

1) The standard BOINC API uses a rather brutal TerminateProcess() Windows API call to stop the program when done (or for other reasons). This is not recommended in Microsoft best practices, as multithreaded OSes have all sorts of buffers and shared/global DLL things going on for delayed action, so my modified BoincApi uses a request/acknowledge shutdown protocol to try to be more graceful about the shutdown.

2) 'Exiting anyway'. The worker thread was otherwise occupied, or preempted by some other higher-priority application running on the host, and didn't respond to the shutdown request in a timely fashion. That can happen because CPU tasks normally run at very low priority, so when other processes are busy on the machine the worker doesn't get a chance to respond. In these cases we fall back to stock BoincApi's brutality. Since we did give it enough time, any IO completion routines should have finished (the OS handles those using a different priority mechanism to avoid losing data), so it's a marginally better scenario than stock BoincApi's behaviour if this happens.

3) 6.03 inconclusives. Stock 6.03 is slightly less accurate with all signal types than both optimised CPU apps (AKv8b2) and CUDA apps (stock and opt). AKv8 descendants and GPU apps do not have the same algorithmic precision maintenance (noise) issues in the signal normalisation areas, and stock will be rectified in V7 multibeam. This small amount of noise will mostly only matter with signals around threshold, so it can result in different signal counts due to the use of absolute thresholds with no hysteresis.
[Edit:] 3a): corollary to #3. Older GPU apps (inc. stock 6.08, 6.09 & 6.10) have an inaccurate chirp, which may also affect all signal types, with the variation again most noticeable around threshold, via signal counts. The hardware itself also tends to be sensitive to (lack of) maintenance issues and over-exuberant overclocking practices. For the most part the validation mechanism tends to eliminate the most problematic results here, though there are rogue hosts about trashing many tasks.
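
The request/acknowledge idea in 1) and the fallback in 2) can be sketched roughly as below. This is a simplified Python model, not the real API code: the function name `request_shutdown`, the two events, and the `worker_busy_for` parameter are hypothetical, and the timeouts are shortened so the example runs quickly (the real message quotes 2 seconds). The sleeping worker stands in for a low-priority thread that cannot get CPU time to answer.

```python
import threading
import time


def request_shutdown(worker_busy_for, timeout=2.0):
    """Ask a worker thread to stop and report how the shutdown ended.

    worker_busy_for: seconds the hypothetical worker is unable to look at
    the exit request (busy, or starved by higher-priority processes).
    """
    exit_requested = threading.Event()
    exit_acknowledged = threading.Event()

    def worker():
        time.sleep(worker_busy_for)      # cannot respond yet
        exit_requested.wait()            # finally notices the request
        exit_acknowledged.set()          # acknowledges, then returns cleanly

    threading.Thread(target=worker, daemon=True).start()

    exit_requested.set()                 # "requesting safe worker shutdown"
    if exit_acknowledged.wait(timeout):  # True only if acknowledged in time
        return "graceful"
    return "exiting anyway"              # give up waiting, stop abruptly


print(request_shutdown(worker_busy_for=0.0, timeout=0.5))  # prints: graceful
print(request_shutdown(worker_busy_for=1.0, timeout=0.2))  # prints: exiting anyway
```

In the second call the worker is still "busy" when the wait expires, so the caller stops waiting, which mirrors the 'exiting anyway' line in the task output.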

Jason
____________
"It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change."
Charles Darwin


Copyright © 2014 University of California