Strange Message

Profile [B^S] madmac
Volunteer tester
Joined: 9 Feb 04
Posts: 1175
Credit: 4,754,897
RAC: 0
United Kingdom
Message 1242099 - Posted: 6 Jun 2012, 8:19:59 UTC

WU 1001902298 took over 16 hrs to complete, and when I checked after it reported, I saw that it had shut itself down and restarted. I wonder why it did that. Any ideas?
ID: 1242099
Cruncher-American (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 1242109 - Posted: 6 Jun 2012, 10:34:37 UTC

What was the "Strange Message" you referred to?
ID: 1242109
Profile [B^S] madmac
Volunteer tester
Joined: 9 Feb 04
Posts: 1175
Credit: 4,754,897
RAC: 0
United Kingdom
Message 1242121 - Posted: 6 Jun 2012, 11:24:33 UTC

The message was this: "requesting safe worker shutdown ->". It was the first time this has happened, and I happened to be away when it occurred.
ID: 1242121
LadyL
Volunteer tester
Joined: 14 Sep 11
Posts: 1679
Credit: 5,230,097
RAC: 0
Message 1242150 - Posted: 6 Jun 2012, 13:11:13 UTC

task link

That's just to tell you that BOINC told the app to exit.
What's more interesting is that you have a restart without a previous exit.
CPU time looks normal compared with your other tasks.
The runtime doesn't: either something else hogged the CPU in that time or the task got stuck somehow.

You could check the log file to see if BOINC noted anything interesting while that task ran.

BOINC runs benchmarks now and again, and this may have caused one restart. Otherwise, if you are running more than one project, the task might have been swapped out.
I'm not the Pope. I don't speak Ex Cathedra!
ID: 1242150
Profile [B^S] madmac
Volunteer tester
Joined: 9 Feb 04
Posts: 1175
Credit: 4,754,897
RAC: 0
United Kingdom
Message 1242305 - Posted: 6 Jun 2012, 16:22:34 UTC

Checked the log file: nothing. So it must have got stuck, as my other machine worked well.
ID: 1242305
Sirius B Project Donor
Volunteer tester
Joined: 26 Dec 00
Posts: 24879
Credit: 3,081,182
RAC: 7
Ireland
Message 1242785 - Posted: 7 Jun 2012, 14:46:02 UTC
Last modified: 7 Jun 2012, 14:48:55 UTC

Just found several inconclusives on mine, and on checking the WUs I find that I got the same message.

2463078999

This one has an extra message in it..

2463079004
ID: 1242785
LadyL
Volunteer tester
Joined: 14 Sep 11
Posts: 1679
Credit: 5,230,097
RAC: 0
Message 1242799 - Posted: 7 Jun 2012, 15:06:40 UTC

It's just the standard exit message. [for Jason's boinc API anyway]
He could probably tell you what that 'exiting anyway' is all about.

The tasks have gone inconclusive because of differences in the spike count.
That's almost always to do with differences in rounding error between CPU and GPU apps and spikes just at the detection threshold. Joe could tell you all about that one :)

Mostly the results are similar enough to validate in the end.
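
A rough illustration of the threshold effect, as a sketch only: the threshold value, the power values and the size of the rounding difference below are all made up for the example, and `count_spikes` is a hypothetical helper, not anything from the real apps. It just shows how a tiny numerical difference changes the count when a signal sits right at an absolute cutoff.

```python
def count_spikes(powers, threshold):
    """Count values strictly above an absolute threshold (no hysteresis)."""
    return sum(1 for p in powers if p > threshold)


THRESHOLD = 24.0  # hypothetical detection threshold, for illustration only

# Hypothetical peak powers as computed by one app.
reference = [10.0, 23.9999, 24.0001, 30.0]

# The same signals as another app might compute them, with a tiny
# (made-up) rounding difference added to every value.
other_app = [p + 0.0002 for p in reference]

print(count_spikes(reference, THRESHOLD))  # 2
print(count_spikes(other_app, THRESHOLD))  # 3
```

Only the value sitting just under the cutoff changes sides; the clearly weak and clearly strong signals are counted the same by both.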
I'm not the Pope. I don't speak Ex Cathedra!
ID: 1242799
Sirius B Project Donor
Volunteer tester
Joined: 26 Dec 00
Posts: 24879
Credit: 3,081,182
RAC: 7
Ireland
Message 1242809 - Posted: 7 Jun 2012, 15:26:28 UTC - in response to Message 1242799.  

Thanks LadyL. fwiw, all my rigs are cpu crunchers only.
ID: 1242809
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1242810 - Posted: 7 Jun 2012, 15:27:24 UTC - in response to Message 1242799.  

I think

boinc_exit(): worker didn't respond to exit request within 2 seconds, exiting anyway.

may be being counted as an extra message.
ID: 1242810
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1242816 - Posted: 7 Jun 2012, 15:42:04 UTC
Last modified: 7 Jun 2012, 16:13:16 UTC

3 things in one post :)

1) Standard BoincApi uses a rather brutal TerminateProcess() Windows API call to stop the program when done (or for other reasons). This is not recommended in Microsoft best practices, as multithreaded OSes have all sorts of buffers & shared/global DLL things going on for delayed action, so my modified BoincApi uses a request/acknowledge shutdown protocol to try to be more graceful about the shutdown.

2) 'Exiting anyway'. The worker thread was otherwise occupied, or preempted by some other higher-priority application running on the host, and didn't respond to the shutdown request in time. That can happen because CPU tasks normally run at very low priority, so when other processes are busy on the machine the worker doesn't get a chance to respond. In these cases we fall back to stock BoincApi's brutality. Since we did give it enough time, any IO completion routines should have finished (the OS handles those using a different priority mechanism to avoid losing data), so this is still a marginally better scenario than stock BoincApi's behaviour.

3) 6.03 inconclusives. Stock 6.03 is slightly less accurate with all signal types than both the optimised CPU apps (AKv8b2) and the Cuda apps (stock and opt). AKv8 descendants and GPU apps do not have the same algorithmic precision maintenance (noise) issues in the signal normalisation areas, and stock will be rectified in V7 multibeam. This small amount of noise mostly only matters for signals around threshold, where it can result in different signal counts due to the use of absolute thresholds with no hysteresis.
[Edit:] 3a) Corollary to #3. Older GPU apps (inc. stock 6.08, 6.09 & 6.10) have an inaccurate chirp, which may also affect all signal types, with the variation again most noticeable around threshold, via signal counts. The hardware itself also tends to be sensitive to (lack of) maintenance issues and over-exuberant overclocking practices. For the most part the validation mechanism tends to eliminate the most problematic results here, though there are rogue hosts about trashing many tasks.
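
The request/acknowledge idea in points 1 and 2 can be sketched roughly as below. This is not the modified BoincApi code (which is a Windows/C++ affair); it is a minimal Python illustration, and the names `worker` and `shut_down`, the use of `threading.Event`, and the return strings are all assumptions made for the example. The 2-second default simply mirrors the timeout quoted in the message earlier in the thread.

```python
import threading
import time


def worker(exit_requested, exit_acknowledged, busy_seconds):
    """Hypothetical worker: busy for a while, then waits for an exit request."""
    time.sleep(busy_seconds)   # stands in for work, or for being preempted
    exit_requested.wait()      # block until the main thread asks us to stop
    exit_acknowledged.set()    # acknowledge: it is now safe to exit


def shut_down(busy_seconds, timeout_seconds=2.0):
    """Ask the worker to stop; fall back to a forced exit if it stays silent."""
    exit_requested = threading.Event()
    exit_acknowledged = threading.Event()
    thread = threading.Thread(
        target=worker,
        args=(exit_requested, exit_acknowledged, busy_seconds),
        daemon=True,  # a stuck worker must not keep the process alive
    )
    thread.start()

    exit_requested.set()                          # the "request" half
    if exit_acknowledged.wait(timeout_seconds):   # the "acknowledge" half
        thread.join()
        return "graceful"
    # No acknowledgement in time: this is the "exiting anyway" case,
    # where a real app would resort to a hard termination.
    return "forced"


print(shut_down(busy_seconds=0.0))                        # graceful
print(shut_down(busy_seconds=0.5, timeout_seconds=0.05))  # forced
```

A worker that is starved of CPU time looks, from the main thread's side, exactly like the second call: the request goes out, nothing comes back before the timeout, and the fallback path is taken.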

Jason
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1242816

©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.