no finished file

Message boards : Number crunching : no finished file

Graham Middleton
Joined: 1 Sep 00
Posts: 1519
Credit: 86,815,638
RAC: 0
United Kingdom
Message 1612702 - Posted: 12 Dec 2014, 9:25:53 UTC

I've had a few of these today. They seem to use next to no GPU resources and to be almost entirely CPU work, even though they are scheduled for the GPU.

Is this a symptom of a dying GPU, or a faulty WU?



254 SETI@home 12/12/2014 09:05:13 Task ap_08jn14ab_B2_P1_00001_20141210_07383.wu_0 exited with zero status but no 'finished' file
Happy Crunching,

Graham

Mike
Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34255
Credit: 79,922,639
RAC: 80
Germany
Message 1612727 - Posted: 12 Dec 2014, 10:16:16 UTC

The "Result exited with zero status but no 'finished' file" error can mean that the science application is unable to find the last checkpoint it wrote. For some reason the latest checkpoint wasn't written to disk, perhaps because of corruption of the task or the disk, or because the directory was locked (possibly by an antivirus or anti-spyware scan).


With each crime and every kindness we birth our future.
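A minimal sketch of the mechanism Mike describes, assuming a simplified app and client rather than BOINC's real code. The app replaces its checkpoint through a temporary file and a rename, so an interrupted or blocked write leaves the previous checkpoint intact instead of a truncated one. When the app exits, the client looks for a "finished" marker to decide between "done", "run again from the checkpoint" and "error". The file names `checkpoint.dat` and `finish_marker` and both function names are hypothetical placeholders, not BOINC's actual files or API.

```python
import os
import tempfile
from enum import Enum


class ExitOutcome(Enum):
    FINISHED = "finished"  # zero exit status and the finish marker is present
    RESTART = "restart"    # zero exit status but no marker: rerun from the last checkpoint
    ERROR = "error"        # non-zero exit status


def write_checkpoint(slot_dir: str, state: bytes, name: str = "checkpoint.dat") -> None:
    """App side (hypothetical): replace the checkpoint via temp file + rename."""
    fd, tmp_path = tempfile.mkstemp(dir=slot_dir)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(state)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before the rename
        # If another process (e.g. a scanner) holds the target open, this may
        # raise PermissionError on Windows; the old checkpoint then stays in place.
        os.replace(tmp_path, os.path.join(slot_dir, name))
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def classify_exit(exit_status: int, slot_dir: str,
                  marker_name: str = "finish_marker") -> ExitOutcome:
    """Client side (hypothetical): decide what a task's exit means."""
    if exit_status != 0:
        return ExitOutcome.ERROR
    if os.path.exists(os.path.join(slot_dir, marker_name)):
        return ExitOutcome.FINISHED
    return ExitOutcome.RESTART


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as slot:
        write_checkpoint(slot, b"progress=42")
        print(classify_exit(0, slot))  # ExitOutcome.RESTART: no marker was written
```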
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1612729 - Posted: 12 Dec 2014, 10:18:22 UTC - in response to Message 1612702.  

...Is this a symptom of a dying GPU, or a faulty WU?...exited with zero status but no 'finished' file


In short, neither. It's a symptom of the BOINC client using fixed timeouts in a multithreaded (modern OS) environment. If it's a substantial change from your host's familiar behaviour, I would consider examining which device drivers changed recently, and the system's DPC latency behaviour.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
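Jason's point about fixed timeouts can be illustrated generically. The sketch below is not BOINC code, and all names and delays in it are hypothetical. It only shows why a single check after a fixed wait can report a file as missing when a slightly delayed thread (held up by scheduling or driver DPC latency) writes it a moment later, whereas polling up to a more generous deadline tolerates the delay.

```python
import os
import tempfile
import threading
import time


def marker_seen_once(path: str, wait_s: float) -> bool:
    """Fixed timeout: sleep once, then look. Misses a marker written late."""
    time.sleep(wait_s)
    return os.path.exists(path)


def marker_seen_by_deadline(path: str, deadline_s: float, poll_s: float = 0.05) -> bool:
    """Poll until the marker appears or the deadline passes."""
    end = time.monotonic() + deadline_s
    while time.monotonic() < end:
        if os.path.exists(path):
            return True
        time.sleep(poll_s)
    return os.path.exists(path)  # one last look at the deadline


def write_marker_late(path: str, delay_s: float) -> threading.Thread:
    """Simulate a writer thread held up by scheduling or driver latency."""
    def worker() -> None:
        time.sleep(delay_s)
        with open(path, "w"):
            pass  # an empty file is enough to act as a marker

    thread = threading.Thread(target=worker)
    thread.start()
    return thread


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        marker = os.path.join(d, "marker")
        writer = write_marker_late(marker, delay_s=0.3)
        print(marker_seen_once(marker, wait_s=0.1))
        print(marker_seen_by_deadline(marker, deadline_s=2.0))
        writer.join()
```

With these delays the one-shot check will normally print False and the deadline-based wait True. Because the result depends on thread scheduling, neither outcome is strictly guaranteed, which is the kind of fragility at issue.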
Graham Middleton
Joined: 1 Sep 00
Posts: 1519
Credit: 86,815,638
RAC: 0
United Kingdom
Message 1612908 - Posted: 12 Dec 2014, 19:03:17 UTC

A little further info...

It seems to be every AP WU that runs on my GTX580. The work on the GTX570 all goes through just fine, AP and MB.

I've just suspended APs until the current one finishes and I'll see what happens with the MBs on that GPU.

Also, all the AP WUs that run on the GTX580 seem to be running almost entirely on the CPU, with very little GPU usage: about 85%+ CPU time, versus about 12%-17% CPU on the GTX570!

Hummm
Happy Crunching,

Graham

Graham Middleton
Joined: 1 Sep 00
Posts: 1519
Credit: 86,815,638
RAC: 0
United Kingdom
Message 1612910 - Posted: 12 Dec 2014, 19:06:22 UTC

More info: it's not every WU on that GPU. It seems to be the SAME WU running on that GPU over and over again; it errors, so it restarts and shows the same symptoms!

Do I abort that WU, or what?
Happy Crunching,

Graham
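Graham's observation that the same WU kept restarting can be checked mechanically by counting the "exited with zero status but no 'finished' file" events per task in the BOINC event log. A minimal sketch, assuming log lines shaped like the one in the first post; the regular expression is written against that single example and may need adjusting for other client versions. The function name `looping_tasks` is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the message format quoted in the first post of this thread.
NO_FINISH_RE = re.compile(r"Task (\S+) exited with zero status but no 'finished' file")


def looping_tasks(log_lines: Iterable[str], min_repeats: int = 2) -> dict[str, int]:
    """Return {task name: count} for tasks that hit the error at least min_repeats times."""
    counts = Counter()
    for line in log_lines:
        match = NO_FINISH_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return {task: n for task, n in counts.items() if n >= min_repeats}


if __name__ == "__main__":
    sample = [
        # The first line is copied from this thread; the other two are made-up examples.
        "254 SETI@home 12/12/2014 09:05:13 Task ap_08jn14ab_B2_P1_00001_20141210_07383.wu_0 exited with zero status but no 'finished' file",
        "Task example_task_0 exited with zero status but no 'finished' file",
        "Task example_task_0 exited with zero status but no 'finished' file",
    ]
    print(looping_tasks(sample))  # {'example_task_0': 2}
```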

Graham Middleton
Joined: 1 Sep 00
Posts: 1519
Credit: 86,815,638
RAC: 0
United Kingdom
Message 1612939 - Posted: 12 Dec 2014, 20:09:48 UTC
Last modified: 12 Dec 2014, 20:10:08 UTC

Having watched it go round again, I've now aborted it.

All GPUs working OK on the last 2 AP WUs and the MB WUs.

Thanks for the comments.
Happy Crunching,

Graham

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1613025 - Posted: 12 Dec 2014, 22:36:09 UTC

From stderr of that aborted task:

ERROR: some exception inside long FFA, probably video-driver restart, restarting app...

I would recommend decreasing the FFA block to avoid such situations in the future.
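Raistmer's advice rests on the idea that a smaller FFA block means shorter individual GPU launches, so each one is less likely to exceed the driver's watchdog (on Windows, the TDR timeout defaults to 2 seconds) and trigger a driver restart. A back-of-the-envelope sketch, assuming the GPU time per unit of FFA work can be estimated; the function names, the safety factor and the notion of a "unit" are hypothetical, not AstroPulse's actual parameters.

```python
import math


def max_block_size(seconds_per_unit: float, watchdog_s: float = 2.0,
                   safety: float = 0.25) -> int:
    """Largest block (units per launch) keeping one launch under safety * watchdog_s."""
    if seconds_per_unit <= 0:
        raise ValueError("seconds_per_unit must be positive")
    # Never go below one unit per launch, even if a single unit is already slow.
    return max(1, math.floor(safety * watchdog_s / seconds_per_unit))


def split_into_blocks(total_units: int, block: int) -> list[int]:
    """Sizes of the successive launches needed to cover total_units."""
    if block < 1:
        raise ValueError("block must be at least 1")
    full, rest = divmod(total_units, block)
    return [block] * full + ([rest] if rest else [])


if __name__ == "__main__":
    block = max_block_size(1 / 1024)       # 0.5 s budget at ~1 ms per unit
    print(block)                           # 512
    print(split_into_blocks(1200, block))  # [512, 512, 176]
```

A smaller `safety` factor shrinks the block, trading a little extra launch overhead for more headroom under the watchdog.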
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1613102 - Posted: 13 Dec 2014, 1:51:16 UTC - in response to Message 1613025.  

From stderr of that aborted task:

ERROR: some exception inside long FFA, probably video-driver restart, restarting app...

I would recommend decreasing the FFA block to avoid such situations in the future.

I would recommend using the Lunatics 0.43a installer to get the version of AP7 that automatically decreases the FFA block when the WU data needs it.
Joe
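Joe's suggestion relies on AP7 choosing a smaller FFA block by itself when a workunit's data calls for it. How the Lunatics build makes that decision is not described in this thread, so the sketch below swaps in a simple reactive strategy of its own: process the work in blocks and, whenever a block fails, halve the block size and retry the same position. `run_block` is a hypothetical callback standing in for the GPU launch, assumed to raise `RuntimeError` on failure.

```python
from typing import Callable


def process_with_adaptive_block(total_units: int,
                                run_block: Callable[[int, int], None],
                                initial_block: int,
                                min_block: int = 1) -> int:
    """Cover units [0, total_units) in blocks, halving the block after each failure.

    Returns the block size in use when the work completed. Re-raises the
    failure if it still occurs at min_block.
    """
    block = max(min_block, initial_block)
    start = 0
    while start < total_units:
        size = min(block, total_units - start)
        try:
            run_block(start, size)
        except RuntimeError:
            if block <= min_block:
                raise  # cannot shrink any further, so report the failure
            block = max(min_block, block // 2)
            continue  # retry the same start position with the smaller block
        start += size
    return block


if __name__ == "__main__":
    done = []

    def fake_gpu_launch(start: int, size: int) -> None:
        # Pretend any launch over 100 units runs past the watchdog.
        if size > 100:
            raise RuntimeError("launch too long")
        done.append((start, size))

    print(process_with_adaptive_block(1000, fake_gpu_launch, initial_block=512))  # 64
    print(sum(size for _, size in done))  # 1000
```

Halving reaches a workable size after a logarithmic number of failures; a real application would also need to remember the smaller size across restarts, which this sketch does not attempt.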

©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.