GPU Tasks Start, Stop, Restart, Stop, Restart, yada yada



Message boards : Number crunching : GPU Tasks Start, Stop, Restart, Stop, Restart, yada yada

rebest
Project donor
Volunteer tester
Joined: 16 Apr 00
Posts: 1296
Credit: 32,653,516
RAC: 11,735
United States
Message 1214862 - Posted: 7 Apr 2012, 1:00:20 UTC
Last modified: 7 Apr 2012, 1:01:28 UTC

Greetings all.

I'm encountering a new problem on my primary rig that has my RAC in a tailspin. Running the enhanced apps thanks to the Lunatics installer.

Here's what I'm getting:

4/5/2012 11:04:53 PM SETI@home Computation for task 26my11aa.17069.7020.14.10.176_0 finished
4/5/2012 11:04:53 PM SETI@home Starting 26my11aa.17069.7020.14.10.174_1
4/5/2012 11:04:53 PM SETI@home Starting task 26my11aa.17069.7020.14.10.174_1 using setiathome_enhanced version 610
4/5/2012 11:04:55 PM SETI@home Started upload of 26my11aa.17069.7020.14.10.176_0_0
4/5/2012 11:05:00 PM SETI@home Finished upload of 26my11aa.17069.7020.14.10.176_0_0
4/5/2012 11:07:09 PM SETI@home Sending scheduler request: To fetch work.
4/5/2012 11:07:09 PM SETI@home Reporting 1 completed tasks, requesting new tasks for CPU
4/5/2012 11:07:15 PM SETI@home Scheduler request completed: got 0 new tasks
4/5/2012 11:07:15 PM SETI@home Message from server: No tasks sent
4/5/2012 11:07:15 PM SETI@home Message from server: This computer has reached a limit on tasks in progress
4/5/2012 11:10:48 PM SETI@home Computation for task 26my11aa.17069.7020.14.10.174_1 finished
4/5/2012 11:10:48 PM SETI@home Restarting task 15ja12aa.14818.13564.14.10.49_1 using setiathome_enhanced version 610
4/5/2012 11:10:50 PM SETI@home Started upload of 26my11aa.17069.7020.14.10.174_1_0
4/5/2012 11:10:55 PM SETI@home Finished upload of 26my11aa.17069.7020.14.10.174_1_0
4/5/2012 11:11:18 PM SETI@home Restarting task 15ja12aa.14644.13564.13.10.95_1 using setiathome_enhanced version 610
4/5/2012 11:11:47 PM SETI@home Restarting task 15ja12aa.14644.13564.13.10.92_1 using setiathome_enhanced version 610
4/5/2012 11:12:17 PM SETI@home Restarting task 15ja12aa.14644.13564.13.10.123_1 using setiathome_enhanced version 610
4/5/2012 11:12:23 PM SETI@home Sending scheduler request: To fetch work.
4/5/2012 11:12:23 PM SETI@home Reporting 1 completed tasks, requesting new tasks for CPU
4/5/2012 11:12:29 PM SETI@home Scheduler request completed: got 0 new tasks
4/5/2012 11:12:29 PM SETI@home Message from server: No tasks sent
4/5/2012 11:12:29 PM SETI@home Message from server: This computer has reached a limit on tasks in progress
4/5/2012 11:12:47 PM SETI@home Restarting task 15ja12aa.14644.13564.13.10.113_0 using setiathome_enhanced version 610
4/5/2012 11:13:16 PM SETI@home Restarting task 15ja12ac.3614.67.10.10.242_1 using setiathome_enhanced version 610
4/5/2012 11:13:45 PM SETI@home Restarting task 15ja12ac.3614.67.10.10.229_0 using setiathome_enhanced version 610
4/5/2012 11:14:15 PM SETI@home Restarting task 15ja12ac.3614.67.10.10.226_0 using setiathome_enhanced version 610
4/5/2012 11:14:44 PM SETI@home Restarting task 15ja12aa.14818.13564.14.10.49_1 using setiathome_enhanced version 610
4/5/2012 11:15:13 PM SETI@home Restarting task 15ja12aa.14644.13564.13.10.95_1 using setiathome_enhanced version 610
4/5/2012 11:15:42 PM SETI@home Restarting task 15ja12aa.14644.13564.13.10.92_1 using setiathome_enhanced version 610
4/5/2012 11:16:11 PM SETI@home Restarting task 15ja12aa.14644.13564.13.10.123_1 using setiathome_enhanced version 610
4/5/2012 11:16:40 PM SETI@home Restarting task 15ja12aa.14644.13564.13.10.113_0 using setiathome_enhanced version 610
4/5/2012 11:17:10 PM SETI@home Restarting task 15ja12ac.3614.67.10.10.242_1 using setiathome_enhanced version 610
4/5/2012 11:17:38 PM SETI@home Restarting task 15ja12ac.3614.67.10.10.229_0 using setiathome_enhanced version 610
4/5/2012 11:18:08 PM SETI@home Restarting task 15ja12ac.3614.67.10.10.226_0 using setiathome_enhanced version 610
4/5/2012 11:18:37 PM SETI@home Restarting task 15ja12aa.14818.13564.14.10.49_1 using setiathome_enhanced version 610
4/5/2012 11:19:06 PM SETI@home Restarting task 15ja12aa.14644.13564.13.10.95_1 using setiathome_enhanced version 610
4/5/2012 11:19:35 PM SETI@home Restarting task 15ja12aa.14644.13564.13.10.92_1 using setiathome_enhanced version 610
4/5/2012 11:20:05 PM SETI@home Restarting task 15ja12aa.14644.13564.13.10.123_1 using setiathome_enhanced version 610
4/5/2012 11:20:34 PM SETI@home Restarting task 15ja12aa.14644.13564.13.10.113_0 using setiathome_enhanced version 610
4/5/2012 11:21:03 PM SETI@home Restarting task 15ja12ac.3614.67.10.10.242_1 using setiathome_enhanced version 610
4/5/2012 11:21:33 PM SETI@home Restarting task 15ja12ac.3614.67.10.10.229_0 using setiathome_enhanced version 610
4/5/2012 11:22:03 PM SETI@home Restarting task 15ja12ac.3614.67.10.10.226_0 using setiathome_enhanced version 610
4/5/2012 11:22:33 PM SETI@home Restarting task 15ja12aa.14818.13564.14.10.49_1 using setiathome_enhanced version 610
4/5/2012 11:23:03 PM SETI@home Restarting task 15ja12aa.14644.13564.13.10.95_1 using setiathome_enhanced version 610
4/5/2012 11:23:32 PM SETI@home Restarting task 15ja12aa.14644.13564.13.10.92_1 using setiathome_enhanced version 610
4/5/2012 11:24:01 PM SETI@home Restarting task 15ja12aa.14644.13564.13.10.123_1 using setiathome_enhanced version 610
4/5/2012 11:24:31 PM SETI@home Restarting task 15ja12aa.14644.13564.13.10.113_0 using setiathome_enhanced version 610
4/5/2012 11:24:33 PM SETI@home Sending scheduler request: To fetch work.
4/5/2012 11:24:33 PM SETI@home Requesting new tasks for CPU
4/5/2012 11:24:38 PM SETI@home Scheduler request completed: got 0 new tasks
4/5/2012 11:24:38 PM SETI@home Message from server: No tasks sent
4/5/2012 11:24:38 PM SETI@home Message from server: This computer has reached a limit on tasks in progress

Any ideas?

Thanks, Easter Bunny!
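
One way to see how regular the loop is: a minimal Python sketch, assuming the event log lines keep exactly the format shown above, that measures the time between consecutive restarts of the same task. The function names are hypothetical and not part of BOINC.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches event log lines of the form shown in the post, e.g.
# "4/5/2012 11:10:48 PM SETI@home Restarting task <name> using ..."
RESTART = re.compile(
    r"^(\d+/\d+/\d+ \d+:\d+:\d+ [AP]M) \S+ Restarting task (\S+) using"
)


def restart_times(lines):
    """Map each task name to the list of times it was restarted."""
    times = defaultdict(list)
    for line in lines:
        match = RESTART.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group(1), "%m/%d/%Y %I:%M:%S %p")
            times[match.group(2)].append(stamp)
    return dict(times)


def restart_gaps(lines):
    """Seconds between consecutive restarts of the same task."""
    return {
        task: [int((b - a).total_seconds()) for a, b in zip(ts, ts[1:])]
        for task, ts in restart_times(lines).items()
    }


# Four restarts of one task, copied from the log above.
TASK = "15ja12aa.14818.13564.14.10.49_1"
sample = [
    f"4/5/2012 {clock} PM SETI@home Restarting task {TASK} "
    "using setiathome_enhanced version 610"
    for clock in ("11:10:48", "11:14:44", "11:18:37", "11:22:33")
]
print(restart_gaps(sample))
```

For that task the gaps come out as 236, 233 and 236 seconds, so each waiting task comes around again roughly every four minutes without making progress.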
____________

Join the PACK!

Who Know's
Joined: 12 Sep 99
Posts: 9
Credit: 9,826,012
RAC: 7,207
United States
Message 1214863 - Posted: 7 Apr 2012, 1:13:56 UTC - in response to Message 1214862.

Hi, please check this thread:

http://setiathome.berkeley.edu/forum_thread.php?id=67578

Mr. raistmer has updated some code.
____________

Fix it till it breaks, then Fix it again so it is Fixed right

Wiggo
Joined: 24 Jan 00
Posts: 7100
Credit: 95,237,645
RAC: 73,947
Australia
Message 1214883 - Posted: 7 Apr 2012, 2:35:50 UTC - in response to Message 1214863.

I'm not entirely sure of this, but if you updated your video drivers at about the time this problem started, and you let Windows put your monitor to sleep, then that's your problem. To fix it, set Windows to never put your monitor to sleep (this is a well-known problem with the latest NVIDIA drivers).

Cheers.
____________

Josef W. Segur
Project donor
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4252
Credit: 1,050,380
RAC: 249
United States
Message 1214887 - Posted: 7 Apr 2012, 2:45:05 UTC - in response to Message 1214862.

When something keeps the x41g CUDA app from being able to run, it calls boinc_temporary_exit(180), and BOINC then restarts the task 3 minutes later. For your Task 2351703216, for instance, there were many cycles like:

In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
Device cannot be used
Cuda initialisation FAILED, Initiating Boinc temporary exit (180 secs)
Preemptively Acknowledging temporary exit -> boinc_exit(): requesting safe worker shutdown ->
boinc_exit(): received safe worker shutdown acknowledge ->

or

Cuda error 'Couldn't get cuda device count
' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 146 : no CUDA-capable device is detected.
setiathome_CUDA: cudaGetDeviceCount() call failed.
setiathome_CUDA: No CUDA devices found
setiathome_CUDA: Found 0 CUDA device(s):
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
Device cannot be used
Cuda device initialisation retry 1 of 6, waiting 5 secs...
Cuda error 'Couldn't get cuda device count
' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 146 : no CUDA-capable device is detected.
setiathome_CUDA: cudaGetDeviceCount() call failed.
setiathome_CUDA: No CUDA devices found
setiathome_CUDA: Found 0 CUDA device(s):
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
Device cannot be used
Cuda device initialisation retry 2 of 6, waiting 5 secs...
Cuda error 'Couldn't get cuda device count
' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 146 : no CUDA-capable device is detected.
setiathome_CUDA: cudaGetDeviceCount() call failed.
setiathome_CUDA: No CUDA devices found
setiathome_CUDA: Found 0 CUDA device(s):
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
Device cannot be used
Cuda device initialisation retry 3 of 6, waiting 5 secs...
Cuda error 'Couldn't get cuda device count
' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 146 : no CUDA-capable device is detected.
setiathome_CUDA: cudaGetDeviceCount() call failed.
setiathome_CUDA: No CUDA devices found
setiathome_CUDA: Found 0 CUDA device(s):
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
Device cannot be used
Cuda device initialisation retry 4 of 6, waiting 5 secs...
Cuda error 'Couldn't get cuda device count
' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 146 : no CUDA-capable device is detected.
setiathome_CUDA: cudaGetDeviceCount() call failed.
setiathome_CUDA: No CUDA devices found
setiathome_CUDA: Found 0 CUDA device(s):
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
Device cannot be used
Cuda device initialisation retry 5 of 6, waiting 5 secs...
Cuda error 'Couldn't get cuda device count
' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 146 : no CUDA-capable device is detected.
setiathome_CUDA: cudaGetDeviceCount() call failed.
setiathome_CUDA: No CUDA devices found
setiathome_CUDA: Found 0 CUDA device(s):
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
Device cannot be used
Cuda initialisation FAILED, Initiating Boinc temporary exit (180 secs)
Preemptively Acknowledging temporary exit -> boinc_exit(): requesting safe worker shutdown ->
boinc_exit(): received safe worker shutdown acknowledge ->


But eventually the application was able to complete the task successfully, and it has been granted credit.

I don't know what is causing the problem and can't even make a guess. But that task's stderr.txt section is incomplete, because the BOINC core client only sends the last 64 KB, so it must have gone through a great many of these cycles. I'm not at all surprised your RAC has suffered.
Joe
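
The second stderr excerpt above suggests a simple control flow: probe the device, wait 5 seconds after each failed probe, and after the sixth failure ask BOINC to restart the task 180 seconds later. Below is a minimal Python sketch of that flow, reconstructed only from the log text. It is not the x41g source, and `try_start` and `device_available` are hypothetical names; the first, shorter excerpt (which fails without retries) is not modelled.

```python
import time

RETRIES = 6            # "retry N of 6" in the stderr output
RETRY_WAIT_SECS = 5    # "waiting 5 secs..."
TEMP_EXIT_SECS = 180   # "Initiating Boinc temporary exit (180 secs)"


def try_start(device_available, sleep=time.sleep, log=print):
    """Return 0 if the device came up, else the restart delay to request.

    device_available is a hypothetical zero-argument probe standing in
    for the application's real CUDA device check.
    """
    attempt = 0
    while True:
        if device_available():
            return 0
        log("Device cannot be used")
        attempt += 1
        if attempt >= RETRIES:
            # Sixth failed probe: give up and hand control back to BOINC.
            log(f"initialisation FAILED, temporary exit ({TEMP_EXIT_SECS} secs)")
            return TEMP_EXIT_SECS
        log(f"retry {attempt} of {RETRIES}, waiting {RETRY_WAIT_SECS} secs...")
        sleep(RETRY_WAIT_SECS)


# Simulate a device that never appears, recording waits instead of sleeping.
waits = []
delay = try_start(lambda: False, sleep=waits.append, log=lambda msg: None)
print(delay, sum(waits))
```

With a device that never appears, one cycle costs 25 seconds of retry waits plus the 180-second delay, and no computation gets done.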

soft^spirit
Joined: 18 May 99
Posts: 6374
Credit: 28,631,059
RAC: 3
United States
Message 1214890 - Posted: 7 Apr 2012, 2:51:47 UTC

I was having that problem on "Badger". The problem was that the .40 file was taking more memory than the .38 version, and as a result I was running out of GPU memory.

Changing from 3 tasks to 2 also cleared the problem.

(changed .3 to .5 in the app_info)
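
As a rough illustration of why that edit changes the task count: assuming the value edited is the per-task GPU share (the `<count>` element under `<coproc>` in app_info.xml), and assuming BOINC runs as many tasks as fit into one whole GPU, the number of concurrent tasks is the floor of 1 divided by that share. The helper name below is hypothetical.

```python
from fractions import Fraction


def tasks_per_gpu(count):
    """How many tasks fit on one GPU if each claims `count` of it.

    Uses exact fractions so values such as 0.1 do not suffer from
    floating-point rounding.
    """
    share = Fraction(str(count))
    return int(1 // share)


print(tasks_per_gpu(0.3), tasks_per_gpu(0.5))
```

Under those assumptions a share of .3 allows three tasks at once and .5 allows two, which matches the change from 3 tasks to 2 described above.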
____________

Janice

rebest
Project donor
Volunteer tester
Joined: 16 Apr 00
Posts: 1296
Credit: 32,653,516
RAC: 11,735
United States
Message 1215178 - Posted: 7 Apr 2012, 17:30:40 UTC - in response to Message 1214883.

Thanks, Wiggo. You got it! All waiting GPU tasks have cleared. Let's hope the RAC has bottomed out.

Thanks again.
____________

Join the PACK!

Wiggo
Joined: 24 Jan 00
Posts: 7100
Credit: 95,237,645
RAC: 73,947
Australia
Message 1215399 - Posted: 8 Apr 2012, 0:19:10 UTC - in response to Message 1215178.

Only too happy to help out.

Cheers.
____________


Copyright © 2014 University of California