GPU Tasks Start, Stop, Restart, Stop, Restart, yada yada

Message boards : Number crunching : GPU Tasks Start, Stop, Restart, Stop, Restart, yada yada
Profile rebest Project Donor
Volunteer tester
Joined: 16 Apr 00
Posts: 1296
Credit: 45,357,093
RAC: 0
United States
Message 1214862 - Posted: 7 Apr 2012, 1:00:20 UTC
Last modified: 7 Apr 2012, 1:01:28 UTC

Greetings all.

I'm encountering a new problem on my primary rig that has my RAC in a tailspin. Running the enhanced apps thanks to the Lunatics installer.

Here's what I'm getting:

4/5/2012 11:04:53 PM SETI@home Computation for task 26my11aa.17069.7020.14.10.176_0 finished
4/5/2012 11:04:53 PM SETI@home Starting 26my11aa.17069.7020.14.10.174_1
4/5/2012 11:04:53 PM SETI@home Starting task 26my11aa.17069.7020.14.10.174_1 using setiathome_enhanced version 610
4/5/2012 11:04:55 PM SETI@home Started upload of 26my11aa.17069.7020.14.10.176_0_0
4/5/2012 11:05:00 PM SETI@home Finished upload of 26my11aa.17069.7020.14.10.176_0_0
4/5/2012 11:07:09 PM SETI@home Sending scheduler request: To fetch work.
4/5/2012 11:07:09 PM SETI@home Reporting 1 completed tasks, requesting new tasks for CPU
4/5/2012 11:07:15 PM SETI@home Scheduler request completed: got 0 new tasks
4/5/2012 11:07:15 PM SETI@home Message from server: No tasks sent
4/5/2012 11:07:15 PM SETI@home Message from server: This computer has reached a limit on tasks in progress
4/5/2012 11:10:48 PM SETI@home Computation for task 26my11aa.17069.7020.14.10.174_1 finished
4/5/2012 11:10:48 PM SETI@home Restarting task 15ja12aa.14818.13564.14.10.49_1 using setiathome_enhanced version 610
4/5/2012 11:10:50 PM SETI@home Started upload of 26my11aa.17069.7020.14.10.174_1_0
4/5/2012 11:10:55 PM SETI@home Finished upload of 26my11aa.17069.7020.14.10.174_1_0
4/5/2012 11:11:18 PM SETI@home Restarting task 15ja12aa.14644.13564.13.10.95_1 using setiathome_enhanced version 610
4/5/2012 11:11:47 PM SETI@home Restarting task 15ja12aa.14644.13564.13.10.92_1 using setiathome_enhanced version 610
4/5/2012 11:12:17 PM SETI@home Restarting task 15ja12aa.14644.13564.13.10.123_1 using setiathome_enhanced version 610
4/5/2012 11:12:23 PM SETI@home Sending scheduler request: To fetch work.
4/5/2012 11:12:23 PM SETI@home Reporting 1 completed tasks, requesting new tasks for CPU
4/5/2012 11:12:29 PM SETI@home Scheduler request completed: got 0 new tasks
4/5/2012 11:12:29 PM SETI@home Message from server: No tasks sent
4/5/2012 11:12:29 PM SETI@home Message from server: This computer has reached a limit on tasks in progress
4/5/2012 11:12:47 PM SETI@home Restarting task 15ja12aa.14644.13564.13.10.113_0 using setiathome_enhanced version 610
4/5/2012 11:13:16 PM SETI@home Restarting task 15ja12ac.3614.67.10.10.242_1 using setiathome_enhanced version 610
4/5/2012 11:13:45 PM SETI@home Restarting task 15ja12ac.3614.67.10.10.229_0 using setiathome_enhanced version 610
4/5/2012 11:14:15 PM SETI@home Restarting task 15ja12ac.3614.67.10.10.226_0 using setiathome_enhanced version 610
4/5/2012 11:14:44 PM SETI@home Restarting task 15ja12aa.14818.13564.14.10.49_1 using setiathome_enhanced version 610
4/5/2012 11:15:13 PM SETI@home Restarting task 15ja12aa.14644.13564.13.10.95_1 using setiathome_enhanced version 610
4/5/2012 11:15:42 PM SETI@home Restarting task 15ja12aa.14644.13564.13.10.92_1 using setiathome_enhanced version 610
4/5/2012 11:16:11 PM SETI@home Restarting task 15ja12aa.14644.13564.13.10.123_1 using setiathome_enhanced version 610
4/5/2012 11:16:40 PM SETI@home Restarting task 15ja12aa.14644.13564.13.10.113_0 using setiathome_enhanced version 610
4/5/2012 11:17:10 PM SETI@home Restarting task 15ja12ac.3614.67.10.10.242_1 using setiathome_enhanced version 610
4/5/2012 11:17:38 PM SETI@home Restarting task 15ja12ac.3614.67.10.10.229_0 using setiathome_enhanced version 610
4/5/2012 11:18:08 PM SETI@home Restarting task 15ja12ac.3614.67.10.10.226_0 using setiathome_enhanced version 610
4/5/2012 11:18:37 PM SETI@home Restarting task 15ja12aa.14818.13564.14.10.49_1 using setiathome_enhanced version 610
4/5/2012 11:19:06 PM SETI@home Restarting task 15ja12aa.14644.13564.13.10.95_1 using setiathome_enhanced version 610
4/5/2012 11:19:35 PM SETI@home Restarting task 15ja12aa.14644.13564.13.10.92_1 using setiathome_enhanced version 610
4/5/2012 11:20:05 PM SETI@home Restarting task 15ja12aa.14644.13564.13.10.123_1 using setiathome_enhanced version 610
4/5/2012 11:20:34 PM SETI@home Restarting task 15ja12aa.14644.13564.13.10.113_0 using setiathome_enhanced version 610
4/5/2012 11:21:03 PM SETI@home Restarting task 15ja12ac.3614.67.10.10.242_1 using setiathome_enhanced version 610
4/5/2012 11:21:33 PM SETI@home Restarting task 15ja12ac.3614.67.10.10.229_0 using setiathome_enhanced version 610
4/5/2012 11:22:03 PM SETI@home Restarting task 15ja12ac.3614.67.10.10.226_0 using setiathome_enhanced version 610
4/5/2012 11:22:33 PM SETI@home Restarting task 15ja12aa.14818.13564.14.10.49_1 using setiathome_enhanced version 610
4/5/2012 11:23:03 PM SETI@home Restarting task 15ja12aa.14644.13564.13.10.95_1 using setiathome_enhanced version 610
4/5/2012 11:23:32 PM SETI@home Restarting task 15ja12aa.14644.13564.13.10.92_1 using setiathome_enhanced version 610
4/5/2012 11:24:01 PM SETI@home Restarting task 15ja12aa.14644.13564.13.10.123_1 using setiathome_enhanced version 610
4/5/2012 11:24:31 PM SETI@home Restarting task 15ja12aa.14644.13564.13.10.113_0 using setiathome_enhanced version 610
4/5/2012 11:24:33 PM SETI@home Sending scheduler request: To fetch work.
4/5/2012 11:24:33 PM SETI@home Requesting new tasks for CPU
4/5/2012 11:24:38 PM SETI@home Scheduler request completed: got 0 new tasks
4/5/2012 11:24:38 PM SETI@home Message from server: No tasks sent
4/5/2012 11:24:38 PM SETI@home Message from server: This computer has reached a limit on tasks in progress
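For anyone who wants to tally a loop like this from their own event log, here is a minimal Python sketch. It assumes the log has been saved as plain text with lines in the same format as above; the function name `count_restarts` and the sample text are only for illustration.

```python
import re
from collections import Counter

# Matches event log lines such as:
# "... SETI@home Restarting task 15ja12aa.14818.13564.14.10.49_1 using setiathome_enhanced version 610"
RESTART_RE = re.compile(r"Restarting task (\S+) using")


def count_restarts(log_text):
    """Return a Counter mapping task name -> number of 'Restarting task' lines."""
    counts = Counter()
    for line in log_text.splitlines():
        match = RESTART_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# A few lines copied from the log above, plus one unrelated line.
sample = """\
4/5/2012 11:10:48 PM SETI@home Restarting task 15ja12aa.14818.13564.14.10.49_1 using setiathome_enhanced version 610
4/5/2012 11:11:18 PM SETI@home Restarting task 15ja12aa.14644.13564.13.10.95_1 using setiathome_enhanced version 610
4/5/2012 11:12:23 PM SETI@home Sending scheduler request: To fetch work.
4/5/2012 11:14:44 PM SETI@home Restarting task 15ja12aa.14818.13564.14.10.49_1 using setiathome_enhanced version 610
"""

restarts = count_restarts(sample)
for task, n in restarts.most_common():
    print(task, n)
```

A task that shows up many times without a matching "Computation for task ... finished" line is one that keeps restarting without completing.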

Any ideas?

Thanks, Easter Bunny!

Join the PACK!
ID: 1214862
Who Know's

Joined: 12 Sep 99
Posts: 9
Credit: 10,923,514
RAC: 0
United States
Message 1214863 - Posted: 7 Apr 2012, 1:13:56 UTC - in response to Message 1214862.  

Hi, please check this thread:

http://setiathome.berkeley.edu/forum_thread.php?id=67578

Mr. Raistmer has updated some code.

Fix it till it breaks, then Fix it again so it is Fixed right
ID: 1214863
Profile Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1214883 - Posted: 7 Apr 2012, 2:35:50 UTC - in response to Message 1214863.  

I'm not entirely sure of this, but if you updated your video drivers around the time this problem started and you let Windows put your monitor to sleep, then that's your problem. To fix it, set Windows to never put your monitor to sleep (this is a well-known problem with the latest NVIDIA drivers).

Cheers.
ID: 1214883
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1214887 - Posted: 7 Apr 2012, 2:45:05 UTC - in response to Message 1214862.  

Greetings all.

I'm encountering a new problem on my primary rig that has my RAC in a tailspin. Running the enhanced apps thanks to the Lunatics installer.

Here's what I'm getting:

...
4/5/2012 11:18:08 PM SETI@home Restarting task 15ja12ac.3614.67.10.10.226_0 using setiathome_enhanced version 610
4/5/2012 11:18:37 PM SETI@home Restarting task 15ja12aa.14818.13564.14.10.49_1 using setiathome_enhanced version 610
4/5/2012 11:19:06 PM SETI@home Restarting task 15ja12aa.14644.13564.13.10.95_1 using setiathome_enhanced version 610
4/5/2012 11:19:35 PM SETI@home Restarting task 15ja12aa.14644.13564.13.10.92_1 using setiathome_enhanced version 610
4/5/2012 11:20:05 PM SETI@home Restarting task 15ja12aa.14644.13564.13.10.123_1 using setiathome_enhanced version 610
...
Any ideas?

Thanks, Easter Bunny!

When something keeps the x41g CUDA app from running, it calls boinc_temporary_exit(180), and BOINC then restarts the task 3 minutes later. For your Task 2351703216, for instance, there were many cycles like:

In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
Device cannot be used
Cuda initialisation FAILED, Initiating Boinc temporary exit (180 secs)
Preemptively Acknowledging temporary exit -> boinc_exit(): requesting safe worker shutdown ->
boinc_exit(): received safe worker shutdown acknowledge ->

or

Cuda error 'Couldn't get cuda device count
' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 146 : no CUDA-capable device is detected.
setiathome_CUDA: cudaGetDeviceCount() call failed.
setiathome_CUDA: No CUDA devices found
setiathome_CUDA: Found 0 CUDA device(s):
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
Device cannot be used
Cuda device initialisation retry 1 of 6, waiting 5 secs...
Cuda error 'Couldn't get cuda device count
' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 146 : no CUDA-capable device is detected.
setiathome_CUDA: cudaGetDeviceCount() call failed.
setiathome_CUDA: No CUDA devices found
setiathome_CUDA: Found 0 CUDA device(s):
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
Device cannot be used
Cuda device initialisation retry 2 of 6, waiting 5 secs...
Cuda error 'Couldn't get cuda device count
' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 146 : no CUDA-capable device is detected.
setiathome_CUDA: cudaGetDeviceCount() call failed.
setiathome_CUDA: No CUDA devices found
setiathome_CUDA: Found 0 CUDA device(s):
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
Device cannot be used
Cuda device initialisation retry 3 of 6, waiting 5 secs...
Cuda error 'Couldn't get cuda device count
' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 146 : no CUDA-capable device is detected.
setiathome_CUDA: cudaGetDeviceCount() call failed.
setiathome_CUDA: No CUDA devices found
setiathome_CUDA: Found 0 CUDA device(s):
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
Device cannot be used
Cuda device initialisation retry 4 of 6, waiting 5 secs...
Cuda error 'Couldn't get cuda device count
' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 146 : no CUDA-capable device is detected.
setiathome_CUDA: cudaGetDeviceCount() call failed.
setiathome_CUDA: No CUDA devices found
setiathome_CUDA: Found 0 CUDA device(s):
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
Device cannot be used
Cuda device initialisation retry 5 of 6, waiting 5 secs...
Cuda error 'Couldn't get cuda device count
' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 146 : no CUDA-capable device is detected.
setiathome_CUDA: cudaGetDeviceCount() call failed.
setiathome_CUDA: No CUDA devices found
setiathome_CUDA: Found 0 CUDA device(s):
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
Device cannot be used
Cuda initialisation FAILED, Initiating Boinc temporary exit (180 secs)
Preemptively Acknowledging temporary exit -> boinc_exit(): requesting safe worker shutdown ->
boinc_exit(): received safe worker shutdown acknowledge ->


But eventually the application was able to complete the task successfully, and it has been granted credit.
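To put a rough number on what those cycles cost, here is a minimal Python sketch that counts the retries and temporary exits in a stderr.txt dump and adds up the wait times printed in the messages themselves. The function name `summarize_stderr` is just for illustration, and the total is only a lower bound: it ignores the time spent on each initialisation attempt and anything cut off by the 64KB limit.

```python
import re

# Both patterns take the wait time from the message text rather than hard-coding it.
EXIT_RE = re.compile(r"Initiating Boinc temporary exit \((\d+) secs\)")
RETRY_RE = re.compile(r"initialisation retry \d+ of \d+, waiting (\d+) secs")


def summarize_stderr(stderr_text):
    """Count temporary exits and init retries, and sum their stated wait times."""
    exits = 0
    retries = 0
    idle_seconds = 0
    for line in stderr_text.splitlines():
        match = EXIT_RE.search(line)
        if match:
            exits += 1
            idle_seconds += int(match.group(1))
            continue
        match = RETRY_RE.search(line)
        if match:
            retries += 1
            idle_seconds += int(match.group(1))
    return {"temporary_exits": exits, "retries": retries, "idle_seconds": idle_seconds}


# Shortened excerpt in the same format as the stderr output quoted above.
sample = """\
Device cannot be used
Cuda device initialisation retry 1 of 6, waiting 5 secs...
Device cannot be used
Cuda device initialisation retry 2 of 6, waiting 5 secs...
Device cannot be used
Cuda initialisation FAILED, Initiating Boinc temporary exit (180 secs)
"""

summary = summarize_stderr(sample)
print(summary)
```

For the shortened excerpt, that is two 5-second retries plus one 180-second temporary exit.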

I don't know what is causing the problem, and I can't even make any kind of guess. But considering that the stderr.txt section for that particular task is incomplete because the BOINC core client only sends the last 64KB, which means it went through a great many of these cycles, I'm not at all surprised your RAC has suffered.
Joe
ID: 1214887
Profile soft^spirit
Joined: 18 May 99
Posts: 6497
Credit: 34,134,168
RAC: 0
United States
Message 1214890 - Posted: 7 Apr 2012, 2:51:47 UTC

I was having that problem on "Badger". The problem was that the .40 file was taking more memory than the .38 version, and as a result I was running out of GPU memory.

Changing from 3 tasks to 2 cleared the problem (I changed .3 to .5 in app_info.xml).
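The arithmetic behind that change can be sketched in Python. This assumes the .3 and .5 are the per-task GPU fraction (the count value) in app_info.xml and that BOINC runs as many tasks on a GPU as fit within a total of 1.0. The 1024 MB figure is a made-up card size for illustration, not Badger's actual memory.

```python
import math


def tasks_per_gpu(count):
    """Number of tasks that fit on one GPU when each task claims `count` of it."""
    if not 0 < count <= 1:
        raise ValueError("count must be in (0, 1]")
    # Small epsilon guards against floating-point results like 2.9999999.
    return math.floor(1.0 / count + 1e-9)


def memory_per_task_mb(gpu_memory_mb, count):
    """Even split of GPU memory across the concurrent tasks (a rough budget)."""
    return gpu_memory_mb / tasks_per_gpu(count)


HYPOTHETICAL_GPU_MB = 1024  # assumption for illustration only

for count in (0.3, 0.5):
    n = tasks_per_gpu(count)
    budget = memory_per_task_mb(HYPOTHETICAL_GPU_MB, count)
    print(f"count={count}: {n} tasks, about {budget:.0f} MB each")
```

With fewer concurrent tasks, each one gets a larger share of GPU memory, which is why a build that needs more memory per task can fit at 2 tasks but not at 3.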
Janice
ID: 1214890
Profile rebest Project Donor
Volunteer tester
Joined: 16 Apr 00
Posts: 1296
Credit: 45,357,093
RAC: 0
United States
Message 1215178 - Posted: 7 Apr 2012, 17:30:40 UTC - in response to Message 1214883.  

I'm not entirely sure of this, but if you updated your video drivers around the time this problem started and you let Windows put your monitor to sleep, then that's your problem. To fix it, set Windows to never put your monitor to sleep (this is a well-known problem with the latest NVIDIA drivers).

Cheers.


Thanks, Wiggo. You got it! All waiting GPU tasks have cleared. Let's hope the RAC has bottomed out.

Thanks again.

Join the PACK!
ID: 1215178
Profile Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1215399 - Posted: 8 Apr 2012, 0:19:10 UTC - in response to Message 1215178.  

I'm not entirely sure of this, but if you updated your video drivers around the time this problem started and you let Windows put your monitor to sleep, then that's your problem. To fix it, set Windows to never put your monitor to sleep (this is a well-known problem with the latest NVIDIA drivers).

Cheers.


Thanks, Wiggo. You got it! All waiting GPU tasks have cleared. Let's hope the RAC has bottomed out.

Thanks again.

Only too happy to help out.

Cheers.
ID: 1215399



 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.