Some GPU workunits cause driver reset

Message boards : Number crunching : Some GPU workunits cause driver reset
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1654109 - Posted: 18 Mar 2015, 13:33:33 UTC - in response to Message 1654093.  

Sounds like the only thing left is to clean out the video driver. Drivers are weird: I have an Ubuntu machine that had been working fine for about a year, then started freezing the screen every couple of days. I finally gave up and changed the driver, and now it seems to be working fine again. For Windows the best cleaner is Display Driver Uninstaller (DDU), http://www.guru3d.com/files-details/display-driver-uninstaller-download.html. DDU will clean every part of the driver, as if a driver had never been installed. If you still get the same problem after reinstalling 14.4, try running DDU again and installing 13.12.
ID: 1654109
Profile BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1654828 - Posted: 20 Mar 2015, 7:25:30 UTC - in response to Message 1654093.  

Seti@home WU crashed as per my thread title.

1) A "WU" is a data file, so you probably mean "Seti@home GPU app crashed"?
2) "As per my thread title": "Some GPU workunits cause driver reset". Which means what, exactly? What was the exact message? "Video driver failed and was restarted"?


As per my OP I have done Memtest, Prime95 and Furmark already.

(I know, as I read the thread before posting.)
None of those use OpenCL, so they don't test the compute ability of the GPU + driver.
(My links include some OpenCL tests.)

Prime95 is old and does not load CPUs much. For a CPU + RAM test, better try
LinX (Linpack):
http://www.xtremesystems.org/forums/showthread.php?201670-LinX-A-simple-Linpack-interface
 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1654828
DayneC

Joined: 12 Dec 14
Posts: 8
Credit: 75,443
RAC: 0
Message 1654860 - Posted: 20 Mar 2015, 8:58:06 UTC
Last modified: 20 Mar 2015, 9:04:29 UTC

I guess it's proper to say the app crashed, but it seems like it's either only some WU where it happens or that it just happens after some time. It still says the task is running in BOINC manager but the progress doesn't increase. The next time it happens I will check whether or not the app is still actually open on the machine, not sure if that matters.

If I see a message I'm pretty sure it's “Display driver stopped responding and has recovered”. It often happens when I'm away from the machine, and often if I try to restart the task (by suspending GPU then unsuspending again) it will immediately cause an initial black screen then some things will become visible again but the system is unresponsive. Mouse cursor can move but you can't actually click on anything that is visible.

Which of the tests you linked do you think I should use, or should I just try working through all of them?
ID: 1654860
Profile BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1654870 - Posted: 20 Mar 2015, 9:45:11 UTC - in response to Message 1654860.  

If I see a message I'm pretty sure it's “Display driver stopped responding and has recovered”.

That seems to be the TdrDelay default of 2 seconds.
Don't disable it - just set TdrDelay to a somewhat higher value:
http://setiathome.ssl.berkeley.edu/forum_thread.php?id=75391&postid=1561837#1561837
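
For anyone who would rather not edit the registry by hand, here is a minimal sketch that only generates the text of a .reg file. TdrDelay is a REG_DWORD, in seconds, under the GraphicsDrivers key. The 8-second default and the function name `build_tdr_reg` are illustrative assumptions, not values taken from the linked post.

```python
# Sketch: build the text of a .reg file that raises the Windows TDR timeout.
# Assumption: 8 seconds is only an example value, pick your own.

GRAPHICS_DRIVERS_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers"
)


def build_tdr_reg(delay_seconds: int = 8) -> str:
    """Return .reg file text that sets TdrDelay (REG_DWORD, seconds)."""
    if not 1 <= delay_seconds <= 600:
        raise ValueError("delay_seconds should be between 1 and 600")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{GRAPHICS_DRIVERS_KEY}]",
        # .reg files store DWORD values as 8 hexadecimal digits.
        f'"TdrDelay"=dword:{delay_seconds:08x}',
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(build_tdr_reg(8))
```

Save the output as a .reg file, import it as administrator, and reboot so that Windows picks up the new timeout.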

You may also test (or aim) for less load on the GPU via command-line options/switches:
http://setiathome.berkeley.edu/forum_thread.php?id=76447&postid=1638568#1638568


It often happens when I'm away from the machine, and often if I try to restart the task (by suspending GPU then unsuspending again) it will immediately cause an initial black screen then some things will become visible again but the system is unresponsive. Mouse cursor can move but you can't actually click on anything that is visible.

After “Display driver stopped responding and has recovered” the OpenCL part of the driver is in an unstable state (I think I know that from Raistmer, the programmer of the OpenCL apps).
So don't try to do anything with OpenCL (like "restart the task") before you do a real reboot (restart of Windows).


Which of the tests you linked do you think I should use, or should I just try working through all of them?

Why not try them all? ;)
 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1654870
Wedge009
Volunteer tester
Joined: 3 Apr 99
Posts: 451
Credit: 431,396,357
RAC: 553
Australia
Message 1655161 - Posted: 20 Mar 2015, 22:57:10 UTC

I notice increasing TdrDelay helps with avoiding driver time-outs on APUs, especially since increasing the period_iterations_num parameter doesn't seem to help much.
Soli Deo Gloria
ID: 1655161
Profile BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1655308 - Posted: 21 Mar 2015, 7:42:51 UTC

On ATI, AstroPulse usually runs much better: many (including me) report much less lag, and the speedup compared to CPU is bigger.
So you may try running only AstroPulse on ATI.

I run only AstroPulse on ATI + only SETI@home v7 on CPU
The lack of AstroPulse tasks recently (during the last month) 'forced' me to do some SETI@home v7 on ATI - every few minutes I was feeling lag/stuttering (2 peaks of ~5 s each) of mouse and ... everything

(The 2 peaks of ~5 s each of no responsiveness are also visible in Process Lasso)
 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1655308
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1655319 - Posted: 21 Mar 2015, 8:30:11 UTC

Did you try OCLfftplan?

With r_2760 and later you can tune oclFFT planning, which helps on slower GPUs.

This helped on a 5650 I recently installed on a customer's computer.

-sbs 192 -spike_fft_tresh 2048 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 16 -oclfft_tune_cw 8 -cpu_lock -hp
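
To make the line above easier to read, here is a small sketch that splits it into switch/value pairs. It is not part of the SETI@home app. The helper name `parse_switches` is hypothetical, and which switches take a value is inferred only from the line itself (a switch followed by a non-switch token), not from the app's documentation.

```python
import shlex


def parse_switches(cmdline: str) -> dict:
    """Map each '-switch' to its value, or to True when no value follows."""
    tokens = shlex.split(cmdline)
    options = {}
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if not token.startswith("-"):
            raise ValueError(f"value without a switch: {token!r}")
        # A following token that is not itself a switch is this switch's value.
        has_value = i + 1 < len(tokens) and not tokens[i + 1].startswith("-")
        options[token] = tokens[i + 1] if has_value else True
        i += 2 if has_value else 1
    return options


# The switch line from this post, copied verbatim.
MIKE_CMDLINE = (
    "-sbs 192 -spike_fft_tresh 2048 -oclfft_tune_gr 256 -oclfft_tune_lr 16 "
    "-oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 16 "
    "-oclfft_tune_cw 8 -cpu_lock -hp"
)

if __name__ == "__main__":
    for switch, value in parse_switches(MIKE_CMDLINE).items():
        print(switch, value)
```

Read this way, the line has eight switches with numeric values plus two plain flags (-cpu_lock and -hp).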


With each crime and every kindness we birth our future.
ID: 1655319
DayneC

Joined: 12 Dec 14
Posts: 8
Credit: 75,443
RAC: 0
Message 1659322 - Posted: 30 Mar 2015, 17:20:16 UTC

I changed to driver version 13.12, then 13.9 and now 13.4. I have not had the issue with any of these versions and have been running 13.4 for six days now. I guess I might get round to trying out some of the test programs anyway.

I also get the lagging with a noticeable peak in GPU usage.
ID: 1659322


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.