GTX 970 problem

Questions and Answers : GPU applications : GTX 970 problem

Dave Lampkins
Joined: 19 Jan 00
Posts: 3
Credit: 27,683
RAC: 0
United States
Message 1674357 - Posted: 6 May 2015, 11:48:59 UTC

I am getting a message that says:

Postponed:Cuda runtime,memory related failure,threadsafe temporary Exit

I have no idea what to do. This is a new card, about 4 weeks old. My driver is
version 347.88, release date 3/17/15.

Thanks
Dave
ID: 1674357
Zalster
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1674367 - Posted: 6 May 2015, 13:23:44 UTC - in response to Message 1674357.  

A couple of questions:

Are you overclocking?

Are you running more than one work unit on it? If so, how many?

Have you tried exiting BOINC and restarting?

Have you tried rebooting your computer?

Brent ran into the same issue with his 750 Ti. Here is Jason's response to his question:

http://setiathome.berkeley.edu/forum_thread.php?id=77226&postid=1671799
ID: 1674367
Dave Lampkins
Joined: 19 Jan 00
Posts: 3
Credit: 27,683
RAC: 0
United States
Message 1674444 - Posted: 6 May 2015, 19:08:50 UTC - in response to Message 1674367.  

I have 8 running at a time, not sure why. My card says it's superclocked, but I haven't overclocked anything. I did try to exit the program and restart it, no luck. I haven't tried to reboot yet.
ID: 1674444
rob smith
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22160
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1674467 - Posted: 6 May 2015, 20:34:26 UTC

"Running 8 at a time" - I assume that is 8 CPU tasks, as you would certainly know how you got to 8 GPU tasks at a time (by setting up configuration files).

The card is overclocked to 1366 MHz (default is 1050 MHz, 1178 MHz boost).
There has been a large number of errors from the GTX 970 in the last few hours, so it's probably running too hot and protesting like crazy.

Drop the clocks to the Nvidia defaults and see if the situation improves.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1674467
Zalster
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1674506 - Posted: 6 May 2015, 22:37:57 UTC - in response to Message 1674467.  

I'd also drop the number of instances (work units per card) down to 3.

The 970 doesn't have a full 4 GB of fast RAM; only 3.5 GB of it runs at full speed.

That causes instabilities when crunching on those cards. The best and safest number we found is 3.

4 can be done, but occasionally it will crash again.

No overclocking of the 970 for Seti. It's fine for gaming but not for crunching.

Zalster
ID: 1674506
Dave Lampkins
Joined: 19 Jan 00
Posts: 3
Credit: 27,683
RAC: 0
United States
Message 1674650 - Posted: 7 May 2015, 11:09:51 UTC - in response to Message 1674506.  
Last modified: 7 May 2015, 11:15:16 UTC

How would I drop the instances to 3?
I have never really set up anything in the past for SETI; all my other cards have always worked well with no worries. As for the overclocking, I think I can change that, unless going to 3 instances will take care of it with no errors.
Thanks for the help
Dave
ID: 1674650
rob smith
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22160
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1674737 - Posted: 7 May 2015, 16:29:51 UTC

Change the instances of 0.125 to 0.33 in your app_config.xml file.

Before doing that, however, make sure you are running eight tasks on your GPU, and not eight CPU tasks. Open the Tasks tab in BOINC Manager, sort by progress, and look at the status of each running task: those shown as "Running" are running on the CPU, and those shown as "Running (x%CPU + y%NVIDIA)" are running on your GPU.
By default I would expect to see 8 on your CPU and one on your GPU.
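
For anyone who does need to edit the file, here is a minimal sketch of how the 0.125 and 0.33 figures relate to the number of tasks per GPU. It assumes the usual app_config.xml layout with gpu_usage and cpu_usage under gpu_versions. The application name "setiathome_v7" and the cpu_usage default are assumptions for illustration only, and the helper names are made up; check the name your own client reports before using anything like this.

```python
from xml.sax.saxutils import escape

# Layout of a single-application app_config.xml (assumed, see note above).
TEMPLATE = """<app_config>
  <app>
    <name>{name}</name>
    <gpu_versions>
      <gpu_usage>{gpu_usage:g}</gpu_usage>
      <cpu_usage>{cpu_usage:g}</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
"""


def gpu_usage_for(tasks_per_gpu):
    """Fraction of one GPU each task claims.

    Truncated (not rounded) to 3 decimals so that
    tasks_per_gpu * usage never exceeds 1.0.
    """
    if tasks_per_gpu < 1:
        raise ValueError("tasks_per_gpu must be at least 1")
    return (1000 // tasks_per_gpu) / 1000


def render_app_config(app_name, tasks_per_gpu, cpu_usage=1.0):
    """Return the text of an app_config.xml for one application."""
    return TEMPLATE.format(
        name=escape(app_name),
        gpu_usage=gpu_usage_for(tasks_per_gpu),
        cpu_usage=cpu_usage,  # hypothetical default: one CPU core per GPU task
    )


if __name__ == "__main__":
    # 8 tasks per GPU gives 0.125; 3 tasks gives 0.333 (the 0.33 above).
    print(render_app_config("setiathome_v7", tasks_per_gpu=3))
```

The value is truncated rather than rounded so that three tasks at 0.333 still fit within one GPU.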
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1674737
BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1674999 - Posted: 8 May 2015, 2:42:40 UTC - in response to Message 1674650.  

How would I drop the instances to 3?
I have never really set up anything in the past for SETI, ...

Then don't worry about that - you very probably run only one instance on the GPU, as is the default.

Check (with Windows Task Manager or Process Explorer) how many SETI@home CUDA processes are running.
They have names like:
setiathome_7.00_windows_intelx86__cuda50.exe
setiathome_7.00_windows_intelx86__cuda42.exe
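
If you would rather count them from a script than by eye, the following is a rough sketch. It assumes Windows and the standard tasklist command with CSV output, and it matches on the "setiathome...cuda" naming pattern shown above. The function names are made up for this example.

```python
import csv
import io
import subprocess
import sys


def count_cuda_processes(tasklist_csv):
    """Count rows whose image name looks like a SETI@home CUDA application."""
    count = 0
    for row in csv.reader(io.StringIO(tasklist_csv)):
        if not row:
            continue  # skip blank lines
        image = row[0].lower()  # first CSV column is the image name
        if image.startswith("setiathome") and "cuda" in image:
            count += 1
    return count


def running_cuda_processes():
    """Windows only: ask tasklist for a header-less CSV process listing."""
    result = subprocess.run(
        ["tasklist", "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=True,
    )
    return count_cuda_processes(result.stdout)


if __name__ == "__main__" and sys.platform == "win32":
    print("SETI@home CUDA processes running:", running_cuda_processes())
```

A result of 1 would match the default of one task per GPU.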


Error is:
http://setiathome.berkeley.edu/results.php?hostid=7379623&offset=0&show_names=0&state=6&appid=
http://setiathome.berkeley.edu/result.php?resultid=4131749131

Thread call stack limit is: 1k
uncaptured error before launch (find_pulse_kernel2<fft_n, numthreads/fft_n, 5, true><<<grid, block>>>(best_pulse_score, PulsePoTLen, AdvanceBy, y_offset, numdivs, firstP, lastP)), file c:/[Projects]/__Sources/sah_v7_opt/Xbranch/client/cuda/cudaAcc_pulsefind.cu, line 1505: unknown error
Exiting

... which causes "too many boinc_temporary_exit()s"


Post in Number Crunching with a link to this thread for more advanced answers.
 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1674999
Zalster
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1675025 - Posted: 8 May 2015, 3:30:05 UTC - in response to Message 1674999.  

When he said 8 work units, I thought he meant he was trying to run 8 at a time on the GPU.

But if no modifications have been made, then he is probably running 7-8 on the CPU and 1 on the card.

"Too many exits" means that the work unit is not progressing, and BOINC is terminating the work unit after the 99th attempt to progress.

1) Is there any other activity going on with the computer at the same time, e.g. streaming, working on it, antivirus (make sure your antivirus doesn't scan the BOINC folder)?

2) Did you do a clean install of the driver for the GPU when you put it in?

3) Have you tried turning off the computer and turning it back on?

4) Try removing the GPU and reseating it, and see if that helps with the problem.

Just a couple of things to try first. Let's see if some of the others have other ideas. Sorry for the delay; I had a few problems of my own I was working on, lol.

Zalster
ID: 1675025
Brent Norman
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1675030 - Posted: 8 May 2015, 4:16:14 UTC

I would guess that you are overclocking the card; I think that is why I got the same error.

I also removed my command line for more aggressive processing at the same time.

And you shouldn't be running 8 GPU tasks at a time; I don't think even a GTX 980 would be happy with that. Go back to 3 tasks.
ID: 1675030
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1675104 - Posted: 8 May 2015, 9:27:28 UTC

The consistency of the issue here seems suspicious, in that the usual power and temperature first suspects seem unlikely.

After clean driver install and reboot, I would check the temperatures, PSU and clocks again anyway, and run an artefact scanner. The factory clocks/boost on GPU core or VRAM (SC edition was mentioned) may be just a little high, and require a small voltage bump for reliability. The failing portion of code is indeed VRAM access intensive, so consistent failure there could indicate one or more memory chips running on the hairy edge, which a small clock backoff or voltage increase should address easily.

These cards are sold for gaming, and competition in the mid-to-high range is fierce, so manufacturers sometimes err on the side of performance over reliability when setting default clocks and voltages.
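
As a rough way to do the temperature and clock check from a script, here is a sketch that reads them through nvidia-smi and flags any GPU whose core clock is above a reference value. It assumes nvidia-smi is on the PATH and that the driver reports numeric values for these fields. The reference of 1178 MHz is the default boost figure quoted earlier in this thread, and the function names are made up for the example.

```python
import shutil
import subprocess

QUERY_FIELDS = "name,temperature.gpu,clocks.sm,clocks.mem"


def parse_gpu_report(csv_text):
    """Turn 'name, temp, sm_clock, mem_clock' lines into dictionaries."""
    gpus = []
    for line in csv_text.strip().splitlines():
        name, temp, sm_clock, mem_clock = [f.strip() for f in line.split(",")]
        gpus.append({
            "name": name,
            "temp_c": int(temp),        # degrees Celsius
            "sm_mhz": int(sm_clock),    # current core (SM) clock
            "mem_mhz": int(mem_clock),  # current memory clock
        })
    return gpus


def above_reference(gpus, reference_boost_mhz):
    """Return the GPUs whose core clock exceeds the reference boost clock."""
    return [gpu for gpu in gpus if gpu["sm_mhz"] > reference_boost_mhz]


if __name__ == "__main__" and shutil.which("nvidia-smi"):
    report = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    for gpu in above_reference(parse_gpu_report(report), 1178):
        print(f"{gpu['name']}: {gpu['sm_mhz']} MHz at {gpu['temp_c']} C")
```

This only reports what the driver sees at that moment, so it is best run while a task is crunching. It does not replace an artefact scan.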
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1675104