560TI GPU computation errors

Message boards : Number crunching : 560TI GPU computation errors
brw5 is alive

Joined: 11 May 07
Posts: 30
Credit: 10,339,914
RAC: 0
Australia
Message 1169872 - Posted: 10 Nov 2011, 21:47:33 UTC

I am running the optimized apps, which have all been running well up until about a week ago.
Now all the GPU apps fall over after 3 - 4 seconds, the display is pixelated in areas, and it occasionally blanks out. When the display comes back, a Windows error message says the display driver stopped responding and has been restarted. I had the 260 driver installed when it started happening, and I now have the latest NVIDIA display driver, but the problem is still the same.
The card and machine are relatively new (a month or so), and I've had no problems up to now (other than not being able to get enough WUs to crunch 24/7).
All help appreciated.
Cheers.
ID: 1169872 · Report as offensive
bill

Joined: 16 Jun 99
Posts: 861
Credit: 29,352,955
RAC: 0
United States
Message 1169874 - Posted: 10 Nov 2011, 21:52:18 UTC - in response to Message 1169872.  

You've eliminated the driver. If the video card is getting proper power, then it's time to RMA the video card.
ID: 1169874 · Report as offensive
Philhnnss
Volunteer tester

Joined: 22 Feb 08
Posts: 63
Credit: 30,694,327
RAC: 162
United States
Message 1169917 - Posted: 11 Nov 2011, 1:15:43 UTC
Last modified: 11 Nov 2011, 1:16:25 UTC

Gigabyte released a new BIOS for the 560TI that raises the voltage.

http://uk.gigabyte.com/products/product-page.aspx?pid=3707&dl=1#bios

I'll bet others follow suit. When I was researching what new cards to
get I came to my own conclusion that Nvidia undervolted the 560s to
make them more "Green". I'll bet they missed the mark as to just how
low they could run them. If you're not comfortable raising the voltage,
you might try underclocking it?
ID: 1169917 · Report as offensive
brw5 is alive

Joined: 11 May 07
Posts: 30
Credit: 10,339,914
RAC: 0
Australia
Message 1170020 - Posted: 11 Nov 2011, 9:12:54 UTC - in response to Message 1169874.  

Sorry, but what is rma?

How do I raise the voltage and/or underclock the card?

For the first 4 weeks it's been awesome - crunching through tasks nearly as quick as I can get them - typically in 1 - 2 minutes, but only in the last week or so it's been playing up. Could I have fried the card/chip causing these errors?

At least now I can get some tasks to run, but with very long run times in comparison (5+ hours). Are the WUs getting bigger as SETI adjusts to the GPU's capability, or is my card dying? I am still getting more than 50% compute errors.

Cheers.
ID: 1170020 · Report as offensive
LadyL
Volunteer tester
Joined: 14 Sep 11
Posts: 1679
Credit: 5,230,097
RAC: 0
Message 1170028 - Posted: 11 Nov 2011, 10:15:42 UTC
Last modified: 11 Nov 2011, 10:21:51 UTC

Please unhide your computers, or it's not possible to check individual tasks to troubleshoot.

Long runtimes can have several causes, but to eliminate some of them, the stderr output of those tasks needs to be analysed.
ID: 1170028 · Report as offensive
brw5 is alive

Joined: 11 May 07
Posts: 30
Credit: 10,339,914
RAC: 0
Australia
Message 1170032 - Posted: 11 Nov 2011, 10:27:12 UTC - in response to Message 1170028.  

I unhid my machines in SETI about 5 minutes ago.
Is there somewhere else I need to unhide them, or can you see them now?

Btw, what is stderr, and is it something I can interpret, or is it best left to those who know how all this stuff works....?

Cheers.

ID: 1170032 · Report as offensive
LadyL
Volunteer tester
Joined: 14 Sep 11
Posts: 1679
Credit: 5,230,097
RAC: 0
Message 1170045 - Posted: 11 Nov 2011, 11:00:51 UTC - in response to Message 1170032.  
Last modified: 11 Nov 2011, 11:03:48 UTC

I unhid my machines in SETI about 5 minutes ago.
Is there somewhere else I need to unhide them, or can you see them now?

Btw, what is stderr, and is it something I can interpret, or is it best left to those who know how all this stuff works....?

Cheers.



Unless the database needs some time to update, they're still hidden.

You need to go to your SETI@home project preferences and make sure the box for 'Should SETI@home show your computers on its web site?' is ticked.

Stderr is the human readable output generated by the application as it processes the task. It contains some information about the task and any error messages.

You can look at it by clicking on the task link in a task list.
Interpretation isn't black magic, but you'll have to build up some experience on what different output signifies.
ID: 1170045 · Report as offensive
brw5 is alive

Joined: 11 May 07
Posts: 30
Credit: 10,339,914
RAC: 0
Australia
Message 1170051 - Posted: 11 Nov 2011, 11:35:15 UTC - in response to Message 1170045.  

Should SETI@home show your computers on its web site? = YES

Call me stupid, but where exactly do you "look at it by clicking on the task link in a task list"? I assume this is on a SETI web page somewhere?

Please stop laughing now..... I know I am stupid!


ID: 1170051 · Report as offensive
LadyL
Volunteer tester
Joined: 14 Sep 11
Posts: 1679
Credit: 5,230,097
RAC: 0
Message 1170056 - Posted: 11 Nov 2011, 11:50:20 UTC - in response to Message 1170051.  

Should SETI@home show your computers on its web site? = YES

Call me stupid, but where exactly do you "look at it by clicking on the task link in a task list"? I assume this is on a SETI web page somewhere?

Please stop laughing now..... I know I am stupid!


Unhiding of computers may be broken. I'll monitor and if necessary report.

Until then giving your host ID number works as well.

I don't call anybody stupid who is trying to learn ;)
Go to your account page. In the 'Computing and Credit' part, at the 'Tasks' entry, click 'view'.
This brings up your task list.
Links at the top allow sorting.
Clicking on the link for a task number, WU number or host ID number will bring up details about the task, WU, and host respectively.
Clicking on the task gives you, among other information like runtime, validation state and credit awarded, the stderr output.
Clicking on a WU gives you the tasks associated with that WU: how many, which host, which state, etc.
Host numbers link to the host details page (from where you can get to the task list again).

NB entries for tasks will be mostly empty until the task has been reported.
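
Once you have copied the stderr text from a task page, a first pass can be as simple as picking out the lines that mention an error. Here is a minimal Python sketch of that idea. `find_error_lines` is a hypothetical helper, not part of BOINC or SETI@home, and the sample text is made up for illustration; real stderr output looks different and still needs some experience to interpret.

```python
def find_error_lines(stderr_text, keywords=("error", "fail")):
    """Return (line_number, line) pairs whose text mentions any keyword."""
    hits = []
    for number, line in enumerate(stderr_text.splitlines(), start=1):
        lowered = line.lower()
        # Case-insensitive substring match against each keyword.
        if any(word in lowered for word in keywords):
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    # Made-up text for illustration only, not real application output.
    sample = "starting task\nsomething failed with error -6\nexiting\n"
    for number, line in find_error_lines(sample):
        print(f"line {number}: {line}")
```

A keyword match only narrows things down; a line that mentions "error" is not necessarily the cause of a failed task.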
ID: 1170056 · Report as offensive
LadyL
Volunteer tester
Joined: 14 Sep 11
Posts: 1679
Credit: 5,230,097
RAC: 0
Message 1170078 - Posted: 11 Nov 2011, 13:35:35 UTC

Ok, your computers are now showing up.

On a hunch: downgrade the driver to 280.26.
I'm seeing errors that so far I have only seen on machines running the 285.x driver.

560Ti may have some problems with factory overclock, but if it was running stable previously, that's probably not it.

If the problems started shortly after you upgraded the driver, that's your most likely cause.

I'd downgrade the driver (preferably clean install) and see if the problem clears.
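
If you have several hosts to check, the same rule of thumb can be sketched in a few lines of Python. This is only an illustration: `is_suspect_driver` is a hypothetical helper, the 285.x suspicion is the hunch from this post rather than a confirmed fault, and apart from 280.26 the version strings in the example are just samples.

```python
SUSPECT_SERIES = {285}  # driver series suspected in this thread (a hunch, not confirmed)


def is_suspect_driver(version: str) -> bool:
    """Return True if an NVIDIA driver version like '285.62' is in a suspect series."""
    # The part before the first dot is the driver series, e.g. 285.
    major = int(version.strip().split(".")[0])
    return major in SUSPECT_SERIES


if __name__ == "__main__":
    for v in ("280.26", "285.62"):
        print(v, "suspect" if is_suspect_driver(v) else "ok")
```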
ID: 1170078 · Report as offensive
Terror Australis
Volunteer tester

Joined: 14 Feb 04
Posts: 1817
Credit: 262,693,308
RAC: 44
Australia
Message 1170083 - Posted: 11 Nov 2011, 13:47:28 UTC - in response to Message 1170020.  

Sorry, but what is rma?

I think it stands for Return Material Authorisation

In other words the card has failed and needs to be returned to where you bought it for replacement under warranty.

If you bought it on line you need an RMA number to return it to the place you bought it. If you bought over the counter just take it back to the shop.

Does anyone else reading this thread know what a -6 error means?

T.A.

ID: 1170083 · Report as offensive
LadyL
Volunteer tester
Joined: 14 Sep 11
Posts: 1679
Credit: 5,230,097
RAC: 0
Message 1170088 - Posted: 11 Nov 2011, 13:51:59 UTC - in response to Message 1170083.  

Does anyone else reading this thread know what a -6 error means?


I haven't the faintest idea and I couldn't find it in Ageless' BOINC FAQ either.
ID: 1170088 · Report as offensive
OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 1170097 - Posted: 11 Nov 2011, 14:22:39 UTC - in response to Message 1170083.  

Sorry, but what is rma?

I think it stands for Return Material Authorisation


It commonly means Return Merchandise Authorization. Most online etailers and manufacturers will not accept returns without an RMA.
ID: 1170097 · Report as offensive
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1170102 - Posted: 11 Nov 2011, 14:38:22 UTC - in response to Message 1170088.  
Last modified: 11 Nov 2011, 15:00:55 UTC

Does anyone else reading this thread know what a -6 error means?


I haven't the faintest idea and I couldn't find it in Ageless' BOINC FAQ either.


The -6 error encountered is CUFFT_EXEC_FAILED, which sometimes shows up, characteristically, on marginal systems. It's really an internal CUDA library failure, outside the scope of the BOINC client, the BOINC API and the app too, but it is indicative of 'something' going on underneath.
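
For anyone wanting to decode these by hand, here is a minimal Python sketch of the lookup. The names and values follow the `cufftResult` enum in NVIDIA's cuFFT header. The assumption, based only on the -6 case above, is that the app reports the cuFFT status negated; `decode_app_error` is a hypothetical helper, not part of any BOINC or SETI@home tool.

```python
# cufftResult values as defined in NVIDIA's cufft.h.
CUFFT_RESULTS = {
    0: "CUFFT_SUCCESS",
    1: "CUFFT_INVALID_PLAN",
    2: "CUFFT_ALLOC_FAILED",
    3: "CUFFT_INVALID_TYPE",
    4: "CUFFT_INVALID_VALUE",
    5: "CUFFT_INTERNAL_ERROR",
    6: "CUFFT_EXEC_FAILED",
    7: "CUFFT_SETUP_FAILED",
    8: "CUFFT_INVALID_SIZE",
    9: "CUFFT_UNALIGNED_DATA",
}


def decode_app_error(code: int) -> str:
    """Hypothetical helper: map an app error code such as -6 to a cuFFT name.

    Assumes the app reports the cuFFT status negated, as in the -6 case.
    """
    return CUFFT_RESULTS.get(-code, "unknown (not a cuFFT status)")


if __name__ == "__main__":
    print(decode_app_error(-6))
```

Other negative codes from the app may come from elsewhere entirely, so treat a match here as a hint rather than a diagnosis.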

There are several things to check/do with these cards in sequence. It's an evolving checklist, updated as some people's apparently problematic 560Ti hosts 'come good'.

In order (some steps aren't easy):
1) Ensure your PSU's +12V rail is up to the task; a good 850W PSU for one 560Ti seems to be most successful. The 560Ti cards are monsters. Compare the 2 machines on my computer list.
2) Reseat the card & its power connections carefully. It has fixed things in a couple of instances with these cards already... maybe the cards have 'funny fingers'?
3) For factory OC'd models, use a tool like nVidia Inspector, MSI Afterburner or similar to boost the GPU core voltage to 1.065 volts &/or underclock the core/shader & VRAM.
4) Ensure system chipset drivers are up to date & Windows updates are applied (both sets of updates include various things specifically related to video memory handling).
5) Unplug all peripherals, such as USB cameras etc., for now (for isolation purposes; there can be conflicts in some cases, resolvable but clouding the issue).
6) Check the motherboard BIOS is up to date, lock the PCIe bus to 100MHz, and potentially add 0.1V to the chipset voltage & 0.05V to CPU vcore, observing system RAM & CPU tech spec limits & temperatures. Depending on the motherboard & PSU there can be significant noise on the 12 volt rails, and that ripple can intermittently push voltages below the stable range (see step 1).
7) Try the x39e diagnostic build. I currently have a build cooking that specifically addresses some of the issues people are coming across with these cards. If it proves to help in the cases currently under scrutiny, I'll consider accelerating its wider release to reduce the problems appearing.
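
To make steps 1 and 6 a bit more concrete: the ATX spec allows the +12V rail to vary by ±5% (11.40 V to 12.60 V). Below is a rough Python sketch for checking logged readings against that window. The sample readings are made up for illustration and `rail_report` is a hypothetical helper; real values would come from a multimeter or a monitoring tool, and software sensor readings are often imprecise.

```python
NOMINAL_12V = 12.0
TOLERANCE = 0.05  # ATX allows +/-5% on the +12V rail


def rail_report(readings):
    """Summarise +12V readings (in volts) against the ATX +/-5% window.

    Expects at least one reading.
    """
    low = NOMINAL_12V * (1 - TOLERANCE)   # about 11.40 V
    high = NOMINAL_12V * (1 + TOLERANCE)  # about 12.60 V
    out_of_spec = [v for v in readings if v < low or v > high]
    return {
        "min": min(readings),
        "max": max(readings),
        "swing": max(readings) - min(readings),  # crude sag/ripple indicator
        "out_of_spec": out_of_spec,
    }


if __name__ == "__main__":
    # Hypothetical readings taken while the GPU is under load.
    sample = [12.05, 11.98, 11.62, 11.35, 11.90]
    report = rail_report(sample)
    print(report)
    if report["out_of_spec"]:
        print("Rail left the ATX window; suspect the PSU or its cabling.")
```

Slow sampled readings like these will not show fast ripple, so a clean report does not rule out a noisy rail.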

Jason
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1170102 · Report as offensive



 