560TI GPU computation errors
Author | Message |
---|---|
brw5 is alive Joined: 11 May 07 Posts: 30 Credit: 10,339,914 RAC: 0 |
I am running the optimized apps, which had all been running well until about a week ago. Now all the GPU apps fall over after 3 - 4 seconds, the display has been pixelated in areas, and it occasionally blanks out. On redisplay, a Windows error message comes up saying the display driver stopped responding and has been restarted. I had the 260 driver installed when it started happening, and I now have the latest NVidia display driver, but the problem is still the same. The card and machine are relatively new (a month or so), with no problems up to now (other than getting enough WUs to crunch 24/7). All help appreciated. Cheers. |
bill Joined: 16 Jun 99 Posts: 861 Credit: 29,352,955 RAC: 0 |
You've eliminated the driver. If the video card is getting proper power then it's time to rma the video card. |
Philhnnss Joined: 22 Feb 08 Posts: 63 Credit: 30,694,327 RAC: 162 |
Gigabyte released a new BIOS for the 560TI that raises the voltage. http://uk.gigabyte.com/products/product-page.aspx?pid=3707&dl=1#bios I'll bet others follow suit. When I was researching what new cards to get, I came to my own conclusion that Nvidia undervolted the 560s to make them more "Green". I'll bet they missed the mark as to just how low they could run them. If you're not comfortable raising the voltage, you might try underclocking it. |
brw5 is alive Joined: 11 May 07 Posts: 30 Credit: 10,339,914 RAC: 0 |
Sorry, but what is rma? How do I raise the voltage and/or underclock the card? For the first 4 weeks it was awesome, crunching through tasks nearly as quickly as I could get them, typically in 1 - 2 minutes. Only in the last week or so has it been playing up. Could I have fried the card/chip, causing these errors? At least now I can get some tasks to run, but with very long run times in comparison (5+ hours). Are the WUs getting bigger as SETI adjusts to the GPU's capability, or is my card dying? I am still getting more than 50% compute errors. Cheers. |
LadyL Joined: 14 Sep 11 Posts: 1679 Credit: 5,230,097 RAC: 0 |
Please unhide your computers; otherwise it's not possible to check individual tasks to troubleshoot. Long runtimes can have several causes, but to eliminate some of them, the stderr output of those tasks needs to be analysed. |
brw5 is alive Joined: 11 May 07 Posts: 30 Credit: 10,339,914 RAC: 0 |
I unhid my machines in SETI about 5 minutes ago. Is there somewhere else I need to unhide them, or can you see them now? Btw, what is stderr, and is it something I can interpret, or is it best left to those who know how all this stuff works? Cheers. |
LadyL Joined: 14 Sep 11 Posts: 1679 Credit: 5,230,097 RAC: 0 |
"I have unhidden my machines in SETI about 5 minutes ago." Unless the database needs some time to update, they are still hidden. You need to go to your SETI project preferences and make sure the box for 'Should SETI@home show your computers on its web site?' is ticked. Stderr is the human-readable output generated by the application as it processes the task. It contains some information about the task and any error messages. You can look at it by clicking on the task link in a task list. Interpretation isn't black magic, but you'll have to build up some experience of what different output signifies. |
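
As a rough illustration of the first pass one might make over a task's stderr text, here is a minimal Python sketch that flags lines that look like error reports. The keyword pattern and the sample text are assumptions made up for this example; real stderr wording varies by application and build, so treat the matches as a starting point for reading, not a diagnosis.

```python
import re

# Hypothetical keywords; real stderr wording differs between applications and builds.
ERROR_PATTERN = re.compile(r"\b(error|failed|failure)\b", re.IGNORECASE)


def find_error_lines(stderr_text):
    """Return (line_number, line) pairs for lines that look like error reports."""
    hits = []
    for number, line in enumerate(stderr_text.splitlines(), start=1):
        if ERROR_PATTERN.search(line):
            hits.append((number, line.strip()))
    return hits


# Made-up sample text, not real application output.
sample = """task started
device 0 selected
cufft error -6 during execution
task aborted"""

for number, line in find_error_lines(sample):
    print(number, line)
```

Pasting the stderr section of a task page into `sample` would list the lines worth reading first; anything the pattern misses still needs a human eye.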
brw5 is alive Joined: 11 May 07 Posts: 30 Credit: 10,339,914 RAC: 0 |
Should SETI@home show your computers on its web site? = YES. Call me stupid, but where exactly do you "look at it by clicking on the task link in a task list"? I assume this is on a SETI web page somewhere? Please stop laughing now..... I know I am stupid! |
LadyL Joined: 14 Sep 11 Posts: 1679 Credit: 5,230,097 RAC: 0 |
"Should SETI@home show your computers on its web site? = YES" Unhiding of computers may be broken. I'll monitor and report if necessary. Until then, giving your host ID number works as well. I don't call anybody stupid who is trying to learn ;) Go to your account. Under 'Computing and credit', find the 'Tasks' entry and click 'view'. This brings up your task list. Links at the top allow sorting. Clicking on the link for a task number, WU number or host ID number will bring up details about the task, WU, and host respectively. Clicking on a task gives you the stderr output, along with other information such as runtime, validation state and credit awarded. Clicking on a WU gives you the tasks associated with that WU: how many, which host, which state, etc. Host numbers link to the host details page (from where you can get to the task list again). NB: entries for tasks will be mostly empty until the task has been reported. |
LadyL Joined: 14 Sep 11 Posts: 1679 Credit: 5,230,097 RAC: 0 |
Ok, your computers are now showing up. On a hunch: downgrade the driver to 280.26. I'm seeing errors that so far I have only seen on machines running the 285.x driver. The 560Ti may have some problems with factory overclock, but if it was running stable previously, that's probably not it. If the problems started shortly after you upgraded the driver, that's your most likely cause. I'd downgrade the driver (preferably with a clean install) and see if the problem clears. |
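
The driver hunch above amounts to a simple version comparison. The following Python sketch shows one way to express it, assuming the driver version is available as a "major.minor" string; the function names and the example version strings are illustrative, and the 285 series is only the suspicion voiced in this thread, not a confirmed fault.

```python
def parse_driver_version(version):
    """Split a 'major.minor' driver version string into a tuple of integers."""
    major, _, minor = version.partition(".")
    return int(major), int(minor or 0)


def is_suspect_driver(version, suspect_major=285):
    """Return True when the driver belongs to the series suspected in this thread."""
    return parse_driver_version(version)[0] == suspect_major


# Example inputs only; substitute the version reported for your own host.
for candidate in ("280.26", "285.62"):
    print(candidate, "suspect" if is_suspect_driver(candidate) else "ok")
```

A match only says that the host is running the series under suspicion; whether the errors clear after a downgrade is still the real test.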
Terror Australis Joined: 14 Feb 04 Posts: 1817 Credit: 262,693,308 RAC: 44 |
"Sorry, but what is rma?" I think it stands for Return Material Authorisation. In other words, the card has failed and needs to be returned to where you bought it for replacement under warranty. If you bought it online, you need an RMA number to return it to the place you bought it. If you bought it over the counter, just take it back to the shop. Does anyone else reading this thread know what a -6 error means? T.A. |
LadyL Joined: 14 Sep 11 Posts: 1679 Credit: 5,230,097 RAC: 0 |
"Does anyone else reading this thread know what a -6 error means?" I haven't the faintest idea and I couldn't find it in Ageless' BOINC FAQ either. |
OzzFan Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
"Sorry, but what is rma?" It commonly means Return Merchandise Authorization. Most online etailers and manufacturers will not accept returns without an RMA. |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
"Does anyone else reading this thread know what a -6 error means?" The -6 error encountered is CUFFT_EXEC_FAILED, sometimes characteristically appearing in marginal systems. It's really an internal Cuda library failure, outside the scope of the Boinc client, boincapi and the app too, but indicative of 'something' going on underneath. There are several things to check/do with these cards in sequence. This is an evolving checklist as some people's apparently problematic 560ti hosts 'come good'. Systematically, with some steps not being easy: 1) Ensure your PSU +12V rail is up to the task; a good 850W PSU for 1 560ti seems to be most successful. The 560ti cards are monsters. Compare the 2 machines on my computer list. 2) Reseat the card & its power connections carefully. It's fixed things in a couple of instances with these cards already... maybe the cards have 'funny fingers'? 3) For factory OC'd models, use a tool like nVidia Inspector, MSI Afterburner or similar to boost the GPU core voltage to 1.065 volts &/or underclock the core/shader & VRAM. 4) Ensure system chipset drivers are up to date & Windows updates are applied (there are various things in both these sets of updates specifically related to video memory handling). 5) Unplug all peripherals, such as USB cameras, for now (for isolation purposes; there can be conflicts in some cases, resolvable but clouding the issues). 6) Check the motherboard BIOS is up to date, lock the PCIe bus to 100MHz, and potentially add 0.1V to the chipset voltage & 0.05V to CPU vcore, observing system RAM & CPU tech spec limits & temperatures. Depending on the motherboard & PSU there can be significant noise on the 12 Volt rails, and that ripple can intermittently push values below stable regions (see step 1). 7) Try the x39e diagnostic build. I currently have a build cooking that specifically addresses some of the issues some are coming across with these cards. If it shows to help out in certain cases under scrutiny at the moment, I'll consider accelerating its wider release to reduce problems appearing. Jason "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to Live By: The Computer Science of Human Decisions. |
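
To make the -6 reading concrete, here is a small Python sketch that maps a reported code back to a name from the cuFFT `cufftResult` enumeration, where `CUFFT_EXEC_FAILED` has the value 6. The table covers the core values 0 through 9. That the application reports the cuFFT value negated is an assumption taken from the -6 discussed above, and `describe_cufft_code` is a hypothetical helper name.

```python
# Core cufftResult values from the cuFFT library (0 through 9).
CUFFT_RESULT_NAMES = {
    0: "CUFFT_SUCCESS",
    1: "CUFFT_INVALID_PLAN",
    2: "CUFFT_ALLOC_FAILED",
    3: "CUFFT_INVALID_TYPE",
    4: "CUFFT_INVALID_VALUE",
    5: "CUFFT_INTERNAL_ERROR",
    6: "CUFFT_EXEC_FAILED",
    7: "CUFFT_SETUP_FAILED",
    8: "CUFFT_INVALID_SIZE",
    9: "CUFFT_UNALIGNED_DATA",
}


def describe_cufft_code(code):
    """Look up a cuFFT result name, assuming the app reports the value negated."""
    return CUFFT_RESULT_NAMES.get(abs(code), "unknown cuFFT result")


print(describe_cufft_code(-6))
```

The lookup only names the failure. As the post explains, the cause on marginal hardware still has to be found by working through the checklist.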