lunatics GPU x38g + 296.10 Driver

Author	Message
Mark Lybeck Send message Joined: 9 Aug 99 Posts: 245 Credit: 216,677,290 RAC: 173	Message 1234593 - Posted: 21 May 2012, 16:52:26 UTC Hello, It seems that upgrading the NVIDIA CUDA driver from 285.62 is causing trouble. Windows Update proposed an update to the NVIDIA drivers; the version was 290.x. On Windows Vista, this causes Lunatics_x38g_win32_cuda32.exe to produce unreliable results: the number of spikes is in excess of 30. The system has 2 NVIDIA 560TI GPUs. Any ideas? I later upgraded to 296.10. Maybe the issue will be sorted out? Should I roll back to 285.62? http://setiathome.berkeley.edu/results.php?hostid=3299266&offset=0&show_names=0&state=3&appid= ID: 1234593 ·

David S Volunteer tester Send message Joined: 4 Oct 99 Posts: 18352 Credit: 27,761,924 RAC: 12	Message 1234598 - Posted: 21 May 2012, 17:08:43 UTC - in response to Message 1234593. The main issue I'm aware of with the 29x.x drivers is that they stop the GPU when the computer puts the monitor to sleep. The easiest way to fix it is to change your Windows settings so the monitor never goes to sleep (turn it off manually when you want to). I believe the problem has also been fixed in a newer driver, or you can roll back as you suggested. As someone said in this forum recently, never let Windows Update update your hardware drivers. If Windows says there's an update for hardware, tell it no, then check the hardware manufacturer's web site and get the new driver from there, if you even need it (it may have been released just to fix a problem you don't have anyway). Check for info and, especially, bug reports about the new driver in various places too. I'm not sure, however, if this driver bug is related to the excessive spikes you got. It could be a coincidence. David Sitting on my butt while others boldly go, Waiting for a message from a small furry creature from Alpha Centauri. ID: 1234598 ·

BilBg Volunteer tester Send message Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0	Message 1235233 - Posted: 23 May 2012, 1:31:37 UTC - in response to Message 1234593. So you update the NVIDIA Drivers but not the Lunatics app? ;) http://lunatics.kwsn.net/index.php?module=Downloads;catd=9 http://lunatics.kwsn.net/downloads/Lunatics%20ReadMev0.40.txt The current app (inside/included in the Lunatics Installers) is: Lunatics_x41g_win32_cuda32.exe - ALF - "Find out what you don't do well ..... then don't do it!" :) ID: 1235233 ·

Mark Lybeck Send message Joined: 9 Aug 99 Posts: 245 Credit: 216,677,290 RAC: 173	Message 1235312 - Posted: 23 May 2012, 5:42:56 UTC I've just updated the driver to an even later version, 301.42. Let's see how it works out. Now it seems that other hosts are also experiencing the same kind of problem with a high spike count. You know the calculation failed when the spike count is 30. http://setiathome.berkeley.edu/workunit.php?wuid=995048708 http://setiathome.berkeley.edu/result.php?resultid=2449685357 Maybe the Lunatics client should also be cross-verified against the latest WHQL drivers (latest -3), so you would know in case there are any incompatibility or reliability issues. Is there any way to recalculate the same work unit with different driver versions? I did not find such a feature in BOINC. ID: 1235312 ·

LadyL Volunteer tester Send message Joined: 14 Sep 11 Posts: 1679 Credit: 5,230,097 RAC: 0	Message 1235377 - Posted: 23 May 2012, 10:47:55 UTC Last modified: 23 May 2012, 10:56:13 UTC Do upgrade to x41g as well, please: Installer. And do a clean install of the 301.42 driver, if you didn't do that. Unless you get invalids again, the driver was probably to blame. This is the first mention, I think, of somebody starting to get that type of problem after upgrading to that driver bracket. I'm not the Pope. I don't speak Ex Cathedra! ID: 1235377 ·

Mark Lybeck Send message Joined: 9 Aug 99 Posts: 245 Credit: 216,677,290 RAC: 173	Message 1236066 - Posted: 24 May 2012, 19:58:27 UTC - in response to Message 1235377. I realized that my main GPU in slot 0 is working with a core clock of only 405 MHz. The Asus factory clock is 900 MHz. Maybe this could cause half of the results to be invalid. How can I fix this problem? I use the latest drivers, 301.42. My additional, identical 560Ti card in slot 1 (which is only 4x PCIe) is running at 900 MHz and it may produce valid results. It also runs some 10 degrees cooler due to being lower in the tower case. The 405 seems to be some power save mode. Also, as a consequence, the GPU load is peaking due to the lower speed. The memory clock runs at 1050 MHz and the shader clock at 1800 MHz. Any ideas? ID: 1236066 ·
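The comparison made in the post above (405 MHz observed against a 900 MHz factory clock) can be written down as a small check. This is only a sketch: the function name `is_downclocked`, the 10% tolerance, and the `readings` dictionary are assumptions made for illustration, and the clock values would have to be read by hand from a monitoring tool such as GPU-Z.

```python
def is_downclocked(observed_mhz, expected_mhz, tolerance=0.10):
    """Return True when the observed core clock is more than `tolerance`
    (a fraction, hypothetical default 10%) below the expected clock."""
    return observed_mhz < expected_mhz * (1.0 - tolerance)


# Core clocks as reported in the post above, keyed by slot.
readings = {"slot 0": 405, "slot 1": 900}
expected_core_mhz = 900  # Asus factory clock quoted in the post

flagged = [slot for slot, mhz in readings.items()
           if is_downclocked(mhz, expected_core_mhz)]
print(flagged)
```

A card that stays flagged while the GPU is under load matches the stuck low-power state described here, as opposed to normal idle downclocking.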

Mark Lybeck Send message Joined: 9 Aug 99 Posts: 245 Credit: 216,677,290 RAC: 173	Message 1236070 - Posted: 24 May 2012, 20:01:21 UTC - in response to Message 1236066. The GPU Core clock is jammed at 405 MHz. I tested it just with halting the calculation on GPUs. This must be a bug in the drivers. ID: 1236070 ·

Horacio Send message Joined: 14 Jan 00 Posts: 536 Credit: 75,967,266 RAC: 0	Message 1236139 - Posted: 24 May 2012, 22:42:06 UTC - in response to Message 1236070. "The GPU Core clock is jammed at 405 MHz. I tested it just with halting the calculation on GPUs. This must be a bug in the drivers." The downclock is not a driver failure but a hardware-coded protection, and the only way to get it back to the default speed is through a reboot of the system. There are several things that can make a GPU downclock: a hardware failure, a GPU running too hot, an error in the software, a bad or insufficient PSU, a failure in the PCIe slot, etc. But sometimes it also just happens with no apparent cause... If that GPU is also throwing a lot of invalid tasks, it's more likely that it's failing or that the PSU is not able to handle both GPUs... (but it doesn't necessarily have to be that). In the particular case of the 560TIs, most of them come with a factory core voltage that is under the nominal value to make them more "green", but that value is not always enough to keep them crunching 24/7... ID: 1236139 ·

Mark Lybeck Send message Joined: 9 Aug 99 Posts: 245 Credit: 216,677,290 RAC: 173	Message 1236273 - Posted: 25 May 2012, 5:04:53 UTC - in response to Message 1236139. Hello, I actually checked, and it is the 560TI connected to the additional PCIe 4x slot. All the failed results were systematically done on the 2nd unit. The chipset is Intel P35. I would not suspect the PSU; it is a 650 W Corsair with dedicated connectors for 2 GPU cards. Maybe NVIDIA had some issue with PCIe compatibility on the older chipset? The 2nd GPU is now running at the correct speed, 900 MHz, after a reboot.
Stderr output
<core_client_version>6.10.60</core_client_version>
<![CDATA[
<stderr_txt>
setiathome_CUDA: Found 2 CUDA device(s):
Device 1: GeForce GTX 560 Ti, 1023 MiB, regsPerBlock 32768 computeCap 2.1, multiProcs 8 clockRate = 1800000
Device 2: GeForce GTX 560 Ti, 1023 MiB, regsPerBlock 32768 computeCap 2.1, multiProcs 8 clockRate = 1800000
In cudaAcc_initializeDevice(): Boinc passed DevPref 2
setiathome_CUDA: CUDA Device 2 specified, checking...
Device 2: GeForce GTX 560 Ti is okay
SETI@home using CUDA accelerated device GeForce GTX 560 Ti
Priority of process raised successfully
Priority of worker thread raised successfully
Cuda Active: Plenty of total Global VRAM (>300MiB).
All early cuFft plans postponed, to parallel with first chirp.
) _ _ _)_ o _ _ (__ (_( ) ) (_( (_ ( (_ ( not bad for a human... _)
Multibeam x38g Preview, Cuda 3.20
Legacy setiathome_enhanced V6 mode.
Work Unit Info:
...............
WU true angle range is : 0.399012
Cuda sync'd & freed.
Preemptively acknowledging a safe Exit on error->
SETI@Home Informational message -9 result_overflow
NOTE: The number of results detected exceeds the storage space allocated.
Flopcounter: 14445619385601.471000
Spike count: 30
Pulse count: 0
Triplet count: 0
Gaussian count: 0
Worker preemptively acknowledging an overflow exit.->
called boinc_finish
boinc_exit(): requesting safe worker shutdown ->
boinc_exit(): received safe worker shutdown acknowledge ->
</stderr_txt>
]]>
http://setiathome.berkeley.edu/results.php?hostid=3299266&offset=0&show_names=0&state=4&appid= ID: 1236273 ·
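For anyone checking many task logs by hand, the two markers visible in the stderr output above (the `result_overflow` message and the per-signal counts) can be pulled out with a short script. This is a minimal sketch that assumes the log text has already been copied from the task page into a string; the function names `parse_signal_counts` and `is_result_overflow` are made up for illustration and are not part of BOINC or the Lunatics tools.

```python
import re

# Signal types listed in the stderr summary quoted in the post above.
COUNT_LABELS = ("Spike", "Pulse", "Triplet", "Gaussian")


def parse_signal_counts(stderr_text):
    """Return a dict mapping each signal label found in the text to its count."""
    counts = {}
    for label in COUNT_LABELS:
        match = re.search(rf"{label} count:\s*(\d+)", stderr_text)
        if match:
            counts[label] = int(match.group(1))
    return counts


def is_result_overflow(stderr_text):
    """True when the log contains the -9 result_overflow informational message."""
    return "result_overflow" in stderr_text


# Excerpt copied from the stderr output in the post above.
sample = (
    "SETI@Home Informational message -9 result_overflow "
    "Flopcounter: 14445619385601.471000 Spike count: 30 Pulse count: 0 "
    "Triplet count: 0 Gaussian count: 0"
)
counts = parse_signal_counts(sample)
overflow = is_result_overflow(sample)
print(counts, overflow)
```

An overflow flagged this way is only a hint. Whether the result is actually wrong is decided when the same WU is verified by other hosts, as discussed later in the thread.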

arkayn Volunteer tester Send message Joined: 14 May 99 Posts: 4438 Credit: 55,006,323 RAC: 0	Message 1236301 - Posted: 25 May 2012, 6:03:40 UTC With only a 650 watt PSU, that would be my first culprit. I would want at least 800 watts with 2 560ti cards. ID: 1236301 ·

Fred J. Verster Volunteer tester Send message Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0	Message 1236450 - Posted: 25 May 2012, 9:07:33 UTC - in response to Message 1236301. "With only a 650 watt PSU, that would be my first culprit. I would want at least 800 watts with 2 560ti cards." Don't know exactly how much 2 GTX560Ti are drawing, but I still use a 650 Watt PSU for my QX9650 (@3.55GHz) + 1 GTX480, which draws around 415 Watts, according to the Kill-a-Watt. The PSU doesn't even get warm, 35C. ID: 1236450 ·

Mark Lybeck Send message Joined: 9 Aug 99 Posts: 245 Credit: 216,677,290 RAC: 173	Message 1236521 - Posted: 25 May 2012, 14:12:57 UTC - in response to Message 1236450. Hello, I am using a UPM PM300 power meter. The power usage is between 405 W and 425 W. The power supply is not an issue. The majority of WUs from the 560Ti that was stuck at 405 MHz were not correct. After the reboot the situation seems better. Now there is a 3 day delay while waiting for the same WUs to be verified by other hosts. The issue was definitely the clock being stuck at 405 MHz. One can only speculate about the root cause of that. The history is that the original driver update was triggered by Windows Update, followed by a few driver updates from Nvidia. No reboot was done in between. I recall previously having some problems with those Windows Update originated driver updates for Nvidia. ID: 1236521 ·

Slavac Volunteer tester Send message Joined: 27 Apr 11 Posts: 1932 Credit: 17,952,639 RAC: 0	Message 1236584 - Posted: 25 May 2012, 16:15:57 UTC - in response to Message 1236521. Hooahmentah. Have you upgraded your Lunatics installer? 560ti's are notoriously buggy on SETI, but upgrading the Lunatics app should solve this. Executive Director GPU Users Group Inc. - brad@gpuug.org ID: 1236584 ·

Mark Lybeck Send message Joined: 9 Aug 99 Posts: 245 Credit: 216,677,290 RAC: 173	Message 1236681 - Posted: 25 May 2012, 19:27:06 UTC - in response to Message 1236584. I want to see first how the system behaves and stabilizes before upgrading from x38 to x41. The thing is that you don't get the results very fast. Some benchmarking tools to verify performance between different clients would be nice. Also, the ability to recalculate the same WU would be beneficial too. ID: 1236681 ·

Claggy Volunteer tester Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4	Message 1236685 - Posted: 25 May 2012, 19:33:41 UTC - in response to Message 1236681. Last modified: 25 May 2012, 19:46:16 UTC "Some benchmarking tools to verify performance between different clients would be nice. Also the ability to recalculate the same WU would be beneficial too." Benchmarking programs and shortened test WUs are available in the Lunatics downloads: Test and Benchmark Tools Claggy ID: 1236685 ·

Mark Lybeck Send message Joined: 9 Aug 99 Posts: 245 Credit: 216,677,290 RAC: 173	Message 1238220 - Posted: 27 May 2012, 20:04:48 UTC - in response to Message 1236139. The 560TI got downclocked again to 405MHz.... Strange. Temperature is around 60 degrees Celsius so not too much. Could it be that the drivers cannot handle the 4x PCIe bus on the Intel P35 chipset? ID: 1238220 ·

Misfit Volunteer tester Send message Joined: 21 Jun 01 Posts: 21804 Credit: 2,815,091 RAC: 0	Message 1238382 - Posted: 28 May 2012, 4:59:49 UTC - in response to Message 1238220. "The 560TI got downclocked again to 405MHz...." Keep GPU-Z running in an active window and that should keep the card from doing that. me@rescam.org ID: 1238382 ·

Horacio Send message Joined: 14 Jan 00 Posts: 536 Credit: 75,967,266 RAC: 0	Message 1238491 - Posted: 28 May 2012, 15:28:56 UTC - in response to Message 1238220. "The 560TI got downclocked again to 405MHz.... Strange. Temperature is around 60 degrees Celsius so not too much. Could it be that the drivers cannot handle the 4x PCIe bus on the Intel P35 chipset?" Look in the Sensors tab of GPU-Z to see what the VDDC (core voltage) of your cards is. If it is under 1.05 then you will need to raise those values (using Afterburner or some other overclocking utility); mine were downclocking very often until I raised their voltages to 1.062. About the PSU, the total wattage is not the only thing to look at. You need to know how many of those watts can go through the 12V line. Each 560TI needs 170 Watts maximum, which means your PSU should be able to give at least 340 Watts on the 12V line (about 29 Amps), or better a bit more, as the 12V line also powers HDDs and other devices. And that's if the PSU has only one rail. If it has more than one rail, you need to be sure that you connected the GPUs to different rails and that each rail is able to give the necessary power to each card... ID: 1238491 ·
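The 12V arithmetic in the post above can be reproduced in a few lines. This is just a sketch of that calculation: the 170 W per card figure is the one quoted in the post, the function name `required_12v_amps` is hypothetical, and the result covers the GPUs only, not the HDDs and other devices that share the 12V line.

```python
import math


def required_12v_amps(watts_per_card, cards, rail_voltage=12.0):
    """Return (total watts, amps) the GPUs alone draw from the 12V line."""
    total_watts = watts_per_card * cards
    return total_watts, total_watts / rail_voltage


# Two 560TI cards at the 170 W maximum quoted in the post above.
total_watts, amps = required_12v_amps(170, 2)
print(total_watts, round(amps, 1), math.ceil(amps))
```

Rounding the current up to the next whole amp gives the figure used in the post. On a multi-rail PSU the same function can be called with one card per rail to check each rail separately.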

Mark Lybeck Send message Joined: 9 Aug 99 Posts: 245 Credit: 216,677,290 RAC: 173	Message 1239456 - Posted: 1 Jun 2012, 14:48:50 UTC - in response to Message 1238491. Yep, Asus has set the voltage to 1.025 V. I have not seen any downclocking for a while now, but the yield of correct results is far from perfect. http://setiathome.berkeley.edu/results.php?hostid=3299266&offset=0&show_names=0&state=4&appid= ID: 1239456 ·

BilBg Volunteer tester Send message Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0	Message 1239470 - Posted: 1 Jun 2012, 15:21:15 UTC - in response to Message 1239456. If you don't want to increase the voltage, then try to decrease the MHz (e.g. to stock NVIDIA values or lower). The errors (false signals) and the downclock are symptoms of a bad voltage/MHz combination. And/or update the Lunatics app (don't wait "to see first how the system behaves and stabilizes before upgrading from x38 to x41"). (Lunatics_x41g_win32_cuda32.exe was released long enough ago for all of us here to know it is stable. Do you have evidence to the contrary? And I'm not the kind who hurries to update everything; I stay on old versions of many programs.) - ALF - "Find out what you don't do well ..... then don't do it!" :) ID: 1239470 ·

