lunatics GPU x38g + 296.10 Driver

Mark Lybeck

Joined: 9 Aug 99
Posts: 245
Credit: 216,677,290
RAC: 173
Finland
Message 1234593 - Posted: 21 May 2012, 16:52:26 UTC

Hello,

It seems that upgrading the NVIDIA CUDA driver from 285.62 on Windows Vista causes Lunatics_x38g_win32_cuda32.exe to produce unreliable results: the spike counts are 30 or more.

Windows Update proposed an update to the NVIDIA drivers; the version was 290.x.

The system has 2 NVIDIA 560TI GPUs. Any ideas?

I later upgraded to 296.10. Maybe that will sort out the issue? Or should I roll back to 285.62?

http://setiathome.berkeley.edu/results.php?hostid=3299266&offset=0&show_names=0&state=3&appid=



ID: 1234593
David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1234598 - Posted: 21 May 2012, 17:08:43 UTC - in response to Message 1234593.  

The main issue I'm aware of with 29x.x is that they stop the GPU when the computer puts the monitor to sleep. The easiest way to fix it is to change your Windows settings so it never goes to sleep (turn it off manually when you want to).

I believe the problem has also been fixed in a newer driver, or you can roll back as you suggested. As someone said in this forum recently, never let Windows Update update your hardware drivers. If Windows says there's an update for hardware, tell it no, then check the hardware manufacturer's web site and get the new driver from there, if you even need it (it may have been released just to fix a problem you don't have anyway). Check for info and, especially, bug reports about the new driver in various places too.

I'm not sure, however, if this driver bug is related to the excessive spikes you got. It could be a coincidence.

David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1234598
BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1235233 - Posted: 23 May 2012, 1:31:37 UTC - in response to Message 1234593.  


So you update the NVIDIA Drivers but not the Lunatics app? ;)
http://lunatics.kwsn.net/index.php?module=Downloads;catd=9
http://lunatics.kwsn.net/downloads/Lunatics%20ReadMev0.40.txt

The current app (inside/included in the Lunatics Installers) is:
Lunatics_x41g_win32_cuda32.exe


 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1235233
Mark Lybeck
Joined: 9 Aug 99
Posts: 245
Credit: 216,677,290
RAC: 173
Finland
Message 1235312 - Posted: 23 May 2012, 5:42:56 UTC

I've just updated the driver to an even later version, 301.42. Let's see how it works out. It now seems that other hosts are also experiencing the same kind of problem with high spike counts. You know the calculation has failed when the spike count is 30.

http://setiathome.berkeley.edu/workunit.php?wuid=995048708
http://setiathome.berkeley.edu/result.php?resultid=2449685357

Maybe the Lunatics client should also be cross-verified against the three most recent WHQL drivers, so you would know about any incompatibility or reliability issues.

Is there any way to recalculate the same work unit with different driver versions?
I did not find such a feature in BOINC.

ID: 1235312
LadyL
Volunteer tester
Joined: 14 Sep 11
Posts: 1679
Credit: 5,230,097
RAC: 0
Message 1235377 - Posted: 23 May 2012, 10:47:55 UTC
Last modified: 23 May 2012, 10:56:13 UTC

Do upgrade to x41g as well please. Installer

And do a clean install of the 301.42 driver, if you didn't do that.

Unless you get invalids again, the driver was probably to blame.
This is the first mention I've seen of somebody getting that type of problem after upgrading to that driver bracket, I think.
I'm not the Pope. I don't speak Ex Cathedra!
ID: 1235377
Mark Lybeck
Joined: 9 Aug 99
Posts: 245
Credit: 216,677,290
RAC: 173
Finland
Message 1236066 - Posted: 24 May 2012, 19:58:27 UTC - in response to Message 1235377.  

I realized that my main GPU in slot 0 is running with a core clock of only 405 MHz. The Asus factory clock is 900 MHz. Maybe this could cause half of the results to be invalid. How can I fix this problem?

I use the latest drivers, 301.42. My additional, identical 560Ti card in slot 1 (which is only 4x PCIe) is running at 900 MHz and may be producing valid results. It also runs some 10 degrees cooler because it sits lower in the tower case.

The 405 MHz seems to be some power-saving mode. As a consequence, the GPU load is also peaking due to the lower speed.
The memory clock runs at 1050 MHz and the shader clock at 1800 MHz.

Any ideas?


ID: 1236066
Mark Lybeck
Joined: 9 Aug 99
Posts: 245
Credit: 216,677,290
RAC: 173
Finland
Message 1236070 - Posted: 24 May 2012, 20:01:21 UTC - in response to Message 1236066.  

The GPU Core clock is jammed at 405 MHz. I tested it just with halting the calculation on GPUs. This must be a bug in the drivers.


ID: 1236070
Horacio
Joined: 14 Jan 00
Posts: 536
Credit: 75,967,266
RAC: 0
Argentina
Message 1236139 - Posted: 24 May 2012, 22:42:06 UTC - in response to Message 1236070.  

The GPU Core clock is jammed at 405 MHz. I tested it just with halting the calculation on GPUs. This must be a bug in the drivers.



The downclock is not a driver failure but a hardware-coded protection, and the only way to get the card back to its default speed is through a reboot of the system.

There are several things that can make a GPU downclock: a hardware failure, a GPU running too hot, a software error, a bad or insufficient PSU, a failure in the PCIe slot, etc. Sometimes it also just happens with no apparent cause...

If that GPU is also throwing a lot of invalid tasks, it's more probable that it's failing or that the PSU is not able to handle both GPUs... (but it doesn't necessarily have to be that).

In the particular case of the 560TIs, most of them come with a factory core voltage that is under the nominal value to make them more "green", but that value is not always enough to keep them crunching 24/7...

ID: 1236139
Mark Lybeck
Joined: 9 Aug 99
Posts: 245
Credit: 216,677,290
RAC: 173
Finland
Message 1236273 - Posted: 25 May 2012, 5:04:53 UTC - in response to Message 1236139.  

Hello,

I actually checked, and it is the 560TI connected to the additional PCIe 4x slot. All the failed results were systematically done on the 2nd unit.

The chipset is an Intel P35. I would not suspect the PSU; it is a 650 W Corsair with dedicated connectors for 2 GPU cards. Maybe NVIDIA had some PCIe compatibility issue with the older chipset?

The 2nd GPU is now running at the correct speed, 900 MHz, after a reboot.

Stderr output
<core_client_version>6.10.60</core_client_version>
<![CDATA[
<stderr_txt>
setiathome_CUDA: Found 2 CUDA device(s):
Device 1: GeForce GTX 560 Ti, 1023 MiB, regsPerBlock 32768
computeCap 2.1, multiProcs 8
clockRate = 1800000
Device 2: GeForce GTX 560 Ti, 1023 MiB, regsPerBlock 32768
computeCap 2.1, multiProcs 8
clockRate = 1800000
In cudaAcc_initializeDevice(): Boinc passed DevPref 2
setiathome_CUDA: CUDA Device 2 specified, checking...
Device 2: GeForce GTX 560 Ti is okay
SETI@home using CUDA accelerated device GeForce GTX 560 Ti
Priority of process raised successfully
Priority of worker thread raised successfully
Cuda Active: Plenty of total Global VRAM (>300MiB).
All early cuFft plans postponed, to parallel with first chirp.

) _ _ _)_ o _ _
(__ (_( ) ) (_( (_ ( (_ (
not bad for a human... _)

Multibeam x38g Preview, Cuda 3.20

Legacy setiathome_enhanced V6 mode.
Work Unit Info:
...............
WU true angle range is : 0.399012
Cuda sync'd & freed.
Preemptively acknowledging a safe Exit on error->
SETI@Home Informational message -9 result_overflow
NOTE: The number of results detected exceeds the storage space allocated.

Flopcounter: 14445619385601.471000

Spike count: 30
Pulse count: 0
Triplet count: 0
Gaussian count: 0
Worker preemptively acknowledging an overflow exit.->
called boinc_finish
boinc_exit(): requesting safe worker shutdown ->
boinc_exit(): received safe worker shutdown acknowledge ->

</stderr_txt>
]]>


http://setiathome.berkeley.edu/results.php?hostid=3299266&offset=0&show_names=0&state=4&appid=
ID: 1236273
arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 1236301 - Posted: 25 May 2012, 6:03:40 UTC

With only a 650 watt PSU that would be my first culprit, I would want at least 800 watts with 2 560ti cards.

ID: 1236301
Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1236450 - Posted: 25 May 2012, 9:07:33 UTC - in response to Message 1236301.  

With only a 650 watt PSU that would be my first culprit, I would want at least 800 watts with 2 560ti cards.


Don't know exactly how much 2 GTX560Ti are drawing, but I still use a 650Watt
PSU for my QX9650 (@3.55GHz)+ 1 GTX480, which draws around 415Watts, according
to the Kill-a-Watt.

The PSU doesn't even get warm, 35C.


ID: 1236450
Mark Lybeck
Joined: 9 Aug 99
Posts: 245
Credit: 216,677,290
RAC: 173
Finland
Message 1236521 - Posted: 25 May 2012, 14:12:57 UTC - in response to Message 1236450.  

Hello,

I am using a UPM PM300 power meter. The power usage is between 405 W and 425 W, so the power supply is not an issue. The majority of WUs from the 560Ti that was stuck at 405 MHz were not correct. After the reboot the situation seems better. Now there is a 3-day wait for the same WUs to be verified by other hosts.

The issue was definitely the card being stuck at 405 MHz. About the root cause you can only speculate. The history is that the original driver update was triggered by Windows Update, followed by a few driver updates from NVIDIA. No reboot was done in between. I recall previously having some problems with those Windows Update-originated driver updates for NVIDIA.


ID: 1236521
Slavac
Volunteer tester
Joined: 27 Apr 11
Posts: 1932
Credit: 17,952,639
RAC: 0
United States
Message 1236584 - Posted: 25 May 2012, 16:15:57 UTC - in response to Message 1236521.  

Hooahmentah. Have you upgraded your Lunatics installer? 560ti's are notoriously buggy on SETI, but upgrading the Lunatics app should solve this.


Executive Director GPU Users Group Inc. -
brad@gpuug.org
ID: 1236584
Mark Lybeck
Joined: 9 Aug 99
Posts: 245
Credit: 216,677,290
RAC: 173
Finland
Message 1236681 - Posted: 25 May 2012, 19:27:06 UTC - in response to Message 1236584.  

I want to see first how the system behaves and stabilizes before upgrading from x38 to x41. The thing is that you don't get the results so fast. Some benchmarking tools to verify performance between different clients would be nice. Also the ability to recalculate the same WU would be beneficial too.
ID: 1236681
Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1236685 - Posted: 25 May 2012, 19:33:41 UTC - in response to Message 1236681.  
Last modified: 25 May 2012, 19:46:16 UTC

Some benchmarking tools to verify performance between different clients would be nice. Also the ability to recalculate the same WU would be beneficial too.

Benchmarking programs and shortened test WUs are available in the Lunatics downloads:

Test and Benchmark Tools

Claggy
ID: 1236685
Mark Lybeck
Joined: 9 Aug 99
Posts: 245
Credit: 216,677,290
RAC: 173
Finland
Message 1238220 - Posted: 27 May 2012, 20:04:48 UTC - in response to Message 1236139.  

The 560TI got downclocked again to 405MHz....

Strange. Temperature is around 60 degrees Celsius so not too much. Could it be that the drivers cannot handle the 4x PCIe bus on the Intel P35 chipset?


ID: 1238220
Misfit
Volunteer tester
Joined: 21 Jun 01
Posts: 21804
Credit: 2,815,091
RAC: 0
United States
Message 1238382 - Posted: 28 May 2012, 4:59:49 UTC - in response to Message 1238220.  

The 560TI got downclocked again to 405MHz....

Keep GPU-Z running in an active window and that should keep the card from doing that.
me@rescam.org
ID: 1238382
Horacio
Joined: 14 Jan 00
Posts: 536
Credit: 75,967,266
RAC: 0
Argentina
Message 1238491 - Posted: 28 May 2012, 15:28:56 UTC - in response to Message 1238220.  

The 560TI got downclocked again to 405MHz....

Strange. Temperature is around 60 degrees Celsius so not too much. Could it be that the drivers cannot handle the 4x PCIe bus on the Intel P35 chipset?



Look in the Sensors tab of GPU-Z to see what the VDDC (core voltage) of your cards is.
If it is under 1.05 V then you will need to raise it (using Afterburner or some other overclocking utility); mine were downclocking very often until I raised their voltages to 1.062 V.

About the PSU, the total wattage is not the only thing to look at. You need to know how many of those watts can go through the 12 V line. Each 560TI needs 170 W maximum, which means your PSU should be able to deliver at least 340 W on the 12 V line (about 29 A), or better a bit more, as the 12 V line also powers HDDs and other devices.
And that's if the PSU has only one rail; if it has more than one rail, you need to be sure that you connected the GPUs to different rails and that each rail is able to deliver the necessary power to each card...
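
For anyone who wants to redo the 12 V arithmetic for a different set of cards, here is a minimal Python sketch. The helper name `required_12v_amps` is made up for illustration, and the 170 W per card figure is simply the maximum quoted in this post, not a measured value.

```python
import math


def required_12v_amps(card_watts, cards, rail_volts=12.0):
    """Return (total watts, amps) the 12 V line must deliver for the GPUs alone."""
    total_watts = card_watts * cards
    # Current is power divided by voltage: I = P / V.
    return total_watts, total_watts / rail_volts


# Two cards at the 170 W maximum assumed above.
total, amps = required_12v_amps(170, 2)
print(total, round(amps, 1), math.ceil(amps))  # prints: 340 28.3 29
```

This covers only the GPUs; HDDs and other devices on the 12 V line come on top, so the PSU label should show some headroom above the rounded-up figure.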
ID: 1238491
Mark Lybeck
Joined: 9 Aug 99
Posts: 245
Credit: 216,677,290
RAC: 173
Finland
Message 1239456 - Posted: 1 Jun 2012, 14:48:50 UTC - in response to Message 1238491.  

Yep, Asus has set the voltage to 1.025 V.

I have not seen any downclocking for a while now, but the yield of correct results is far from perfect.

http://setiathome.berkeley.edu/results.php?hostid=3299266&offset=0&show_names=0&state=4&appid=


ID: 1239456
BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1239470 - Posted: 1 Jun 2012, 15:21:15 UTC - in response to Message 1239456.  


If you don't want to increase the voltage then try decreasing the MHz (e.g. to stock NVIDIA values or lower).
The errors (false signals) and the downclock are symptoms of a bad voltage/MHz combination.

And/or update the Lunatics app (don't wait "to see first how the system behaves and stabilizes before upgrading from x38 to x41").
(Lunatics_x41g_win32_cuda32.exe was released long enough ago for all of us here to know it is stable.
Do you have evidence to the contrary?
And I'm not the kind that hurries to update everything; I stay on old versions of many programs.)


 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1239470


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.