FYI to all Nvidia Crunchers out there... Clock speed Problems

Tex1954
Volunteer tester

Joined: 16 Mar 11
Posts: 12
Credit: 6,654,193
RAC: 17
United States
Message 1109613 - Posted: 25 May 2011, 13:22:20 UTC

There is an ongoing problem with CUDA tasks and the clock rates being dropped on Nvidia cards. I've written it up; the problem occurs with both Vista and Win7.

What happens is that the clock rate gets dropped to conserve power/heat etc. and never returns to high speed again. This always happens with DUAL Nvidia cards installed, and it seems only magic keeps it from happening on its own most of the time. It doesn't matter what the power settings are; performance mode seems to help, but doesn't totally correct it. Snoozing or suspending tasks is a 95% guarantee the clocks will drop and never regain full speed again.
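To check whether your own card shows the "never returns to high speed" behaviour described above, one rough approach is to log the core clock at intervals (for example with a monitoring tool such as MSI Afterburner, which comes up later in the thread) and then look at the tail of the log. The Python below is a minimal sketch under stated assumptions: the function name, the 80% threshold and the sample values are all hypothetical, and it only analyses readings you have already collected; it does not query the GPU itself.

```python
def has_sticky_downclock(samples_mhz, expected_mhz, min_low_samples, threshold_fraction=0.8):
    """Hypothetical helper: return True if the clock was at speed at some point,
    then fell below threshold_fraction * expected_mhz and stayed there for at
    least min_low_samples consecutive readings, right up to the end of the log."""
    limit = expected_mhz * threshold_fraction
    was_high = False      # have we ever seen the card running at speed?
    trailing_low = 0      # consecutive low readings at the end of the log so far
    for mhz in samples_mhz:
        if mhz < limit:
            trailing_low += 1
        else:
            was_high = True
            trailing_low = 0  # the clock recovered, so any earlier dip was transient
    return was_high and trailing_low >= min_low_samples


if __name__ == "__main__":
    # Made-up readings: full speed, then a suspend, then stuck at 405 MHz.
    log = [810, 810, 810, 405, 405, 405, 405, 405]
    print(has_sticky_downclock(log, expected_mhz=810, min_low_samples=5))  # → True
```

A dip that recovers before the end of the log is ignored; only a drop that persists for at least `min_low_samples` readings is flagged.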

I've informed Nvidia tech support and the forums.


http://forums.nvidia.com/index.php?s=9f29a996e0ac9d6ea44a506f6631f805&showtopic=200414&pid=1237460&st=0&#entry1237460

8-)

Tex1954
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1109616 - Posted: 25 May 2011, 13:40:41 UTC - in response to Message 1109613.  

This problem has recently been drawn to the attention of the SETI CUDA developers, and is in the process of being reported onwards to other CUDA-enabled BOINC projects. It seems to be related specifically to the release of nVidia drivers which will support the forthcoming v4 release of the CUDA run-time support files. The central BOINC library code isn't yet fully compatible with CUDA v4: the problem has been overcome with test programs, and should go away with the next round of application releases.

Unfortunately, v3 and earlier nVidia drivers aren't available for the very latest generation of nVidia cards, but if you have a card which can run with nVidia driver 266.58, that should avoid the problem.
-BeNt-
Joined: 17 Oct 99
Posts: 1234
Credit: 10,116,112
RAC: 0
United States
Message 1109670 - Posted: 25 May 2011, 16:43:03 UTC - in response to Message 1109616.  

Interesting. Richard, do you know if this is only affecting the 5 series cards, or is it affecting others running the 275.61's? I downgraded to the 275.51's and don't seem to be having these issues.
Traveling through space at ~67,000mph!
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1109679 - Posted: 25 May 2011, 16:54:25 UTC - in response to Message 1109670.  
Last modified: 25 May 2011, 16:54:43 UTC

I'm not involved in the detailed technicalities, just a messenger. I'll try and find out, or someone who knows might post.
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1109689 - Posted: 25 May 2011, 17:18:55 UTC - in response to Message 1109679.  
Last modified: 25 May 2011, 17:25:12 UTC

OK I will. It's quite involved, but I'll try detail first then explain further if needed.

Certain new methods that the Cuda 4 drivers use to deal with memory and Cuda transfers are sensitive to being abruptly terminated without warning. All Windows Boinc-Cuda app releases to date use boincApi code for their exit handling, given that Boinc needs to tell applications through this channel when to snooze/resume/exit etc., as well as when the worker needs to exit normally.

Symptoms directly attributable to using Cuda 4 drivers with current Boinc-Cuda applications are primarily the 'sticky downclock' problem, but also other forms of unexplained erroring out.

There are other, non-Cuda-related symptoms visible across non-Cuda (CPU) applications as well, the most visible being truncation or erasure of the stderr.txt contents and, less visibly, possibly of checkpoint and result files as well.

These sorts of symptoms, apparently related to how 'nicely' the program treats the active buffer transfers when the application shuts down, seemed to be statistically more common on systems with lower bus/memory speeds, probably as a result of the transfers etc. taking longer (i.e. higher contention).
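The contention argument can be made concrete with a deliberately simplified model (an assumption for illustration, not something measured in this thread): suppose a snooze/suspend/exit request arrives at a random moment, and each work cycle spends a time t_x on buffer transfers and t_c on computation. The chance that a single request lands mid-transfer is then roughly p, and over n suspends the chance that at least one lands mid-transfer is P_n:

```latex
p \approx \frac{t_x}{t_x + t_c}, \qquad P_n = 1 - (1 - p)^n
```

With made-up times t_x = 1 and t_c = 9 (any unit), p = 0.10 and P_10 ≈ 0.65; a slower bus that doubles the transfer time to t_x = 2 gives p ≈ 0.18 and P_10 ≈ 0.87. Under this model, faster transfers shrink p, which is in line with the observation that high-throughput hosts are less susceptible.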

The trial solution in testing is to install exit code within boincAPI that 'asks' the worker thread (the one that feeds the Cuda device etc.) to shut down 'nicely', so that it can quickly finish what it is doing and tidy up before being 'killed'. At present this seems effective at preventing the downclock problem, and possibly the stderr/etc. truncation symptoms as well, though we're still poking at it to look for unexpected issues. I've relayed as much information as I can to Berkeley and will leave it in their hands.
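The trial fix is described as exit code in boincAPI that asks the worker thread to finish what it is doing and tidy up before being killed. The real change is C++/Cuda code inside the BOINC API and is not reproduced here; the Python below is only a minimal sketch of the same general pattern (cooperative shutdown with a grace period), and every name in it (`Worker`, `request_exit`, `grace_seconds`) is hypothetical.

```python
import threading
import time


class Worker:
    """Hypothetical stand-in for the thread that feeds the GPU. It works in
    chunks and only checks for a stop request between chunks, so it never
    abandons a chunk (think: one buffer transfer) half-way through."""

    def __init__(self):
        self.stop_requested = threading.Event()
        self.finished_cleanly = threading.Event()
        self.chunks_done = 0
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def _run(self):
        while not self.stop_requested.is_set():
            time.sleep(0.01)  # placeholder for one transfer plus kernel run
            self.chunks_done += 1
        # Tidy-up step: a real app would release buffers, write a checkpoint
        # and flush stderr here before reporting that it is done.
        self.finished_cleanly.set()


def request_exit(worker, grace_seconds=1.0):
    """Ask the worker to stop, then wait up to grace_seconds for it to tidy up.
    Returns True on a clean finish; False means the caller would have to fall
    back to a forced termination (not shown)."""
    worker.stop_requested.set()
    return worker.finished_cleanly.wait(timeout=grace_seconds)


if __name__ == "__main__":
    w = Worker()
    w.start()
    time.sleep(0.05)
    print("clean exit:", request_exit(w))
```

The design point is that the stop request is polled only at safe points between chunks, while the grace period bounds how long the caller waits before it would have to kill the worker anyway.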

If you experience the downclock problems, there are currently 2 options I'm aware of:
- Downgrade to driver 266.58, which is not as sensitive to its tasks being summarily executed, or
- Determine whether it's a situation where you absolutely need the fix now. That would only be a possibility for this project (other projects don't have the fix yet and may not even be aware of the issue), and only under special circumstances, as it would involve pre-alpha testing of unproven code. We are a bit overworked at the moment with V7 and other development considerations, so please don't expect a rush release of this unproven code.

In any case, high-throughput hosts are statistically less susceptible to this problem, so it is quite possible that many hosts don't see the symptoms appear even with newer drivers and existing applications.

HTH, Jason
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
Tex1954
Volunteer tester

Joined: 16 Mar 11
Posts: 12
Credit: 6,654,193
RAC: 17
United States
Message 1109864 - Posted: 26 May 2011, 4:43:20 UTC

Thanks to all the great folks for addressing this issue!

I am sure it is in capable hands now!!

8-)

Tex1954
-BeNt-
Joined: 17 Oct 99
Posts: 1234
Credit: 10,116,112
RAC: 0
United States
Message 1109939 - Posted: 26 May 2011, 13:16:37 UTC

Thanks for the information, guys. I never noticed the clock slowdown per se on my 480, hence the question. I was seeing other differences, mainly in the gaming arena, that ultimately caused me to swap drivers again.
Sandman192
Volunteer tester

Send message
Joined: 26 Apr 07
Posts: 3
Credit: 1,856,758
RAC: 2
United States
Message 1110888 - Posted: 28 May 2011, 22:33:35 UTC
Last modified: 28 May 2011, 22:34:19 UTC

Theramansi
Joined: 25 Jun 04
Posts: 97
Credit: 39,577,723
RAC: 63
United States
Message 1112226 - Posted: 2 Jun 2011, 0:34:31 UTC - in response to Message 1109613.  

Dual cards only?

My GTS 450 was dropping to about half clock speed. I was using the Nvidia 270.61 driver. I have the onboard 8300 disabled in the BIOS, so I don't think that the BOINC Manager (BM) or the driver can see it.

I have reverted to 267.59 and haven't seen the problem again as of yet.


Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1112253 - Posted: 2 Jun 2011, 4:15:56 UTC

Just got notified by the Nvidia Updater that the 275.33 driver was available, WHQL certified and not beta. Downloading as I type; I'll give it a shot. For some reason, I've not been bothered at all by any downclocking with the 270.61 and 275.27 drivers on my two GTX 460s.

Keith

Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
AI4FR
Joined: 13 Apr 11
Posts: 57
Credit: 23,590,991
RAC: 0
United States
Message 1112264 - Posted: 2 Jun 2011, 5:42:46 UTC
Last modified: 2 Jun 2011, 5:43:42 UTC

So far so good here with the 580s. I have them running in SLI, of course. I also have them overclocked slightly, to 825 MHz for the core clock and 2104 MHz for the memory clock. I have been watching them with a program called MSI Afterburner. Even with repeated stops and starts, I haven't noticed any downclocking issues with driver 270.61.
Urglab

Joined: 3 Jun 08
Posts: 4
Credit: 1,160,006
RAC: 0
Netherlands
Message 1112393 - Posted: 2 Jun 2011, 17:24:55 UTC

The core just dropped to 405 MHz when I suspended computation and didn't go back up. It still happens with the 275.33 WHQL drivers.
DJStarfox

Joined: 23 May 01
Posts: 1066
Credit: 1,226,053
RAC: 2
United States
Message 1112430 - Posted: 2 Jun 2011, 18:38:50 UTC
Last modified: 2 Jun 2011, 18:39:09 UTC

Card is:
GeForce GTS 450

I just looked to see what version I have:
NVIDIA UNIX x86_64 Kernel Module 260.19.36

No problems here with clock speed. :)

I wonder if I should upgrade anyway?
-BeNt-
Joined: 17 Oct 99
Posts: 1234
Credit: 10,116,112
RAC: 0
United States
Message 1112509 - Posted: 2 Jun 2011, 22:20:29 UTC

No issues with it on my 480. Haven't bothered with them on the 250 yet.
Slavac
Volunteer tester
Joined: 27 Apr 11
Posts: 1932
Credit: 17,952,639
RAC: 0
United States
Message 1112514 - Posted: 2 Jun 2011, 23:01:04 UTC - in response to Message 1112509.  

Just had a huge slowdown on my 560 Ti. CUDA tasks that were taking at most 24 minutes were taking almost 2 hours. A restart fixed that.


Executive Director GPU Users Group Inc. -
brad@gpuug.org



 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.