NV GTX9xx series GPU memory downclock with compute tasks

Profile cliff
Joined: 16 Dec 07
Posts: 625
Credit: 3,590,440
RAC: 0
United Kingdom
Message 1619290 - Posted: 27 Dec 2014, 3:39:04 UTC

There are current reports that the current NV driver releases put GTX 900-series GPUs into the P2 power state, with the memory underclocked, when running compute tasks.

I can't say I've noticed the problem myself; I've been a bit swamped with hardware problems other than GPUs recently.

Does anyone else know of this problem, and whether a workaround exists?

Regards
Cliff,
Been there, Done that, Still no damn T shirt!
ID: 1619290
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1619294 - Posted: 27 Dec 2014, 4:49:01 UTC - in response to Message 1619290.  

There's a whole thread over on Einstein on it (but you know that)

http://einstein.phys.uwm.edu/forum_thread.php?id=11044

However, I don't think it will affect us here much since the apps provided by lunatics are optimized to improve our output much beyond what those at Einstein are seeing with their adjustments.

That's my 2 cents...


Zalster

ps... Congrats on being the User of the day over there ;)
ID: 1619294
Profile cliff
Joined: 16 Dec 07
Posts: 625
Credit: 3,590,440
RAC: 0
United Kingdom
Message 1619298 - Posted: 27 Dec 2014, 4:59:50 UTC - in response to Message 1619294.  

Hi Zalster,


I certainly hope so :-) I'd hate to think I'd paid a lot of dosh for crippled GPUs.

As for UOD, it surprised the hell outa me :-)

Regards,
Cliff,
Been there, Done that, Still no damn T shirt!
ID: 1619298
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1620194 - Posted: 29 Dec 2014, 14:51:17 UTC - in response to Message 1619294.  

There's a whole thread over on Einstein on it (but you know that)

http://einstein.phys.uwm.edu/forum_thread.php?id=11044

However, I don't think it will affect us here much since the apps provided by lunatics are optimized to improve our output much beyond what those at Einstein are seeing with their adjustments.


Actually, I don't see how app optimization can help with hardware that's been put into the P2 state... Could optimization prevent the driver from choosing the P2 state, or what?
ID: 1620194
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1620225 - Posted: 29 Dec 2014, 16:26:27 UTC - in response to Message 1620194.  

You would know better than I, Raistmer.

But they were talking about increasing the speeds to gain a few minutes' improvement in crunching time.

I believe those were single instances per GPU, though. I'd have to go back and reread the thread.

There was also the question of stability and errors when overclocking.

Seeing how they are still using older CUDA versions, I wasn't following them very closely.

I linked the thread above if you wish to review what they were saying.


Zalster
ID: 1620225
Profile cliff
Joined: 16 Dec 07
Posts: 625
Credit: 3,590,440
RAC: 0
United Kingdom
Message 1620349 - Posted: 29 Dec 2014, 22:51:49 UTC - in response to Message 1620194.  

Hi Raistmer,

It does help a bit, at least with E@H tasks.

Using NVInspector to reset the P2 memory clock to 3506 MHz decreases task completion time by up to 2 minutes per task.

S@H tasks run as per usual with or without the resetting of the P2 state.

In any event, it seems NVidia has crippled their cards for compute work unless specialised/optimised apps are utilised, or NVI is used to reset the P2 clocks.

I wonder how many projects that rely on distributed computing are aware of the P2 state decrease, or are able to field optimised apps to cope with it.

Regards,
Cliff,
Been there, Done that, Still no damn T shirt!
ID: 1620349
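[Editor's note] Cliff's workaround uses NVIDIA Inspector on Windows. As a rough cross-check on any platform, the current performance state and memory clock can be read from `nvidia-smi` and compared against the card's full-speed clock. A minimal sketch, assuming `nvidia-smi --query-gpu=name,pstate,clocks.mem --format=csv` output; the sample string below is illustrative, not captured from a real host:

```python
# Sketch: detect a GPU sitting in the P2 state with a reduced memory clock
# by parsing nvidia-smi CSV output. On a real host you would run the command
# via subprocess instead of using the canned sample below.
import csv
import io

def parse_pstate_report(csv_text):
    """Parse `nvidia-smi --query-gpu=name,pstate,clocks.mem --format=csv` output."""
    rows = list(csv.reader(io.StringIO(csv_text), skipinitialspace=True))
    gpus = []
    for name, pstate, mem in rows[1:]:          # skip the CSV header row
        mhz = int(mem.split()[0])               # "3004 MHz" -> 3004
        gpus.append({"name": name, "pstate": pstate, "mem_mhz": mhz})
    return gpus

def downclocked(gpu, full_mem_mhz=3505):
    # Flag cards in P2 with memory below the advertised clock (3505 MHz is
    # an assumed nominal GTX 970/980 GDDR5 base clock, per the thread).
    return gpu["pstate"] == "P2" and gpu["mem_mhz"] < full_mem_mhz

sample = (
    "name, pstate, clocks.current.memory [MHz]\n"
    "GeForce GTX 970, P2, 3004 MHz\n"
)
for gpu in parse_pstate_report(sample):
    print(gpu["name"], "downclocked:", downclocked(gpu))
```

This only detects the condition; resetting the clock still needs a tool such as NVIDIA Inspector, as Cliff describes.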
Hans Dorn
Volunteer developer
Volunteer tester
Joined: 3 Apr 99
Posts: 2262
Credit: 26,448,570
RAC: 0
Germany
Message 1620357 - Posted: 29 Dec 2014, 23:04:05 UTC - in response to Message 1619298.  

I'd paid a lot of dosh for crippled GPU's.


Don't even get me started on the 1/32 double-precision performance ratio of the GTX 900 series.

That's just plain shoddy.
ID: 1620357
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1620376 - Posted: 29 Dec 2014, 23:32:57 UTC - in response to Message 1620349.  

Hi Raistmer,

It does help a bit, at least with E@H tasks.

Using NVInspector to reset the P2 memory clock to 3506 MHz decreases task completion time by up to 2 minutes per task.

Regards,


So it's a special tool, NVInspector, that allows getting out of the P2 state, not app optimization. Better to make that clear. Optimization (more precisely, global memory access optimization) just makes an app less vulnerable to this issue, but in no way solves it. Or does a host running the SETI app never enter that P2 state?
ID: 1620376
Profile Zombu2
Volunteer tester

Joined: 24 Feb 01
Posts: 1615
Credit: 49,315,423
RAC: 0
United States
Message 1620402 - Posted: 30 Dec 2014, 0:25:37 UTC

My 980s run at a 1445 MHz core and 7400 MHz RAM clock. They do not downclock for me, so I'm good. Now if I could just keep enough tasks in my queue...
I came down with a bad case of i don't give a crap
ID: 1620402
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1620441 - Posted: 30 Dec 2014, 2:12:50 UTC - in response to Message 1620376.  

Raistmer, do you know why Nvidia chose to run distributed processing work in the P2 state for the 900 series? Also, do you know the mechanism by which they determine what the card is doing and then force the cards into the P2 state? The cards run normally in the P1 state if they are doing ANYTHING OTHER than distributed processing tasks. I've looked around in the Nvidia CUDA forums and haven't found any discussions on this topic.

Thanks in advance for any insights.

Cheers, Keith
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1620441
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1620615 - Posted: 30 Dec 2014, 10:01:41 UTC - in response to Message 1620441.  

Also, do you know the mechanism by how they determine what the card is doing and then to force the cards into P2 state?
Cheers, Keith


If I knew that, I would not be asking whether SETI or any other computational task switches the state back. Apparently they do not, hence the initial point of the thread regarding optimization level is moot.
Most probably it's some "feature" in the new driver that will be removed later.
What I would expect is for the frequency to be lowered when idle is detected. Apparently that's not the case. Or both the SETI and Einstein apps use memory too sparsely to trigger higher frequencies (hard to believe, though).
ID: 1620615
Hans Dorn
Volunteer developer
Volunteer tester
Joined: 3 Apr 99
Posts: 2262
Credit: 26,448,570
RAC: 0
Germany
Message 1620656 - Posted: 30 Dec 2014, 11:26:26 UTC

One thing that sets GPGPU tasks apart is that they don't produce any visible output.
ID: 1620656
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1620794 - Posted: 30 Dec 2014, 17:13:48 UTC - in response to Message 1620615.  

Also, do you know the mechanism by how they determine what the card is doing and then to force the cards into P2 state?
Cheers, Keith


If I knew that, I would not be asking whether SETI or any other computational task switches the state back. Apparently they do not, hence the initial point of the thread regarding optimization level is moot.
Most probably it's some "feature" in the new driver that will be removed later.
What I would expect is for the frequency to be lowered when idle is detected. Apparently that's not the case. Or both the SETI and Einstein apps use memory too sparsely to trigger higher frequencies (hard to believe, though).


Thanks for the reply. The one thing I am noticing is very low memory controller loading (1-2%) when running a mixture of SETI, MilkyWay and Einstein tasks at the same time. I haven't seen the high levels of memory controller loading mentioned in the thread in the Einstein forum. I wonder if that is the parameter that triggers dropping into the P2 state. When I stop BOINC processing to watch a movie, I see the memory controller loading crank up, and the memory runs at full spec speed along with the core clocks in the P0 state. A mystery for now, I guess. Just glad the tools for clocking P2 state memory at full speed are available.

Cheers, Keith
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1620794
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1620834 - Posted: 30 Dec 2014, 22:22:20 UTC - in response to Message 1620794.  

There are two different memory loads: the GPU's own global device memory, and communication with system memory via the PCIe bus. AFAIK that "load" field reflects the second. In optimized apps, PCIe communication should be as small as possible. But the P2 state affects the frequency of the GPU's own global memory. Hence, these are not really connected parameters.
ID: 1620834
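[Editor's note] Raistmer's distinction matters because the P2 downclock hits the device-memory side, not the PCIe side. The bandwidth cost can be estimated from peak theoretical figures. A back-of-the-envelope sketch; the numbers are nominal GTX 970 values assumed for illustration (256-bit bus, roughly 7000 MHz effective GDDR5 at full speed versus roughly 6000 MHz effective in P2), not measurements:

```python
# Rough illustration of how a P2 memory downclock cuts *device* memory
# bandwidth, independent of PCIe traffic. Figures are assumed nominal
# GTX 970 values, not measured ones.

def bandwidth_gbs(effective_clock_mhz, bus_width_bits):
    """Peak theoretical bandwidth in GB/s: effective clock * bus width in bytes."""
    return effective_clock_mhz * 1e6 * (bus_width_bits / 8) / 1e9

p0 = bandwidth_gbs(7000, 256)   # full-speed state
p2 = bandwidth_gbs(6000, 256)   # downclocked compute state
print(f"P0: {p0:.0f} GB/s, P2: {p2:.0f} GB/s, loss: {100 * (1 - p2 / p0):.0f}%")
```

Whether a given app feels that loss depends on how memory-bound its kernels are, which is consistent with the optimized apps being less affected.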
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1620887 - Posted: 31 Dec 2014, 0:04:58 UTC - in response to Message 1620834.  

Raistmer, thanks for the explanation. I understand the difference now between onboard global memory and system memory with regard to the memory controller loading reported by typical GPU monitoring programs. It also explains the much higher memory controller loading for Einstein tasks, since their app is not very optimized for GPU work and uses a lot more CPU time moving data across the PCIe bus into the GPU. I'm slowly learning this stuff, at least.

Cheers, Keith
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1620887
Profile zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65749
Credit: 55,293,173
RAC: 49
United States
Message 1620899 - Posted: 31 Dec 2014, 0:19:14 UTC

I'm just hoping that if I eventually buy 4 GTX 980 cards, they don't turn out to be duds; otherwise I'd want to go no further than the GTX 780.
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 1620899
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1620933 - Posted: 31 Dec 2014, 1:43:43 UTC - in response to Message 1620899.  

Kind of depends on which projects you want to crunch for. Also, you will see a significant drop in electrical energy usage compared with previous generations. I found the move to the 970 very beneficial: it more than doubled my RAC for MilkyWay tasks because of the doubling of double-precision math capability over my previous 670s. I never felt the urge to move to the 700 series because of the increase in TDP and power usage over the 670s. It was a win-win for me with the 970s: better RAC output, less heat output and less power draw from the wall.

Saw some news today about the release of a GTX960 product line at the end of January. Interested to find out where the pricing will be.

Cheers, Keith
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1620933
Hans Dorn
Volunteer developer
Volunteer tester
Joined: 3 Apr 99
Posts: 2262
Credit: 26,448,570
RAC: 0
Germany
Message 1620941 - Posted: 31 Dec 2014, 2:38:34 UTC

Hi Keith,

I'm afraid the GTX 970 isn't the best choice for running MilkyWay.

Nvidia has nerfed the double-precision performance on the Maxwell GPUs.

(1/32 of single precision)

AMD made some nice Tahiti-based cards a while ago with a 1/4 ratio for DP, especially the 7870 XT: http://en.wikipedia.org/wiki/List_of_AMD_graphics_processing_units#Southern_Islands_.28HD_7xxx.29_Series

One of these dedicated to MilkyWay might be a good choice.

Since these cards are a bit older, eBay might be a good source.

Cheers
Hans
ID: 1620941
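[Editor's note] Hans's comparison can be made concrete with peak-FLOPS estimates. The specs below are nominal published figures assumed for illustration (GTX 970: 1664 shaders at roughly 1.05 GHz base; HD 7870 XT "Tahiti LE": 1536 shaders at 0.975 GHz), and peak SP is computed as shaders × clock × 2, counting a fused multiply-add as two operations:

```python
# Back-of-the-envelope peak FLOPS comparison. Specs are assumed nominal
# values, not measurements; real application throughput is far below peak.

def peak_gflops(shaders, clock_ghz, dp_ratio):
    """Return (single-precision, double-precision) peak GFLOPS."""
    sp = shaders * clock_ghz * 2      # FMA = 2 ops per shader per cycle
    return sp, sp * dp_ratio

for name, shaders, clock, ratio in [
    ("GTX 970 (Maxwell, 1/32 DP)", 1664, 1.05, 1 / 32),
    ("HD 7870 XT (Tahiti, 1/4 DP)", 1536, 0.975, 1 / 4),
]:
    sp, dp = peak_gflops(shaders, clock, ratio)
    print(f"{name}: ~{sp:.0f} GFLOPS SP, ~{dp:.0f} GFLOPS DP")
```

Despite the similar single-precision figures, the DP ratio leaves the older Tahiti card several times faster at the double-precision work MilkyWay needs.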
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1620956 - Posted: 31 Dec 2014, 3:18:50 UTC - in response to Message 1620941.  

Oh, I certainly know about the lousy FP64 performance of consumer-level Nvidia products. But since MilkyWay is the only project I run that requires double-precision math, Nvidia cards are an adequate compromise. Also, I am not too enthusiastic about the power consumption and heat from AMD cards, and the AMD drivers seem to be an ongoing battle among the users who comment in the forums. If I were only running MilkyWay, I probably would go with an AMD card. I am happy that I got a doubling of math efficiency from the move from the 670 cards to the 970 cards. Now if Nvidia would drop the price of the Teslas, Quadros and Titans, I probably would move to better FP64 performance.

Alas, my motherboards and cases are maxed out with dual X16 slots, and there isn't really any room to run an AMD card and provide adequate cooling. I really would have to build another cruncher and dedicate it to the AMD GPU platform. I don't think I could justify the additional power and cooling requirements for a third system. My two existing platforms have high enough cooling requirements during the summer as it stands now. I shut them off at sundown when the output from the solar goes away. I run them now during the winter to heat the house from the power I banked during the summer. Thanks for the comment.

Keith
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1620956
Hans Dorn
Volunteer developer
Volunteer tester
Joined: 3 Apr 99
Posts: 2262
Credit: 26,448,570
RAC: 0
Germany
Message 1620957 - Posted: 31 Dec 2014, 3:28:56 UTC

Alright.

There's always room for another noisy crunching box ;)

I shut off BOINC once it's getting hot enough for air conditioning.
My RAC takes a plunge, but at 30 cents per kWh here in Germany I have to pass.

Hans
ID: 1620957
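[Editor's note] Hans's electricity math is easy to reproduce. A quick sketch; the 150 W figure is an assumed example card draw, not a measurement from anyone's rig:

```python
# Monthly cost of running a GPU cruncher 24/7 at German electricity prices
# (0.30 EUR/kWh, per the post above). The wattage is an assumed example.

def monthly_cost_eur(watts, eur_per_kwh=0.30, hours=24 * 30):
    """Cost = energy in kWh (power * time) times the per-kWh price."""
    return watts / 1000 * hours * eur_per_kwh

print(f"{monthly_cost_eur(150):.2f} EUR/month for a 150 W card")
```

At that rate even one mid-range card adds up to roughly 30 EUR a month, which explains shutting BOINC off when cooling costs pile on top.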


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.