NV GTX9xx series GPU memory downclock with compute tasks
Author | Message |
---|---|
cliff Joined: 16 Dec 07 Posts: 625 Credit: 3,590,440 RAC: 0 |
There is a current report that, with the current series of NV drivers, NV GTX 9xx series GPUs are put into the P2 state with memory underclocked when running compute tasks. I can't say I noticed the problem myself; I've been a bit swamped with hardware problems other than GPUs recently. Does anyone else know of this problem, and whether a workaround exists? Regards Cliff, Been there, Done that, Still no damm T shirt! |
Zalster Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
There's a whole thread over on Einstein on it (but you know that): http://einstein.phys.uwm.edu/forum_thread.php?id=11044 However, I don't think it will affect us here much, since the apps provided by Lunatics are optimized to improve our output well beyond what those at Einstein are seeing with their adjustments. That's my 2 cents... Zalster PS: Congrats on being the User of the day over there ;) |
cliff Joined: 16 Dec 07 Posts: 625 Credit: 3,590,440 RAC: 0 |
Hi Zalster, I certainly hope so :-) I'd hate to think I'd paid a lot of dosh for crippled GPUs. As for UOD, surprised the hell outa me :-) Regards, Cliff, Been there, Done that, Still no damm T shirt! |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
"There's a whole thread over on Einstein on it (but you know that)" Actually, I don't see how app optimization can help with hardware put in the P2 state... Could optimization prevent the driver from choosing the P2 state, or what? |
Zalster Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
You would know better than I, Raistmer. But they were talking about increasing the clock speeds to gain a few minutes' improvement in crunching time. I believe those were single instances per GPU, though; I'd have to go back and reread the thread. There was also the question of stability and errors when overclocking. Seeing how they are still using older CUDA versions, I wasn't following them very closely. I linked the thread above if you wish to review what they were saying. Zalster |
cliff Joined: 16 Dec 07 Posts: 625 Credit: 3,590,440 RAC: 0 |
Hi Raistmer, It does help a bit, at least with E@H tasks. Using NVInspector to reset the P2 memory clock to 3506 decreases task completion time by up to 2 minutes per task. S@H tasks run as usual with or without resetting the P2 state. In any event, it seems NVidia has crippled their cards for compute work unless specialised / optimised apps are utilised, or NVI is used to reset P2. I wonder how many projects that rely on distributed computing are aware of the P2 state decrease, or are able to field optimised apps to cope with it. Regards, Cliff, Been there, Done that, Still no damm T shirt! |
Hans Dorn Joined: 3 Apr 99 Posts: 2262 Credit: 26,448,570 RAC: 0 |
"I'd paid a lot of dosh for crippled GPU's." Don't even get me started on the 1/32 double precision performance ratio of the GTX 9xx series. That's just plain shoddy. |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
"Hi Raistmer," So it's a special tool, NVInspector, that gets around the P2 state, not app optimization. Better to make that clear. Optimization (more precisely, global memory access optimization) just makes an app less vulnerable to this issue, but in no way solves it. Or does a host running the SETI app never enter the P2 state? |
Zombu2 Joined: 24 Feb 01 Posts: 1615 Credit: 49,315,423 RAC: 0 |
My 980s run at 1445 MHz core and 7400 MHz RAM clock. They do not downclock for me, so I'm good. Now if I could just keep enough tasks in my queue. I came down with a bad case of i don't give a crap |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Raistmer, do you know why Nvidia chose to run distributed processing work in P2 state for the 900 series? Also, do you know the mechanism by how they determine what the card is doing and then to force the cards into P2 state? The cards run normally in P1 state if they are doing ANYTHING OTHER than distributed processing tasks. I've looked around in the Nvidia CUDA forums and haven't found any discussions on this topic. Thanks in advance for any insights. Cheers, Keith Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
"Also, do you know the mechanism by how they determine what the card is doing and then to force the cards into P2 state?" If I knew, I would not be asking whether SETI or any other computational task switches the state back. Apparently they do not, hence the initial point of the thread regarding optimization level is void. Most probably it's some "feature" in the new driver that will be removed later. What I would expect is lowering the frequency when idle is detected. Apparently that's not the case. Or both SETI and Einstein apps use memory too sparsely to trigger higher frequencies (hard to believe, though). |
Hans Dorn Joined: 3 Apr 99 Posts: 2262 Credit: 26,448,570 RAC: 0 |
One thing that sets GPGPU tasks apart is that they don't produce any visible output. |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
"Also, do you know the mechanism by how they determine what the card is doing and then to force the cards into P2 state?" Thanks for the reply. The one thing I am noticing is very low memory controller loading (1-2%) when running a mixture of SETI, MilkyWay and Einstein tasks at the same time. I haven't seen the high levels of memory controller loading mentioned in the thread in the Einstein forum. I wonder if that is the parameter that triggers dropping into P2 state. When I stop BOINC processing to watch a movie, I see the memory controller loading crank up and the memory is running at full spec speed along with the core clocks in P0 state. A mystery for now I guess. Just glad the tools for clocking P2 state memory at full speed are available. Cheers, Keith Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
There are 2 different memory loads: global device memory, and communication with system memory via the PCIe bus. AFAIK that "load" field reflects the second. And in optimized apps PCIe communication should be as small as possible. But the P2 state affects the frequency of the GPU's own global memory. Hence, these parameters are not really connected. |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Raistmer, thanks for the explanation. Understand the difference now between onboard global memory and system memory with regard to the memory controller loading reported by typical GPU monitoring programs. Also explains the much higher memory controller loading for Einstein tasks since their app is not very optimized for GPU work and utilizes a lot more CPU time moving data across the PCIe bus into the GPU. I'm slowly learning this stuff at least. Cheers, Keith Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
zoom3+1=4 Joined: 30 Nov 03 Posts: 65768 Credit: 55,293,173 RAC: 49 |
I'm just hoping that eventually if I buy 4 GTX980 cards, that the cards don't turn out to be duds, otherwise I'd want to go no farther than the GTX780 card. The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Kind of depends on which projects you want to crunch for. Also you will realize a significant drop in electrical energy usage from previous generations. I found the move to the 970 very beneficial. More than doubled my RAC for MilkyWay tasks because of the doubling of the double-precision math capabilities over my previous 670's. I never felt the urge to move to the 700 series because of the increase in TDP and power usage over the 670's. It was a win-win for me with the 970's. Better RAC output, less heat output and less power draw from the wall. Saw some news today about the release of a GTX960 product line at the end of January. Interested to find out where the pricing will be. Cheers, Keith Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Hans Dorn Joined: 3 Apr 99 Posts: 2262 Credit: 26,448,570 RAC: 0 |
Hi Keith, I'm afraid the GTX970 isn't the best choice to run MilkyWay on. Nvidia has nerfed the double precision performance on the Maxwell GPUs (1/32 of single precision). AMD did some nice Tahiti based cards a while ago, with a 1/4 ratio for DP, especially the 7870 XT: http://en.wikipedia.org/wiki/List_of_AMD_graphics_processing_units#Southern_Islands_.28HD_7xxx.29_Series One of these dedicated to MilkyWay might be a good choice. Since these cards are a bit older, eBay might be a good source. Cheers Hans |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Oh, I certainly know about the lousy FP64 performance of consumer level Nvidia products. But since MilkyWay is the only project I run that requires double precision math, Nvidia cards are an adequate compromise. Also, I am not too enthusiastic about the power consumption and heat from AMD cards. The drivers for AMD seem to be an ongoing battle among users that comment in the forums. If I was only running MilkyWay I probably would go with an AMD card. I am happy that I got a doubling of math efficiency from the move from the 670 cards to the 970 cards. Now if Nvidia would drop the price of the Teslas, Quadros and Titans I probably would move to better FP64 performance. Alas, my motherboards and cases are maxed out with dual X16 slots and there isn't really any room to run an AMD card and provide adequate cooling. I really would have to build another cruncher and dedicate it to the AMD GPU platform. I don't think I could justify the additional power and cooling requirements for a third system. My two existing platforms already need too much cooling during the summer as it stands now. I shut them off at sundown when the output from the solar goes away. I run them now during the winter to heat the house from the power I banked during the summer. Thanks for the comment. Keith Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Hans Dorn Joined: 3 Apr 99 Posts: 2262 Credit: 26,448,570 RAC: 0 |
Alright. There's always room for another noisy crunching box ;) I shut off BOINC once it's getting hot enough for air conditioning. My RAC takes a plunge, but at 30 cents per kWh here in Germany I have to pass. Hans |
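
For anyone who wants to check whether their own card shows the P2 memory downclock discussed in this thread, here is a minimal sketch in Python. It assumes `nvidia-smi` is installed and on the PATH, and that the driver reports the `pstate`, `clocks.mem` and `clocks.max.mem` query fields for the card. The function names, the 50 MHz tolerance and the sample numbers in the comments are illustrative assumptions, not values taken from the thread.

```python
import subprocess

# nvidia-smi query fields: performance state, current and maximum memory clock (MHz).
QUERY_FIELDS = "pstate,clocks.mem,clocks.max.mem"


def parse_gpu_states(csv_text):
    """Parse lines like 'P2, 3005, 3505' into one dict per GPU.

    Lines that do not have three fields, or whose clocks are not numeric
    (for example '[N/A]'), are skipped.
    """
    rows = []
    for line in csv_text.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 3:
            continue
        pstate, current, maximum = parts
        try:
            rows.append({
                "pstate": pstate,
                "mem_mhz": int(current),
                "max_mem_mhz": int(maximum),
            })
        except ValueError:
            continue
    return rows


def is_memory_downclocked(row, tolerance_mhz=50):
    """True if the memory clock is more than tolerance_mhz below its maximum.

    The tolerance is an arbitrary illustrative threshold.
    """
    return row["max_mem_mhz"] - row["mem_mhz"] > tolerance_mhz


def query_gpus():
    """Run nvidia-smi and return the parsed state of every GPU it reports."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_states(result.stdout)
```

Calling `query_gpus()` while a BOINC GPU task is running, and again while the card is doing something else, should show whether the performance state and memory clock differ between the two cases. Passing each returned row to `is_memory_downclocked()` flags the cards whose memory is running below its maximum clock.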
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.