NV GTX9xx series GPU memory downclock with compute tasks

Profile cliff
Joined: 16 Dec 07
Posts: 625
Credit: 3,590,440
RAC: 0
United Kingdom
Message 1619290 - Posted: 27 Dec 2014, 3:39:04 UTC

There are current reports that the current NV driver releases put GTX 900-series GPUs into the P2 power state, with the memory underclocked, when running compute tasks.

I can't say I've noticed the problem myself; I've been a bit swamped with hardware problems other than GPUs recently.

Does anyone else know of this problem, and whether a workaround exists?

Regards
Cliff,
Been there, Done that, Still no damn T shirt!
ID: 1619290
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1619294 - Posted: 27 Dec 2014, 4:49:01 UTC - in response to Message 1619290.  

There's a whole thread over on Einstein on it (but you know that)

http://einstein.phys.uwm.edu/forum_thread.php?id=11044

However, I don't think it will affect us here much since the apps provided by lunatics are optimized to improve our output much beyond what those at Einstein are seeing with their adjustments.

That's my 2 cents...


Zalster

ps... Congrats on being the User of the day over there ;)
ID: 1619294
Profile cliff
Joined: 16 Dec 07
Posts: 625
Credit: 3,590,440
RAC: 0
United Kingdom
Message 1619298 - Posted: 27 Dec 2014, 4:59:50 UTC - in response to Message 1619294.  

Hi Zalster,


I certainly hope so :-) I'd hate to think I'd paid a lot of dosh for crippled GPUs.

As for UOD, it surprised the hell outa me :-)

Regards,
Cliff,
Been there, Done that, Still no damn T shirt!
ID: 1619298
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1620194 - Posted: 29 Dec 2014, 14:51:17 UTC - in response to Message 1619294.  

There's a whole thread over on Einstein on it (but you know that)

http://einstein.phys.uwm.edu/forum_thread.php?id=11044

However, I don't think it will affect us here much since the apps provided by lunatics are optimized to improve our output much beyond what those at Einstein are seeing with their adjustments.


Actually, I don't see how app optimization can help with hardware that's been put into the P2 state... Could optimization prevent the driver from choosing the P2 state, or what?
ID: 1620194
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1620225 - Posted: 29 Dec 2014, 16:26:27 UTC - in response to Message 1620194.  

You would know better than I, Raistmer.

But they were talking about increasing the speeds to gain a few minutes' improvement in crunching time.

I believe those were single instances per GPU, though. I'd have to go back and reread the thread.

There was also the question of stability and errors when overclocking.

Seeing how they are still using older CUDA versions, I wasn't following them very closely.

I linked the thread above if you wish to review what they were saying.


Zalster
ID: 1620225
Profile cliff
Joined: 16 Dec 07
Posts: 625
Credit: 3,590,440
RAC: 0
United Kingdom
Message 1620349 - Posted: 29 Dec 2014, 22:51:49 UTC - in response to Message 1620194.  

Hi Raistmer,

It does help a bit, at least with E@H tasks.

Using NVInspector to reset the P2 memory clock to 3506 MHz decreases task completion time by up to 2 minutes per task.

S@H tasks run as per usual with or without the resetting of the P2 state.

In any event, it seems NVidia has crippled their cards for compute work unless specialised/optimised apps are utilised, or NVI is used to reset the P2 clocks.

I wonder how many projects that rely on distributed computing are aware of the P2 state decrease, or are able to field optimised apps to cope with it.

Regards,
Cliff,
Been there, Done that, Still no damn T shirt!
ID: 1620349
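[Editor's note] Cliff's workaround uses NVIDIA Inspector on Windows. As a rough cross-check on any platform, the current performance state and memory clock can be read from `nvidia-smi` and compared against the card's full-speed clock. A minimal sketch, assuming `nvidia-smi --query-gpu=name,pstate,clocks.mem --format=csv` output; the sample string below is illustrative, not captured from a real host:

```python
# Sketch: detect a GPU sitting in the P2 state with a reduced memory clock
# by parsing nvidia-smi CSV output. On a real host you would run the command
# via subprocess instead of using the canned sample below.
import csv
import io

def parse_pstate_report(csv_text):
    """Parse `nvidia-smi --query-gpu=name,pstate,clocks.mem --format=csv` output."""
    rows = list(csv.reader(io.StringIO(csv_text), skipinitialspace=True))
    gpus = []
    for name, pstate, mem in rows[1:]:          # skip the CSV header row
        mhz = int(mem.split()[0])               # "3004 MHz" -> 3004
        gpus.append({"name": name, "pstate": pstate, "mem_mhz": mhz})
    return gpus

def downclocked(gpu, full_mem_mhz=3505):
    # Flag cards in P2 with memory below the advertised clock (3505 MHz is
    # an assumed nominal GTX 970/980 GDDR5 base clock, per the thread).
    return gpu["pstate"] == "P2" and gpu["mem_mhz"] < full_mem_mhz

sample = (
    "name, pstate, clocks.current.memory [MHz]\n"
    "GeForce GTX 970, P2, 3004 MHz\n"
)
for gpu in parse_pstate_report(sample):
    print(gpu["name"], "downclocked:", downclocked(gpu))
```

This only detects the condition; resetting the clock still needs a tool such as NVIDIA Inspector, as Cliff describes.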
Hans Dorn
Volunteer developer
Volunteer tester
Joined: 3 Apr 99
Posts: 2262
Credit: 26,448,570
RAC: 0
Germany
Message 1620357 - Posted: 29 Dec 2014, 23:04:05 UTC - in response to Message 1619298.  

I'd paid a lot of dosh for crippled GPU's.


Don't even get me started on the 1/32 double-precision performance ratio of the GTX 900 series.

That's just plain shoddy.
ID: 1620357
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1620376 - Posted: 29 Dec 2014, 23:32:57 UTC - in response to Message 1620349.  

Hi Raistmer,

It does help a bit, at least with E@H tasks.

Using NVInspector to reset the P2 memory clock to 3506 MHz decreases task completion time by up to 2 minutes per task.

Regards,


So it's a special tool, NVInspector, that allows getting out of the P2 state, not app optimization. Better to make that clear. Optimization (more precisely, global memory access optimization) just makes an app less vulnerable to this issue, but in no way solves it. Or does a host running the SETI app never enter that P2 state?
ID: 1620376
Profile Zombu2
Volunteer tester

Joined: 24 Feb 01
Posts: 1615
Credit: 49,315,423
RAC: 0
United States
Message 1620402 - Posted: 30 Dec 2014, 0:25:37 UTC

My 980s run at a 1445 MHz core and 7400 MHz RAM clock. They do not downclock for me, so I'm good. Now if I could just keep enough tasks in my queue...
I came down with a bad case of i don't give a crap
ID: 1620402
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1620441 - Posted: 30 Dec 2014, 2:12:50 UTC - in response to Message 1620376.  

Raistmer, do you know why Nvidia chose to run distributed processing work in the P2 state for the 900 series? Also, do you know the mechanism by which they determine what the card is doing and then force the cards into the P2 state? The cards run normally in the P1 state if they are doing ANYTHING OTHER than distributed processing tasks. I've looked around in the Nvidia CUDA forums and haven't found any discussions on this topic.

Thanks in advance for any insights.

Cheers, Keith
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1620441
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1620615 - Posted: 30 Dec 2014, 10:01:41 UTC - in response to Message 1620441.  

Also, do you know the mechanism by how they determine what the card is doing and then to force the cards into P2 state?
Cheers, Keith


If I knew that, I would not be asking whether SETI or any other computational task switches the state back. Apparently they do not, hence the initial point of the thread regarding optimization level is moot.
Most probably it's some "feature" in the new driver that will be removed later.
What I would expect is for the frequency to be lowered when idle is detected. Apparently that's not the case. Or both the SETI and Einstein apps use memory too sparsely to trigger higher frequencies (hard to believe, though).
ID: 1620615
Hans Dorn
Volunteer developer
Volunteer tester
Joined: 3 Apr 99
Posts: 2262
Credit: 26,448,570
RAC: 0
Germany
Message 1620656 - Posted: 30 Dec 2014, 11:26:26 UTC

One thing that sets GPGPU tasks apart is that they don't produce any visible output.
ID: 1620656
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1620794 - Posted: 30 Dec 2014, 17:13:48 UTC - in response to Message 1620615.  

Also, do you know the mechanism by how they determine what the card is doing and then to force the cards into P2 state?
Cheers, Keith


If I knew that, I would not be asking whether SETI or any other computational task switches the state back. Apparently they do not, hence the initial point of the thread regarding optimization level is moot.
Most probably it's some "feature" in the new driver that will be removed later.
What I would expect is for the frequency to be lowered when idle is detected. Apparently that's not the case. Or both the SETI and Einstein apps use memory too sparsely to trigger higher frequencies (hard to believe, though).


Thanks for the reply. The one thing I am noticing is very low memory controller loading (1-2%) when running a mixture of SETI, MilkyWay and Einstein tasks at the same time. I haven't seen the high levels of memory controller loading mentioned in the thread in the Einstein forum. I wonder if that is the parameter that triggers dropping into the P2 state. When I stop BOINC processing to watch a movie, I see the memory controller loading crank up, and the memory runs at full spec speed along with the core clocks in the P0 state. A mystery for now, I guess. Just glad the tools for clocking P2 state memory at full speed are available.

Cheers, Keith
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1620794
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1620834 - Posted: 30 Dec 2014, 22:22:20 UTC - in response to Message 1620794.  

There are two different memory loads: the GPU's own global device memory, and communication with system memory via the PCIe bus. AFAIK that "load" field reflects the second. In optimized apps, PCIe communication should be as small as possible. But the P2 state affects the frequency of the GPU's own global memory. Hence, these are not really connected parameters.
ID: 1620834
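[Editor's note] Raistmer's distinction matters because the P2 downclock hits the device-memory side, not the PCIe side. The bandwidth cost can be estimated from peak theoretical figures. A back-of-the-envelope sketch; the numbers are nominal GTX 970 values assumed for illustration (256-bit bus, roughly 7000 MHz effective GDDR5 at full speed versus roughly 6000 MHz effective in P2), not measurements:

```python
# Rough illustration of how a P2 memory downclock cuts *device* memory
# bandwidth, independent of PCIe traffic. Figures are assumed nominal
# GTX 970 values, not measured ones.

def bandwidth_gbs(effective_clock_mhz, bus_width_bits):
    """Peak theoretical bandwidth in GB/s: effective clock * bus width in bytes."""
    return effective_clock_mhz * 1e6 * (bus_width_bits / 8) / 1e9

p0 = bandwidth_gbs(7000, 256)   # full-speed state
p2 = bandwidth_gbs(6000, 256)   # downclocked compute state
print(f"P0: {p0:.0f} GB/s, P2: {p2:.0f} GB/s, loss: {100 * (1 - p2 / p0):.0f}%")
```

Whether a given app feels that loss depends on how memory-bound its kernels are, which is consistent with the optimized apps being less affected.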
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1620887 - Posted: 31 Dec 2014, 0:04:58 UTC - in response to Message 1620834.  

Raistmer, thanks for the explanation. I understand the difference now between onboard global memory and system memory with regard to the memory controller loading reported by typical GPU monitoring programs. It also explains the much higher memory controller loading for Einstein tasks, since their app is not very optimized for GPU work and uses a lot more CPU time moving data across the PCIe bus into the GPU. I'm slowly learning this stuff, at least.

Cheers, Keith
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1620887
Profile zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65749
Credit: 55,293,173
RAC: 49
United States
Message 1620899 - Posted: 31 Dec 2014, 0:19:14 UTC

I'm just hoping that if I eventually buy 4 GTX 980 cards, they don't turn out to be duds; otherwise I'd want to go no further than the GTX 780.
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 1620899
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1620933 - Posted: 31 Dec 2014, 1:43:43 UTC - in response to Message 1620899.  

Kind of depends on which projects you want to crunch for. Also, you will see a significant drop in electrical energy usage compared with previous generations. I found the move to the 970 very beneficial: it more than doubled my RAC for MilkyWay tasks because of the doubling of double-precision math capability over my previous 670s. I never felt the urge to move to the 700 series because of the increase in TDP and power usage over the 670s. It was a win-win for me with the 970s: better RAC output, less heat output and less power draw from the wall.

Saw some news today about the release of a GTX960 product line at the end of January. Interested to find out where the pricing will be.

Cheers, Keith
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1620933
Hans Dorn
Volunteer developer
Volunteer tester
Joined: 3 Apr 99
Posts: 2262
Credit: 26,448,570
RAC: 0
Germany
Message 1620941 - Posted: 31 Dec 2014, 2:38:34 UTC

Hi Keith,

I'm afraid the GTX 970 isn't the best choice for running MilkyWay.

Nvidia has nerfed the double-precision performance on the Maxwell GPUs.

(1/32 of single precision)

AMD made some nice Tahiti-based cards a while ago with a 1/4 ratio for DP, especially the 7870 XT: http://en.wikipedia.org/wiki/List_of_AMD_graphics_processing_units#Southern_Islands_.28HD_7xxx.29_Series

One of these dedicated to MilkyWay might be a good choice.

Since these cards are a bit older, eBay might be a good source.

Cheers
Hans
ID: 1620941
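[Editor's note] Hans's comparison can be made concrete with peak-FLOPS estimates. The specs below are nominal published figures assumed for illustration (GTX 970: 1664 shaders at roughly 1.05 GHz base; HD 7870 XT "Tahiti LE": 1536 shaders at 0.975 GHz), and peak SP is computed as shaders × clock × 2, counting a fused multiply-add as two operations:

```python
# Back-of-the-envelope peak FLOPS comparison. Specs are assumed nominal
# values, not measurements; real application throughput is far below peak.

def peak_gflops(shaders, clock_ghz, dp_ratio):
    """Return (single-precision, double-precision) peak GFLOPS."""
    sp = shaders * clock_ghz * 2      # FMA = 2 ops per shader per cycle
    return sp, sp * dp_ratio

for name, shaders, clock, ratio in [
    ("GTX 970 (Maxwell, 1/32 DP)", 1664, 1.05, 1 / 32),
    ("HD 7870 XT (Tahiti, 1/4 DP)", 1536, 0.975, 1 / 4),
]:
    sp, dp = peak_gflops(shaders, clock, ratio)
    print(f"{name}: ~{sp:.0f} GFLOPS SP, ~{dp:.0f} GFLOPS DP")
```

Despite the similar single-precision figures, the DP ratio leaves the older Tahiti card several times faster at the double-precision work MilkyWay needs.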
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1620956 - Posted: 31 Dec 2014, 3:18:50 UTC - in response to Message 1620941.  

Oh, I certainly know about the lousy FP64 performance of consumer-level Nvidia products. But since MilkyWay is the only project I run that requires double-precision math, Nvidia cards are an adequate compromise. Also, I am not too enthusiastic about the power consumption and heat from AMD cards, and the AMD drivers seem to be an ongoing battle among the users who comment in the forums. If I were only running MilkyWay, I probably would go with an AMD card. I am happy that I got a doubling of math efficiency from the move from the 670 cards to the 970 cards. Now if Nvidia would drop the price of the Teslas, Quadros and Titans, I probably would move to better FP64 performance.

Alas, my motherboards and cases are maxed out with dual X16 slots, and there isn't really any room to run an AMD card and provide adequate cooling. I really would have to build another cruncher and dedicate it to the AMD GPU platform. I don't think I could justify the additional power and cooling requirements for a third system. My two existing platforms have high enough cooling requirements during the summer as it stands now. I shut them off at sundown when the output from the solar goes away. I run them now during the winter to heat the house from the power I banked during the summer. Thanks for the comment.

Keith
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1620956
Hans Dorn
Volunteer developer
Volunteer tester
Joined: 3 Apr 99
Posts: 2262
Credit: 26,448,570
RAC: 0
Germany
Message 1620957 - Posted: 31 Dec 2014, 3:28:56 UTC

Alright.

There's always room for another noisy crunching box ;)

I shut off BOINC once it's getting hot enough for air conditioning.
My RAC takes a plunge, but at 30 cents per kWh here in Germany I have to pass.

Hans
ID: 1620957
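[Editor's note] Hans's electricity math is easy to reproduce. A quick sketch; the 150 W figure is an assumed example card draw, not a measurement from anyone's rig:

```python
# Monthly cost of running a GPU cruncher 24/7 at German electricity prices
# (0.30 EUR/kWh, per the post above). The wattage is an assumed example.

def monthly_cost_eur(watts, eur_per_kwh=0.30, hours=24 * 30):
    """Cost = energy in kWh (power * time) times the per-kWh price."""
    return watts / 1000 * hours * eur_per_kwh

print(f"{monthly_cost_eur(150):.2f} EUR/month for a 150 W card")
```

At that rate even one mid-range card adds up to roughly 30 EUR a month, which explains shutting BOINC off when cooling costs pile on top.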


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.