Message boards :
Number crunching :
Proclamation - memory speed is more important than shader count or gpu core clocks
Keith Myers Send message Joined: 29 Apr 01 Posts: 13161 Credit: 1,160,866,277 RAC: 1,873 |
I am going to make a statement. I have two almost identical systems. Both run the same motherboard, the same CPU, and the same system memory type and quantity, at the same CPU and system memory clocks. One system has two identical GTX 1070 cards and one GTX 1080 card. The other has two identical GTX 1070 cards and one GTX 1070Ti card. The 1070Ti has one fewer SM (the cluster the shaders are grouped into) than the 1080. All cards have 8GB of GPU memory and run at approximately the same GPU core clock of around 2 GHz. The 1070 cards have GDDR5 memory and run at an 8 GHz effective memory clock. The 1080 card has GDDR5X memory, which runs at 11 GHz. The system with the GTX 1080 in it has a 5K RAC advantage over the system with the 1070Ti in it. Both systems run the Linux CUDA 9.0 special app. GPU memory speed and memory bandwidth are more important to task completion time than GPU core clock. This is my assertion. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13722 Credit: 208,696,464 RAC: 304 |
GPU memory speed and memory bandwidth is more important to task time completion than gpu core clock. This is my assertion. Pretty sure Petri recently posted that one of his biggest GPU application speed ups was done by reducing the amount of GPU memory access required. Grant Darwin NT |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13161 Credit: 1,160,866,277 RAC: 1,873 |
GPU memory speed and memory bandwidth is more important to task time completion than gpu core clock. This is my assertion. But I'm pretty sure he was referring to his latest statically linked releases. Not the older zi3v apps. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13722 Credit: 208,696,464 RAC: 304 |
But I'm pretty sure he was referring to his latest statically linked releases. Not the older zi3v apps. But it does show that memory access plays a significant factor in computation times. The less memory access required, the faster the WU is processed. And by the same reasoning, the faster any memory accesses are, the faster computations will be. Grant Darwin NT |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13161 Credit: 1,160,866,277 RAC: 1,873 |
Yes, that is exactly the point I was trying to make. Don't worry about the GPU core clock. Let the Nvidia GPU Boost 3.0 mechanism in the card's firmware take care of it. You can just run the card stock and it will boost the clock to whatever the thermal and power limits allow. But Nvidia's driver will always penalize you for running a compute load with a severe drop in the memory clocks. That parameter is NOT boosted by GPU Boost 3.0. So whatever you can do to get the memory clock back to what it should be running at in the P0 state is the best thing for reducing task completion times. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
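[One way to see the penalty described above is to watch the card's performance state and clocks while a compute task runs. A minimal sketch, assuming `nvidia-smi` is installed; the sample line below is illustrative, not captured output:]

```shell
# Poll performance state, core clock, and memory clock every 5 seconds:
#   nvidia-smi --query-gpu=pstate,clocks.gr,clocks.mem --format=csv,noheader -l 5
# Under a CUDA load a Pascal card typically drops to P2 with a reduced
# memory clock while GPU Boost still raises the core clock.
# Parsing an illustrative sample line:
sample="P2, 1987 MHz, 3802 MHz"
pstate=${sample%%,*}
echo "state under compute load: ${pstate}"
```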
Ghia Send message Joined: 7 Feb 17 Posts: 238 Credit: 28,911,438 RAC: 50 |
I thought the shaders were 2432 (1070Ti) vs. 2560 (1080)? Humans may rule the world...but bacteria run it... |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
I thought the shaders were 2432 (1070Ti) vs. 2560 (1080)? . . I think you are talking about CUDA cores while Keith was talking about SMs (the units the CUDA cores are grouped into). Stephen ? |
mmonnin Send message Joined: 8 Jun 17 Posts: 58 Credit: 10,176,849 RAC: 0 |
Did you set the memory speeds in Linux? By default they do not run at 8 GHz effective but only at 7.6 GHz for me, on a 1070 and 1070Ti in Linux. |
Al Send message Joined: 3 Apr 99 Posts: 1682 Credit: 477,343,364 RAC: 482 |
So whatever you can do to get the memory clock back to what it should be running at in the P0 state is the best thing for reducing task completion times. Keith, does this just apply to the 10x0 series cards? Is that where the driver got borked? |
mmonnin Send message Joined: 8 Jun 17 Posts: 58 Credit: 10,176,849 RAC: 0 |
Keith, does this just apply to the 10x0 series cards? Is that where the driver got borked? GPU Boost works pretty similarly on 9xx series Maxwell cards. Memory OC is not allowed in the P2 state by most apps; I think NV Inspector can do it in Windows. Those cards can be flashed to whatever memory clock you want, though, unlike Pascal. |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13161 Credit: 1,160,866,277 RAC: 1,873 |
Did you set the memory speeds in Linux? Yes, I add some overclock back into the P2 state so that the cards run close to what they would in P0 if Nvidia didn't penalize us for compute loads. I add 600 MHz to the memory clock of the 1070, so the effective clock is 8200 MHz. That is only 200 MHz past the stock P0 speed. You can use Nvidia Profile Inspector in Windows to turn off the CUDA P2 downclock, but there is no such utility or ability in Linux, so you have to overclock the P2 state a bit. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
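[Under Linux, the P2 offset described above is typically applied with `nvidia-settings`, which requires Coolbits to be enabled first. A sketch of the arithmetic, assuming the 7.6 GHz P2 default and +600 MHz offset from the posts; the GPU index and performance-level index vary by card and driver:]

```shell
# Apply the offset (commented out; requires Coolbits, e.g.
# `sudo nvidia-xconfig --cool-bits=8`, and an X restart first;
# the [3] performance-level index is an assumption for this card):
#   nvidia-settings -a '[gpu:0]/GPUMemoryTransferRateOffset[3]=600'
p2_default=7600   # effective P2 memory clock (MHz) before the offset
offset=600        # offset applied via nvidia-settings
effective=$((p2_default + offset))
echo "P2 effective memory clock: ${effective} MHz"   # 8200 MHz
```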
Keith Myers Send message Joined: 29 Apr 01 Posts: 13161 Credit: 1,160,866,277 RAC: 1,873 |
Keith, does this just apply to the 10x0 series cards? Is that where the driver got borked? No, the P2 compute load penalty is applied to all Nvidia cards, excluding some 1050 cards and similar cards in previous generations. As soon as the SM count gets above 6 or so, they get penalized. So Kepler, Maxwell and Pascal all suffer the P2 compute load penalty, simply because the video driver enforces it. Nvidia could change that if they wanted. But I think they want to keep forcing anyone doing compute loads to their Tesla and Quadro products. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13161 Credit: 1,160,866,277 RAC: 1,873 |
I'll copy my message from our GPUUG forum about how to turn off the CUDA P2 compute load penalty for Windows users.
Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
mmonnin Send message Joined: 8 Jun 17 Posts: 58 Credit: 10,176,849 RAC: 0 |
Depending on the memory OEM on the card it might be able to OC quite high; I've had my 1070 in Windows at +900 before. I currently have them both at the stock 8 GHz in Linux in P2, but there's only about 5 seconds difference between a 1070 and 1070Ti since the 1070 can OC the GPU higher. About 2:20 to 2:25 or so per task, so not a lot of time to make a noticeable difference if I change the memory clocks. I'll have to experiment some more. |
Ian&Steve C. Send message Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
Keith, does this just apply to the 10x0 series cards? Is that where the driver got borked? I can only confirm that my 750Ti (Maxwell) and 1050Ti (Pascal) cards ran at P0 by default. My 1060's all ran P2 by default, and my 1080Ti's run P2 by default. I never checked what they were doing when I was running 760's. As for memory speed, I guess you might only see big improvements on the less optimized apps, SoG and zi3v. On my systems running Petri's latest iteration, changing memory speed does not have much effect: MAYBE 1-2 seconds faster on WUs that are taking about 60-70 seconds on average (1080Ti). My theory on why I'm not seeing the same improvements with increased mem clock is that the latest app just doesn't rely on the memory as much. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
Cruncher-American Send message Joined: 25 Mar 02 Posts: 1513 Credit: 370,893,186 RAC: 340 |
I just d/l Profile Inspector per your instructions, and got 2.13, NOT 2.1.3.9. It does NOT have the "CUDA - Force P2" line item under part 5. How can I get the version you used? Or another, that supports the change? |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13161 Credit: 1,160,866,277 RAC: 1,873 |
I just d/l Profile Inspector per your instructions, and got 2.13, NOT 2.1.3.9. Looks like the old link doesn't point at the latest. You can always get the latest at the developer's GitHub repository: 2.1.3.19. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13161 Credit: 1,160,866,277 RAC: 1,873 |
You have to be careful about adding too much memory overclock or you can crash your system and trash all your work, like I just did a couple of days ago. My 1070's wouldn't take a +1000 MHz memory overclock. The danger is that as each task unloads from the card, the card transitions from P2 back into P0 (what Linux calls performance level 3). The overclock is added to all performance states, including P0, so the 1070 card tries to run at 9000 MHz, which it can't do, and crashes. The solution is either to overclock more mildly, like my 600 MHz boost, or to use Petri's newest KeepP2 application, which runs a small compute load in the background on the card at all times and prevents it from returning to the P0 state. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
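[The crash described above follows from the offset applying to every performance state. A sketch of the numbers, assuming a stock 8 GHz effective P0 memory clock on a GTX 1070 (the figure implied by "200 MHz past stock P0 speed" earlier in the thread):]

```shell
p0_stock=8000   # assumed stock P0 effective memory clock on a GTX 1070 (MHz)
offset=1000     # the overclock that crashed
p0_attempt=$((p0_stock + offset))
echo "P0 would attempt ${p0_attempt} MHz"   # 9000 MHz, beyond what this GDDR5 can do
# Mitigations: use a milder offset (e.g. +600), or pin the card in P2 with a
# constant small background compute load (what Petri's KeepP2 tool does).
```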
mmonnin Send message Joined: 8 Jun 17 Posts: 58 Credit: 10,176,849 RAC: 0 |
I moved the 1070 from 8k back to the Linux P2 default of 7.6k and saw about a 4-5 second increase in run times. I bumped it back up to 8.1k this morning. Slow increments atm. The 1070 is just a couple seconds behind the 1070Ti since it boosts higher. |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13161 Credit: 1,160,866,277 RAC: 1,873 |
I've found that a 600 MHz boost of the memory clock in P2 is entirely safe, for 8200 MHz effective. I'm running 8400 MHz on my 1070's now with KeepP2. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.