NVIDIA P0, P2 states and overclocking 1080, 1080Ti and VOLTA in Linux

Author	Message
petri33 Volunteer tester Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156	Message 1948006 - Posted: 5 Aug 2018, 12:17:13 UTC
Hi,
Here: https://drive.google.com/open?id=1Jl1tlN5V27odSgzPAzCotED4Z9yUmkwn is a utility to keep your card at P2 state.
I have had problems overclocking, since overclocking the 1080, 1080Ti and Volta affects both the P0 and P2 states. These cards cannot be locked to P0 in Linux. When a CUDA compute workload finishes, the card jumps to P0 and crashes.
With this program running there is always a compute workload in the background. The program runs a simple GPU kernel 10 times per second, which keeps the driver thinking something is being calculated and thus keeps the card at P2. The performance hit is negligible. Now you can overclock.
To run this program in the background, start it with ./keepP2 device=N & and replace N with your GPU id. I run ./keepP2 device=0 & ./keepP2 device=1 & ./keepP2 device=2 & ./keepP2 device=3 & to make all four cards stay at P2.
-- petri33
To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1948006
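The keep-alive idea described above can be sketched as a fixed-rate loop. This is a minimal Python sketch, not the keepP2 source (which is a compiled CUDA program): `keep_alive` and its `workload` argument are hypothetical names, and `workload` stands in for launching a trivial GPU kernel and waiting for it to finish. The rate of 10 calls per second is the one given in the post.

```python
import time
from typing import Callable, Optional


def keep_alive(workload: Callable[[], None],
               rate_hz: float = 10.0,
               duration_s: Optional[float] = None) -> int:
    """Call `workload` at a fixed rate so the GPU never looks idle.

    `workload` is a hypothetical stand-in for launching a tiny kernel and
    waiting for it. Runs until interrupted when `duration_s` is None;
    otherwise stops after roughly `duration_s` seconds. Returns the call count.
    """
    period = 1.0 / rate_hz
    start = time.monotonic()
    next_tick = start
    calls = 0
    while duration_s is None or time.monotonic() - start < duration_s:
        workload()
        calls += 1
        # Sleep until the next absolute deadline so the rate does not drift
        # when a single call runs a little long.
        next_tick += period
        delay = next_tick - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        else:
            # Fell behind: resynchronise instead of firing a burst of calls.
            next_tick = time.monotonic()
    return calls
```

With a CUDA binding such as CuPy or PyCUDA installed, `workload` could be a function that launches a tiny kernel and synchronizes; that wiring is an assumption and is not shown here. Scheduling against absolute deadlines, rather than sleeping a fixed 0.1 s after each call, keeps the rate steady even when an individual launch takes longer than usual.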

Keith Myers Volunteer tester Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1948030 - Posted: 5 Aug 2018, 16:10:56 UTC - in response to Message 1948006.
Thanks for the tip, Petri. I'll have to investigate. I take it you are able to use much more aggressive overclocks with this utility running in the background than you previously could in P2.
Seti@Home classic workunits:20,676 CPU time:74,226 hours
A proud member of the OFA (Old Farts Association)
ID: 1948030

Keith Myers Volunteer tester Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1948033 - Posted: 5 Aug 2018, 16:30:30 UTC
Petri, I gather this is a utility you wrote for your specific environment? And the README file is what the program actually runs? If so, that file calls the CUDA 9.2 compiler installed at /usr/local/cuda-9.2/bin/nvcc. That will only work for someone who has CUDA 9.2 installed in the same location. Is this a correct assumption? Can you just modify the README file to point at where your own local CUDA installation is located? And does it only work with CUDA 9.2, or will plain CUDA 9.0 work?
ID: 1948033

petri33 Volunteer tester Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156	Message 1948053 - Posted: 5 Aug 2018, 20:05:46 UTC - in response to Message 1948033.
The executable allows higher memory and graphics clocks, since the GPU is kept from going to P0, where the overclock values needed for P2 would be too high. High-end NV GPUs are forced to use P2 with CUDA compute loads. When switching WUs, the GPU briefly goes to P0 and, without this new tool, crashes.
The readme contains a sample of how to compile the source code into an executable. It can be modified to build an executable for other CUDA versions too.
The executable for Linux is the file called keepP2. It can be saved in any directory and run from there before overclocking. You may have to run "chmod ugo+x keepP2" once to make it executable.
If you have one GPU, just run ./keepP2 & in any terminal window. If you have two GPUs, run ./keepP2 device=0 & ./keepP2 device=1 & and so on.
It may need CUDA 9.2, but I doubt that, since ldd says:
linux-vdso.so.1 => (0x00007fff5038d000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f5e3d888000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f5e3d66a000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f5e3d465000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f5e3d0e3000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f5e3cecc000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f5e3cb01000)
/lib64/ld-linux-x86-64.so.2 (0x00005619ebe86000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f5e3c7f9000)
ID: 1948053
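Starting one background copy per GPU, as described above, can also be scripted. A minimal Python sketch, assuming keepP2 sits in the current directory and has already been made executable; `keep_p2_commands` and `launch_all` are hypothetical helper names, and the sketch only reproduces the `./keepP2 device=N &` pattern from the post.

```python
import subprocess
from typing import Iterable, List


def keep_p2_commands(device_ids: Iterable[int], binary: str = "./keepP2") -> List[List[str]]:
    """One argv per GPU, equivalent to typing `./keepP2 device=N &` for each N."""
    return [[binary, f"device={d}"] for d in device_ids]


def launch_all(device_ids: Iterable[int], binary: str = "./keepP2") -> List[subprocess.Popen]:
    """Start every keepP2 process and return the handles.

    Popen returns immediately, so the children run concurrently, like `&`.
    """
    return [subprocess.Popen(argv) for argv in keep_p2_commands(device_ids, binary)]
```

For the four-GPU machine in the first post this would be `procs = launch_all(range(4))`. If keepP2 selects cards by CUDA device number (an assumption; the thread does not say), keep in mind that CUDA orders devices fastest-first by default while nvidia-smi lists them by PCI bus ID; setting `CUDA_DEVICE_ORDER=PCI_BUS_ID` in the environment makes the two numberings agree on mixed-GPU systems.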

Keith Myers Volunteer tester Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1948066 - Posted: 5 Aug 2018, 22:35:11 UTC - in response to Message 1948053.
Thanks for the clarification, Petri. I understand now what the readme file is for. I just checked the executable for dependencies, and it was fine with just CUDA 9.0 on my system. Not sure I will use it yet. I have been able to use moderate overclocks on my cards to get them mostly back to what they should be running in P0, and have avoided any crashes so far. Knock on wood. Most of my cards are air cooled and are already running into thermal limits with the natural GPU Boost 3.0 automatic overclocks. I really have only one system with mostly water-cooled cards where I could get away with a more aggressive overclock. I think I will try the utility on that system first. It's tempting not to leave horsepower in the tank if it is available.
ID: 1948066
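Checking the dependencies, as done above, can be automated. A minimal Python sketch; `parse_ldd`, `cuda_libs` and `missing_libs` are hypothetical helpers that read the text `ldd ./keepP2` prints. One hedge on interpreting the result: nvcc links the CUDA runtime statically by default, and a statically linked runtime loads the driver library (libcuda.so) at run time via dlopen, which is consistent with librt, libpthread and libdl appearing in the list without any CUDA library. If that is how keepP2 was built (the thread doesn't say), ldd cannot show the CUDA dependency at all, and what matters is an installed driver new enough for the CUDA version it was compiled with, not which toolkit is installed.

```python
import re
from typing import Dict, List

# Trailing load address that ldd prints, e.g. " (0x00007f5e3d888000)".
_ADDR = re.compile(r"\s*\(0x[0-9a-fA-F]+\)\s*$")


def parse_ldd(output: str) -> Dict[str, str]:
    """Map each library in `ldd` output to where it resolved.

    The value is the resolved path, "not found" for a missing library,
    or "" for kernel-provided objects such as linux-vdso.so.1.
    """
    libs: Dict[str, str] = {}
    for raw in output.splitlines():
        line = _ADDR.sub("", raw.strip())
        if not line:
            continue
        name, sep, target = line.partition("=>")
        name, target = name.strip(), target.strip()
        if not sep:
            # The dynamic loader line has no "=>"; its name is already a path.
            target = name
        libs[name] = target
    return libs


def cuda_libs(libs: Dict[str, str]) -> List[str]:
    """Libraries whose name mentions CUDA (e.g. libcudart, libcuda)."""
    return sorted(name for name in libs if "cuda" in name.lower())


def missing_libs(libs: Dict[str, str]) -> List[str]:
    """Libraries the loader could not resolve."""
    return sorted(name for name, target in libs.items() if target == "not found")
```

For example, `libs = parse_ldd(subprocess.run(["ldd", "./keepP2"], capture_output=True, text=True).stdout)`; on the output Petri posted, both `cuda_libs(libs)` and `missing_libs(libs)` would be empty.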

Keith Myers Volunteer tester Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1948069 - Posted: 5 Aug 2018, 23:03:21 UTC
Petri, may I ask what the overclocks on your 1080s and 1080Ti are now, with the utility in use? Right now I have a 40 MHz GPU core overclock on my 1080s and 1080Ti and a 1000 MHz overclock on the memory. This gets the cards to run around 2025-2037 MHz on the core and around 10010-11010 MHz on the memory under compute load in P2 state. I think I could get more aggressive with the memory overclock if the cards are prevented from going to P0 state. I think I will try a 1500 MHz boost on the memory to start, and then try 2000 MHz if it is still stable and doesn't crash. I previously ran a 12000 MHz memory clock when I could hold the 1080Ti in P0 state under compute load in Windows with the Nvidia Profile Inspector tool.
ID: 1948069

petri33 Volunteer tester Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156	Message 1948076 - Posted: 5 Aug 2018, 23:48:21 UTC - in response to Message 1948069.
Hi, it is hot here in Finland this summer. I have not set the GPU MHz any higher than the default, so the offset is 0. I have raised the memory clock by 976 MHz on the 1080 and 1000 MHz on the 1080Ti. The utility helps my GPUs run stable. Running now:
1080Ti: graphics 1835 MHz and 11016 MHz mem, 81 degrees Celsius, air cooled
1080: graphics 1784 MHz and 10002 MHz mem, 82 degrees Celsius, air cooled
I need to dust the cards on Tuesday.
ID: 1948076

Grant (SSSF) Volunteer tester Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 1948081 - Posted: 6 Aug 2018, 0:31:36 UTC - in response to Message 1948076.
1080Ti: graphics 1835MHz and 11016 mem, 81 degrees Celsius, air cooled
1080 : graphics 1784MHz and 10002 mem, 82 degrees Celsius, air cooled
That is hot.
Grant
Darwin NT
ID: 1948081

Keith Myers Volunteer tester Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1948088 - Posted: 6 Aug 2018, 0:54:52 UTC
Yes, my 1080s and 1080Ti are Hybrid cards. They run around 42-46°C for the 1080s and 50°C for the 1080Ti.
ID: 1948088

Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640	Message 1948134 - Posted: 6 Aug 2018, 12:26:29 UTC
So a memory overclock is more beneficial to SETI work unit crunch times than a GPU core overclock? I would have thought it would be all about core speed.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours
ID: 1948134

kittyman Volunteer tester Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004	Message 1948143 - Posted: 6 Aug 2018, 13:15:03 UTC
Not a Linux question, but I thought that P0 state was more productive for the VRAM than P2 state. I have my 1080Tis in P0 via Nvidia Profile Inspector, but can change them back any time. Meow?
"Freedom is just Chaos, with better lighting." Alan Dean Foster
ID: 1948143

Zalster Volunteer tester Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242	Message 1948153 - Posted: 6 Aug 2018, 14:03:24 UTC - in response to Message 1948143.
Not a Linux question, but I thought that P0 state was more productive for the VRAM than P2 state. I have my 1080Tis in P0 via Nvidia Profile Inspector, but can change them back any time. Meow?
It is, but what he is saying (my interpretation of his statement) is that when the kernel is not kept busy, it will cause the driver to crash??? That would explain the computer crashes I've seen when the cards run out of work, or when they are set to run only at certain times and stop crunching suddenly: the system crashes and recovers. So on Windows it's probably not a problem, since more than one work unit is running at a time constantly, but on Linux, since only one runs at a time, the kernel will not be constantly busy and will crash. Thanks for the info, Petri.
ID: 1948153

petri33 Volunteer tester Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156	Message 1948154 - Posted: 6 Aug 2018, 14:12:44 UTC - in response to Message 1948143.
Hello kittyman,
Yes, P0 would be more productive, but there is no utility for Linux to make the card go to P0 permanently. As soon as a compute job begins, the cards go to P2. When the job ends, the cards go to P0 and crash with high overclocks. So I had to make a utility to keep the cards at P2 so that they never jump to P0.
Petri
ID: 1948154

petri33 Volunteer tester Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156	Message 1948156 - Posted: 6 Aug 2018, 14:20:23 UTC - in response to Message 1948134.
So a memory overclock is more beneficial to SETI work unit crunch times than a GPU core overclock? I would have thought it would be all about core speed.
Memory is currently the limiter of performance. My recent software improvement (30-50%) in SETI VLAR calculations was achieved by reducing the number of memory reads and writes. There is still a lot to do and gain. As for overclocking the memory or graphics clocks, it all depends on how much heat the card/cooling can handle, and the benefit depends on the GPU generation and the memory subsystem/architecture NVIDIA has chosen to implement. I cannot overclock the processor during the summer, but I can set the memory clocks at P2 to resemble the standard clocks at P0.
Petri
ID: 1948156
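Why a memory overclock pays off when memory is the limiter can be put as a small formula: peak bandwidth is the effective data rate times the bus width. A minimal Python sketch, assuming the memory figures quoted in this thread (such as 11016 and 10002) are effective transfer rates in MT/s, and using the reference bus widths of 256 bits for the GTX 1080 and 352 bits for the GTX 1080 Ti. For a purely memory-bound kernel, run time scales roughly with bytes moved divided by bandwidth, which is why both cutting memory reads and writes and raising the memory clock help; the helper names are hypothetical.

```python
from typing import Dict

# Reference-board memory bus widths in bits.
BUS_WIDTH_BITS: Dict[str, int] = {"GTX 1080": 256, "GTX 1080 Ti": 352}


def peak_bandwidth_gbs(rate_mts: float, bus_width_bits: int) -> float:
    """Peak memory bandwidth in GB/s.

    rate (MT/s) x bits per transfer / 8 gives MB/s; divide by 1000 for GB/s.
    """
    return rate_mts * bus_width_bits / 8 / 1000


def memory_bound_speedup(old_rate_mts: float, new_rate_mts: float) -> float:
    """Best-case speedup from a memory overclock on the same card.

    Only holds for a kernel whose run time is ~ bytes moved / bandwidth;
    anything partly compute-bound gains less.
    """
    return new_rate_mts / old_rate_mts
```

With the 1080 Ti figures from this thread, `peak_bandwidth_gbs(11016, BUS_WIDTH_BITS["GTX 1080 Ti"])` comes to about 484.7 GB/s, and going from 10016 to 11016 (the +1000 offset Petri mentions) is at most about a 10% speedup.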

Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640	Message 1948201 - Posted: 6 Aug 2018, 19:38:17 UTC - in response to Message 1948153.
Not a Linux question, but I thought that P0 state was more productive for the VRAM than P2 state. I have my 1080Tis in P0 via Nvidia Profile Inspector, but can change them back any time. Meow?
It is, but what he is saying (my interpretation of his statement) is that when the kernel is not kept busy, it will cause the driver to crash??? That would explain the computer crashes I've seen when the cards run out of work, or when they are set to run only at certain times and stop crunching suddenly: the system crashes and recovers. So on Windows it's probably not a problem, since more than one work unit is running at a time constantly, but on Linux, since only one runs at a time, the kernel will not be constantly busy and will crash. Thanks for the info, Petri.
Not exactly. Under computational loads, 10-series Nvidia GPUs usually run in the P2 state, which is a lower state than P0. P0 is the highest state, with the highest clocks, and you usually only get there under 3D rendering apps like games. With a 1080Ti, for example, the P2 state has memory clocks 500 MHz lower than the P0 state. You can force P0 state for all loads under Windows with certain versions of Nvidia Inspector. Since Petri is running Linux, and there is no way to force P0 via normal means, the card will select the default P2 state for compute loads. However, you can mimic P0 state performance by simply overclocking your memory by 500 MHz. That offset, however, applies to both the P2 and P0 states. So when the card switches tasks and briefly goes to P0, the 500 MHz overclock is applied to its P0 clock as well, which is effectively +1000 MHz over the stock P2 clock; in most cases that is too much and will cause instability and crashing. That's the gist of it. He may even be adding more than that.
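The offset arithmetic described above can be checked with a few lines. A minimal Python sketch; `MemClocks` and `effective_clocks` are hypothetical names, and 5000/5500 MHz are made-up round numbers chosen only to keep the 500 MHz P2-to-P0 gap from the post. What it shows is that a single offset is added to both states, so the brief P0 excursion lands the full gap plus the offset above stock P2.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class MemClocks:
    p0_stock: int  # MHz, stock memory clock in P0 (games, 3D)
    p2_stock: int  # MHz, stock memory clock in P2 (compute loads)


def effective_clocks(stock: MemClocks, offset: int) -> Dict[str, int]:
    """Apply one offset to both performance states, as described in the thread."""
    p2 = stock.p2_stock + offset
    p0 = stock.p0_stock + offset
    return {
        "P2": p2,
        "P0": p0,
        # How far above stock P2 the card lands when it briefly enters P0.
        "P0_above_P2_stock": p0 - stock.p2_stock,
    }
```

`effective_clocks(MemClocks(p0_stock=5500, p2_stock=5000), offset=500)` returns `{'P2': 5500, 'P0': 6000, 'P0_above_P2_stock': 1000}`: P2 now matches stock P0, but P0 sits 1000 MHz above stock P2, the +1000 MHz described above. Holding the card in P2 with keepP2 means only the first value is ever used.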
The state behavior is still a bit strange to me. While I see my 1060 cards using P2 state in both Windows and Linux, my 1050Ti cards under Linux report P0 state all the time while running SETI. I have not done anything special for them; it's just what's reported in nvidia-smi.
ID: 1948201
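Which state each card is actually in can be read from nvidia-smi, as done above. A minimal Python sketch using nvidia-smi's `--query-gpu` and `--format=csv,noheader,nounits` options; `GpuState`, `parse_pstates` and `query_pstates` are hypothetical names. Note that the memory clock nvidia-smi reports is typically a different (lower) figure than the memory transfer rate nvidia-settings shows, so don't compare the two directly.

```python
import csv
import io
import subprocess
from typing import List, NamedTuple

QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,pstate,clocks.sm,clocks.mem",
    "--format=csv,noheader,nounits",
]


class GpuState(NamedTuple):
    index: int
    name: str
    pstate: str  # "P0", "P2", "P8", ...
    sm_mhz: int
    mem_mhz: int


def parse_pstates(text: str) -> List[GpuState]:
    """Parse one CSV row per GPU; assumes numeric clocks (no "[N/A]")."""
    rows = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [GpuState(int(r[0]), r[1], r[2], int(r[3]), int(r[4])) for r in rows if r]


def query_pstates() -> List[GpuState]:
    """Run nvidia-smi and parse its output (requires the NVIDIA driver)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_pstates(out)
```

For example, `[g for g in query_pstates() if g.pstate != "P2"]` lists the cards that are not in P2 at that moment, such as the 1050Ti cards reporting P0 above.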

Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640	Message 1948202 - Posted: 6 Aug 2018, 19:40:40 UTC - in response to Message 1948156.
So a memory overclock is more beneficial to SETI work unit crunch times than a GPU core overclock? I would have thought it would be all about core speed.
Memory is currently the limiter of performance. My recent software improvement (30-50%) in SETI VLAR calculations was achieved by reducing the number of memory reads and writes. There is still a lot to do and gain. As for overclocking the memory or graphics clocks, it all depends on how much heat the card/cooling can handle, and the benefit depends on the GPU generation and the memory subsystem/architecture NVIDIA has chosen to implement. I cannot overclock the processor during the summer, but I can set the memory clocks at P2 to resemble the standard clocks at P0. Petri
So a memory overclock improves run times more than a GPU core overclock? I usually leave my systems at stock clocks for better stability, but maybe I'll apply a slight bump to the memory. I've found that increasing memory speed doesn't increase power consumption nearly as much as increasing core clock speed.
ID: 1948202

Al Joined: 3 Apr 99 Posts: 1682 Credit: 477,343,364 RAC: 482	Message 1948209 - Posted: 6 Aug 2018, 19:53:59 UTC
Just a quick, slightly out-in-the-weeds question: what is the issue with running P0 in Linux on the latest 10x0 series cards? Is it a driver issue, an OS issue, or a software utility issue that won't allow it to be set? And what is needed for a solution: Nvidia, Linus, or the utility developer getting on it? And what is the likelihood of it happening any time soon? I don't know much about it, other than that I also thought I heard that P0 is always the best way to go, obviously if you can set it, that is.
ID: 1948209

Keith Myers Volunteer tester Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1948211 - Posted: 6 Aug 2018, 20:02:58 UTC - in response to Message 1948201.
Stephen and others have also reported that 1050 cards and lower don't get penalized with the P2 state for compute loads. I guess Nvidia thinks they have so few SMs running for compute that the cards will never run into voltage or thermal limits, so the drivers give them a pass. I wish they would apply the same logic to the higher series cards, especially when they are water cooled. I think the only solution is if and when Nvidia open-sources the drivers, so the Linux developers can make tools to keep the cards in P0 state for compute loads. ATI has open sourced their drivers. Earlier generations of Nvidia cards, like Kepler and Maxwell, could be overclocked automatically. The Pascals are out of luck. I don't know if that is a business decision to restrict compute on consumer-class cards to force the enterprise and educational markets to purchase Tesla and Quadro series cards.
ID: 1948211

Al Joined: 3 Apr 99 Posts: 1682 Credit: 477,343,364 RAC: 482	Message 1948212 - Posted: 6 Aug 2018, 20:07:29 UTC - in response to Message 1948211.
Ahh, ok, gotcha Keith, thanks!
ID: 1948212

Keith Myers Volunteer tester Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1948213 - Posted: 6 Aug 2018, 20:09:16 UTC - in response to Message 1948209.
Just a quick, slightly out-in-the-weeds question: what is the issue with running P0 in Linux on the latest 10x0 series cards? Is it a driver issue, an OS issue, or a software utility issue that won't allow it to be set? And what is needed for a solution: Nvidia, Linus, or the utility developer getting on it? And what is the likelihood of it happening any time soon? I don't know much about it, other than that I also thought I heard that P0 is always the best way to go, obviously if you can set it, that is.
It is a driver restriction imposed by Nvidia: the driver prevents P0 state on the card when it detects a compute load. This applies to any OS: Windows, Linux or macOS. There are third-party tools like Nvidia Profile Inspector that can force the card to stay in P0 state for compute loads, but no similar tools for Linux or macOS. Since Nvidia is anti-Linux, I doubt they will ever support the same utility in Linux, as they won't release documentation to the open source community. Windows developers do have access to Nvidia documentation because they can sign an NDA and have a much closer relationship with the Nvidia driver developers.
ID: 1948213

©2024 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.