Limiting GPU temperature on NVIDIA graphic card on Linux
Author | Message |
---|---|
Bent Jakobsen Joined: 27 Jul 04 Posts: 2 Credit: 77,199 RAC: 0 |
I was looking for a way to limit the maximum temperature that my NVIDIA card gets heated to when crunching GPU units through boinc (here on the milkyway project). I wanted to limit the maximum temperature for two reasons:
1. To limit the wear and tear on the remaining components.
2. To limit the noise from the graphics card.
From my point of view this should be done through boinc, but I did not find anything for this purpose on Linux, so I wrote this bash script. I'll just upload it, so if anyone wants to use it, just do :) But it is not perfect and not optimized, so feel free to improve it. And please do remember that you use it at your own risk, so don't come running to me if your graphics card gets toasted.
Best regards
Bent

```shell
#!/bin/bash
#--------------------------------------------------------------------------------------------------
# Version: 001
#
# Description: To ensure that the NVIDIA temperature stays within a certain range when running
#              the milkyway GPU application
#
# Operating System: This is made for Linux.
#
# Notes:
# - This example is made for only one GPU (gpu:0)
# - Must be run as root or as an elevated account that is allowed to stop and restart processes
#
# Requirements/commands used:
# nvidia-settings
# awk
# sed
# pslist
# grep
# cut
# kill
# sleep
# echo
#
# NOTE: Please do check that you have all these commands before running it.
#
# Disclaimer:
# I assume no responsibility; use it at your own risk...
# So don't complain about anything that may arise from using this...
# It is your own fault ;)
#
# And please do note that this script might have some unexpected bugs, as it is only some bash code
# which I have thrown together...
#
# License: Free - Please do modify it if you want to
#
# Todo: The bash code should be optimized, and should check for any unexpected failures...
# --------------------------------------------------------------------------------------------------

GPUMAX=67      # Maximum temperature
GPURESUME=56   # Temperature at which computing may resume
SLEEPTIME=2    # Number of seconds between measurements

# Read the core temperature of gpu:0 from nvidia-settings.
# The quotes stop the shell from treating [gpu:0] as a filename pattern.
read_gpu0_temp() {
    nvidia-settings -q '[gpu:0]/GPUCoreTemp' | grep '):' | awk '{print $4}' | sed 's/\.//'
}

# Run until interrupted (the original counter was never incremented, so the behaviour is the same)
while true; do
    GPU0Temp=$(read_gpu0_temp)

    # Skip this round if no valid number was read (e.g. nvidia-settings cannot reach the X display)
    if ! [[ "$GPU0Temp" =~ ^[0-9]+$ ]]; then
        echo "GPU0: could not read temperature"
        sleep "$SLEEPTIME"
        continue
    fi

    # Reset every round so a stale PID from an earlier round is never signalled
    MILKYWAYPID=""
    STATUS=""
    MPID=$(pslist milkyway_0.24_x)
    if [ -z "$MPID" ]; then
        echo "GPU0 (NO MILKYWAY) = $GPU0Temp"
    else
        MILKYWAYPID=$(echo "$MPID" | cut -d " " -f 1)
        # Get status from /proc: STATUS is non-empty if the process is stopped
        STATUS=$(grep "State:" "/proc/$MILKYWAYPID/status" | grep "stopped")
    fi

    if [ "$GPU0Temp" -gt "$GPUMAX" ] && [ -n "$MILKYWAYPID" ]; then
        # Temperature greater than allowed, so pause the GPU task
        kill -STOP "$MILKYWAYPID"
        # Wait until the temperature has dropped below GPURESUME
        while true; do
            GPU0Temp1=$(read_gpu0_temp)
            echo "GPU0 (STOPPED) = $GPU0Temp1"
            if [[ "$GPU0Temp1" =~ ^[0-9]+$ ]] && [ "$GPU0Temp1" -lt "$GPURESUME" ]; then
                break
            fi
            sleep "$SLEEPTIME"
        done
        kill -CONT "$MILKYWAYPID"
    elif [ "$GPU0Temp" -lt "$GPURESUME" ]; then
        # Temperature below the resume level, so resume the GPU task if it has been stopped
        if [ -n "$MILKYWAYPID" ] && [ -n "$STATUS" ]; then
            kill -CONT "$MILKYWAYPID"
            echo "GPU0 (RESUMED) = $GPU0Temp"
        fi
    elif [ -n "$MILKYWAYPID" ]; then
        # Between GPURESUME and GPUMAX: report whether the task is running or still paused
        if [ -z "$STATUS" ]; then
            echo "GPU0 (RUNNING) = $GPU0Temp"
        else
            echo "GPU0 (COOLING) = $GPU0Temp"
        fi
    fi

    sleep "$SLEEPTIME"
done
```
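The script's temperature reading depends on the exact text printed by `nvidia-settings`. A minimal sketch of that parsing step as a standalone function that can be tested without a GPU, assuming (as the script's `awk '{print $4}'` implies) that the query prints a line like `Attribute 'GPUCoreTemp' (host:0[gpu:0]): 52.`; the name `parse_gpu_temp` is hypothetical. It also rejects empty or non-numeric results, which would otherwise make the `-gt`/`-lt` comparisons fail.

```shell
#!/bin/bash
# Hypothetical helper: extract the integer temperature from nvidia-settings query output
# read on stdin. Assumed input line format:
#   Attribute 'GPUCoreTemp' (host:0[gpu:0]): 52.
# Prints the number, or returns 1 if no valid temperature was found.
parse_gpu_temp() {
    local temp
    # Same steps as the script: keep the line containing "):",
    # take field 4 ("52.") and strip the trailing period.
    temp=$(grep '):' | awk '{print $4}' | sed 's/\.//')
    # Reject empty or non-numeric results (e.g. an error message instead of a reading)
    if [[ "$temp" =~ ^[0-9]+$ ]]; then
        echo "$temp"
    else
        return 1
    fi
}

# Usage example with canned output instead of a live query
echo "  Attribute 'GPUCoreTemp' (host:0[gpu:0]): 52." | parse_gpu_temp
```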
Cruncher-American Joined: 25 Mar 02 Posts: 1513 Credit: 370,893,186 RAC: 340 |
EVGA has a tool - "Precision" - for Windows that enables control of the fan on your cards; check their website to see if they have a version for Linux |
Bent Jakobsen Joined: 27 Jul 04 Posts: 2 Credit: 77,199 RAC: 0 |
Hi jravin, Thanks for your message. Firstly, I cannot find a Linux version of Precision, although I could perhaps find another tool to do the same if I wanted to. But basically I don't want to. Allow me to try to explain. Your way, as I see it, is to reduce the NVIDIA slowdown threshold temperature, thereby limiting my already "slow" GeForce GTX 285 card. This would be a good way to fix a situation where we have a lot of "bad behaving applications" and want to ensure that, no matter what, the temperature does not rise above a certain level, at the expense of the noise level, and at the expense of the graphics card if the fan fails. But from my point of view it is the controlling application (read: boinc) which should ensure that it does not allow a "bad behaving" application like milkyway to run when the environment is not within the acceptable limits specified by the local administrator (read: me). The current boinc does not allow such control, and is therefore, in a way, a "bad behaving" application itself. So if I want to run milkyway (or any other boinc-based GPU application), I have to take on the responsibility for the running environment myself. For this situation I have made the script, so that milkyway is paused when the temperature is above the specified maximum, and allowed to continue computing once the temperature has dropped below a certain lower temperature. So you see, we are actually looking at the same issue, but from two different points of view. Best regards Bent |
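The rule Bent describes (pause above one temperature, continue only below a lower one) is a hysteresis band, which keeps the task from being toggled on and off around a single threshold. A minimal sketch of that decision as a pure bash function, kept apart from nvidia-settings and kill so it can be tried without a GPU; the name `decide_action` and its arguments are hypothetical, and the example values are the script's GPUMAX=67 and GPURESUME=56.

```shell
#!/bin/bash
# Hypothetical helper: the pause/resume rule as a pure function.
# Arguments: current temperature, maximum temperature, resume temperature,
#            and whether the task is currently paused ("yes" or "no").
# Prints the action to take: "pause", "resume" or "keep".
decide_action() {
    local temp=$1 max=$2 resume=$3 paused=$4
    if [ "$paused" = "no" ] && [ "$temp" -gt "$max" ]; then
        echo "pause"      # too hot: stop the GPU task
    elif [ "$paused" = "yes" ] && [ "$temp" -lt "$resume" ]; then
        echo "resume"     # cooled down enough: let it continue
    else
        echo "keep"       # inside the band: leave the task as it is
    fi
}

# Usage example with the script's thresholds (GPUMAX=67, GPURESUME=56)
decide_action 70 67 56 no    # prints "pause"
decide_action 60 67 56 yes   # prints "keep" (still cooling)
decide_action 55 67 56 yes   # prints "resume"
```

The gap between the two thresholds sets how long each pause lasts: a wider band means fewer, longer pauses.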
woodenboatguy Joined: 10 Nov 00 Posts: 368 Credit: 3,969,364 RAC: 0 |
I have a number of GTX 285s. I say just let 'er rip. Mine get up to the high 80s and low 90s, and I employ a solution someone here suggested: I have a fan mounted within the box blowing down the length of the three cards, thereby increasing the airflow across the intake fans of the top and middle cards. It improved temps by 3-4C. Smoke 'em if you got 'em, I say. What else are you going to do with all your money?!! Regards, |
Cruncher-American Joined: 25 Mar 02 Posts: 1513 Credit: 370,893,186 RAC: 340 |
NO! The point of Precision (or a similar tool for Linux) is that you control the fan speed, so you increase it to cut the temperature. Thus, you don't throttle the card back to avoid high temperatures; you raise the fan speed for more cooling. |
Cruncher-American Joined: 25 Mar 02 Posts: 1513 Credit: 370,893,186 RAC: 340 |
I tried a Google search for "linux fan speed control" and came up with this, which may be what you want: http://www.linuxhardware.org/nvclock/ Good luck! Jon |
Al Joined: 3 Apr 99 Posts: 1682 Credit: 477,343,364 RAC: 482 |
I've been wondering for a while whether there is a Windows app out there that would allow me to control temperatures, as opposed to Precision, which allows me to control fan speeds. I want to set a temperature in the program, and have it set the fan speed to whatever it takes to hold that temperature (as closely as possible; there's always a margin of error). As far as I know, there isn't such a program out there. Does anyone know of one that works? |
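What Al asks for is a closed control loop: measure the temperature, compare it with a set point, and adjust the fan speed. A minimal sketch of one update step in bash, assuming some external tool can set the fan speed in percent; the function name `next_fan_speed`, the gain and the limits are hypothetical, and actually applying the speed to the card is left out because that depends on the tool and driver. Because each step adds a correction proportional to the temperature error to the previous fan speed, this is integral-style control: it keeps adjusting until the temperature settles at the target instead of leaving a fixed offset.

```shell
#!/bin/bash
# Hypothetical helper: one update step of a controller that tries to hold a target temperature.
# Arguments: current temperature, target temperature (deg. C), current fan speed (%).
# Prints the new fan speed (%), clamped between FAN_MIN and FAN_MAX.
FAN_MIN=30   # assumed lowest allowed fan speed (%)
FAN_MAX=100  # highest possible fan speed (%)
GAIN=5       # assumed: fan speed change (%) per degree of error

next_fan_speed() {
    local temp=$1 target=$2 fan=$3
    # Speed the fan up when above the target, slow it down when below
    local new=$(( fan + GAIN * (temp - target) ))
    if (( new < FAN_MIN )); then new=$FAN_MIN; fi
    if (( new > FAN_MAX )); then new=$FAN_MAX; fi
    echo "$new"
}

# Usage examples with a 65 deg. C target
next_fan_speed 70 65 50   # 50 + 5*5 = 75
next_fan_speed 60 65 50   # 50 - 25 = 25, clamped to 30
next_fan_speed 90 65 80   # 80 + 125 = 205, clamped to 100
```

In use, this would be called every few seconds with a fresh reading; a gain that is too large for how quickly the card heats up can make the fan speed oscillate.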
w1hue Joined: 4 Aug 00 Posts: 69 Credit: 5,492,898 RAC: 7 |
Have you tried TThrottle? (Go to http://www.efmer.eu/boinc/). I have used it to keep CPU temps under control and it works great. Also supposed to work with GPUs, but my old NVIDIA GeForce 210 never gets hotter than 65 deg. C (well within its limits) when running SETI. |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.