GPU Temp Monitoring

Spartana

Joined: 24 Apr 16
Posts: 99
Credit: 41,712,387
RAC: 25
United States
Message 2007411 - Posted: 15 Aug 2019, 1:27:00 UTC

Can anyone recommend a good GUI-based GPU temp monitoring app for Linux? I've been trying to find a decent one for days and have had no luck. I even tried some of the coin-mining software just to see if I could take advantage of its utilities... it was an unpleasant experience. I'm trying to avoid terminal-only solutions.

Thanks,
Tony
ID: 2007411
Mr. Kevvy
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3297
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 2007413 - Posted: 15 Aug 2019, 1:39:06 UTC
Last modified: 15 Aug 2019, 1:40:12 UTC

It's basic, but installing the NVIDIA drivers (you have the latest 430 series) also installs "NVIDIA X Server Settings", which shows GPU temperatures, clocks and fan speeds under PowerMizer and Thermal Settings on the left. I tried other apps, e.g. GreenWithEnvy, but none of them seemed ready for "prime time", so I stuck with the NVIDIA one: it works, and I don't need long-term monitoring, just something to confirm that the cards are clocking and the fans are spinning up properly with nothing overheating.
ID: 2007413
Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 12966
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2007416 - Posted: 15 Aug 2019, 1:49:43 UTC
Last modified: 15 Aug 2019, 1:55:36 UTC

Both GKrellM and Psensor report the GPU temps by simply reading the temperatures exposed by the NVIDIA driver interface. Each gives you a desktop monitoring program that shows, at a glance, the system health of the host and all the GPU temps.


sudo apt install gkrellm


The display can be made a lot more compact in the configuration by collapsing all cores into a single CPU usage panel, if desired.

Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2007416
Spartana
Joined: 24 Apr 16
Posts: 99
Credit: 41,712,387
RAC: 25
United States
Message 2007418 - Posted: 15 Aug 2019, 2:01:27 UTC - in response to Message 2007413.  

Ah. Thanks. I did look there first, but saw no useable data. I looked again after reading your post, and realized that the data is displaying on my Linux PC build, but not on my Linux servers where I had originally checked. I'll have to figure out why the servers are not displaying that data.
ID: 2007418
Spartana
Joined: 24 Apr 16
Posts: 99
Credit: 41,712,387
RAC: 25
United States
Message 2007419 - Posted: 15 Aug 2019, 2:06:38 UTC - in response to Message 2007416.  
Last modified: 15 Aug 2019, 2:07:29 UTC

Thanks! That type of at-a-glance display is what I'm looking for. I did try Psensor previously, but had some issues with it not picking up all devices despite my best efforts. I'll give GKrellM a try.
ID: 2007419
Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 12966
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2007426 - Posted: 15 Aug 2019, 2:26:50 UTC - in response to Message 2007419.  

The only prerequisite is to install lm-sensors first.
sudo apt install lm-sensors


It's in the docs for both programs.
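For anyone following along, here is a minimal sketch of checking that lm-sensors actually reports something after the install. `sensors-detect` and `sensors` both come with the lm-sensors package; the helper name `count_temp_lines` is made up for this example and simply counts the lines that carry a reading in °C.

```shell
#!/bin/sh
# Hypothetical helper: count the lines of `sensors` output (read from stdin)
# that contain a temperature reading such as "+45.0°C".
count_temp_lines() {
    grep -c '°C'
}

if command -v sensors >/dev/null 2>&1; then
    # One-time, interactive probe for sensor chips (needs root):
    #   sudo sensors-detect
    echo "temperature readings found: $(sensors | count_temp_lines)"
else
    echo "lm-sensors does not appear to be installed" >&2
fi
```

If the count is 0 or lower than expected, running `sudo sensors-detect` again is a common first step.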
ID: 2007426
Spartana
Joined: 24 Apr 16
Posts: 99
Credit: 41,712,387
RAC: 25
United States
Message 2007438 - Posted: 15 Aug 2019, 3:13:40 UTC - in response to Message 2007426.  

I got it installed on the PC, and like it. That will probably be my go-to monitor. Unfortunately, on my Dell servers it has the same problem as the other monitoring programs, as touched on above with Mr. Kevvy: some of the nodes/devices aren't recognized (no GPU, and only about a third of the CPUs). Interesting, since I never had any problems with Windows-based CPU/GPU monitoring on those servers. Getting that sorted will be a project for tomorrow.

Thanks again.
ID: 2007438
Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 12966
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2007442 - Posted: 15 Aug 2019, 3:31:03 UTC - in response to Message 2007438.  

Are you saying it isn't picking up all your GPUs? What does
sudo lshw -C display

show? Does that pick up all the GPUs?
Or
lspci | grep ' VGA '

or
nvidia-smi
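
If the full table is more than you need, nvidia-smi can also print selected fields as CSV. A rough sketch, assuming the driver's `nvidia-smi` is on the PATH; `format_gpu_temps` is a hypothetical helper name, not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: read "index, name, temperature" CSV rows on stdin
# and print one compact line per GPU.
format_gpu_temps() {
    awk -F', ' 'NF >= 3 { printf "GPU %s (%s): %s C\n", $1, $2, $3 }'
}

if command -v nvidia-smi >/dev/null 2>&1; then
    # Ask the driver for index, name and core temperature only.
    nvidia-smi --query-gpu=index,name,temperature.gpu --format=csv,noheader |
        format_gpu_temps
else
    echo "nvidia-smi not found; is the NVIDIA driver installed?" >&2
fi
```

Wrapping the query in `watch -n 5` would give a simple refreshing readout, though it is still terminal-only.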

ID: 2007442
Spartana
Joined: 24 Apr 16
Posts: 99
Credit: 41,712,387
RAC: 25
United States
Message 2007514 - Posted: 15 Aug 2019, 15:40:13 UTC - in response to Message 2007442.  

Keith,

BOINC and the SETI apps are seeing and using the GPU just fine; it's only the monitoring apps that aren't. Same issue on both servers I am currently crunching with, although the other server doesn't have a GPU installed right now. Below is what I got when I ran the commands you listed above while running a single GTX 980 on this server. Looks okay to me, but please let me know if you see anything amiss.

tony@PowerEdge-R410-2:~$ sudo lshw -C display
[sudo] password for tony:        
  *-display                 
       description: VGA compatible controller
       product: GM204 [GeForce GTX 980]
       vendor: NVIDIA Corporation
       physical id: 0
       bus info: pci@0000:03:00.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress vga_controller bus_master cap_list rom
       configuration: driver=nvidia latency=0
       resources: irq:50 memory:dd000000-ddffffff memory:c0000000-cfffffff memory:be000000-bfffffff ioport:ec80(size=128) memory:dc000000-dc07ffff
  *-display
       description: VGA compatible controller
       product: MGA G200eW WPCM450
       vendor: Matrox Electronics Systems Ltd.
       physical id: 3
       bus info: pci@0000:04:03.0
       version: 0a
       width: 32 bits
       clock: 33MHz
       capabilities: pm vga_controller bus_master cap_list rom
       configuration: driver=mgag200 latency=32 maxlatency=32 mingnt=16
       resources: irq:19 memory:d0000000-d07fffff memory:de7fc000-de7fffff memory:de800000-deffffff memory:c0000-dffff
tony@PowerEdge-R410-2:~$ lspci | grep ' VGA '
03:00.0 VGA compatible controller: NVIDIA Corporation GM204 [GeForce GTX 980] (rev a1)
04:03.0 VGA compatible controller: Matrox Electronics Systems Ltd. MGA G200eW WPCM450 (rev 0a)
tony@PowerEdge-R410-2:~$ nvidia-smi
Thu Aug 15 11:24:16 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.26       Driver Version: 430.26       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 980     Off  | 00000000:03:00.0 Off |                  N/A |
| 41%   69C    P2   125W / 180W |   1278MiB /  4043MiB |     91%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     28398      C   ...x41p_V0.98b1_x86_64-pc-linux-gnu_cuda90  1267MiB |
+-----------------------------------------------------------------------------+
tony@PowerEdge-R410-2:~$ 
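
One way to cross-check output like this is to compare how many NVIDIA VGA controllers the PCI bus reports with how many GPUs the driver lists. A small sketch under assumptions: `count_nvidia_vga` is a made-up helper, and it only looks at devices that lspci labels as VGA (some NVIDIA cards show up as "3D controller" instead).

```shell
#!/bin/sh
# Hypothetical helper: count NVIDIA VGA controllers in `lspci` output
# supplied on stdin.
count_nvidia_vga() {
    grep ' VGA ' | grep -c 'NVIDIA'
}

if command -v lspci >/dev/null 2>&1 && command -v nvidia-smi >/dev/null 2>&1; then
    on_bus=$(lspci | count_nvidia_vga)
    # `nvidia-smi -L` prints one "GPU <n>: ..." line per GPU the driver sees.
    in_driver=$(nvidia-smi -L | grep -c '^GPU ')
    echo "NVIDIA VGA controllers on the PCI bus: $on_bus"
    echo "GPUs listed by the driver:             $in_driver"
fi
```

For the listing above, both numbers should come out as 1, which would point at the monitoring apps rather than at the driver.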


Thanks again for the help... I'm still in learning mode with these Linux machines.
ID: 2007514
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4135
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2007515 - Posted: 15 Aug 2019, 15:48:26 UTC

I would recommend Psensor over GKrellM.

GKrellM is quite resource intensive. I saw upwards of 5-7% CPU from just running it and nothing else on multi-GPU systems. The more GPUs or other things it's monitoring, the worse it gets.

Psensor is quite light in comparison.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2007515
Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 12966
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2007543 - Posted: 15 Aug 2019, 19:04:36 UTC - in response to Message 2007515.  

I don't like the look or the interface of Psensor. My choice. I only see 1-2% CPU usage in htop for the gkrellm process. I've never seen it go higher than 2%.
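
To put numbers on the CPU cost of either monitor, something like the following can help. It is only a sketch: `sum_pcpu` is a hypothetical helper, and `ps` reports CPU time averaged over the life of the process, so it will not match the instantaneous figure in htop exactly.

```shell
#!/bin/sh
# Hypothetical helper: add up the %CPU values (one per line) read from stdin.
sum_pcpu() {
    awk '{ total += $1 } END { printf "%.1f\n", total }'
}

# Process name to inspect; defaults to gkrellm, pass "psensor" to compare.
proc_name=${1:-gkrellm}

# -C selects by command name (procps ps on Linux); "pcpu=" prints the
# column without a header line.
echo "$proc_name: $(ps -C "$proc_name" -o pcpu= | sum_pcpu)% CPU"
```

Running it on the same host for each monitor in turn gives a like-for-like comparison, since the load depends on how many GPUs and sensors are being watched.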
ID: 2007543
Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 12966
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2007545 - Posted: 15 Aug 2019, 19:07:32 UTC - in response to Message 2007514.  

Don't know. My thinking is that the Matrox controller is confusing the monitoring apps. I think they expect to see only NVIDIA, Intel or AMD. You would probably have to file a bug report with the developer that the 980 isn't being picked up even though the system sees and uses it just fine.
ID: 2007545
Spartana
Joined: 24 Apr 16
Posts: 99
Credit: 41,712,387
RAC: 25
United States
Message 2007657 - Posted: 16 Aug 2019, 5:26:33 UTC - in response to Message 2007545.  

Thanks for the help. I'll keep troubleshooting it, but will most likely just have to add it to the list of pitfalls when trying to force a server to act like a PC.
ID: 2007657
©2021 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.