Seeing Linux GPU temperatures and getting alerts when things go south
Author | Message |
---|---|
Joseph Stateson Joined: 27 May 99 Posts: 309 Credit: 70,759,933 RAC: 3 |
I use BoincTasks for my remote systems, and that tool does not report temperatures from Linux like it does for Windows. For some time, I have been using BT and have configured several "rules" to send a text message to my phone if it detects a stuck task or a temperature that is too high. That was not possible on Linux until now. I have a Python script at https://github.com/JStateson/BoincTasks that runs as a service under systemd and reports temperatures to BoincTasks. In addition, if the NVIDIA driver recommends a reboot to recover a "lost" GPU, the script sends a text message alerting me and turns off GPU usage in BOINC. Anyone is welcome to use this tool, and suggestions for improvement would be nice. You may already be using an excellent temperature checking and reporting program. This script allows temps to show up on the BoincTasks display, which is convenient for me. |
wujj123456 Joined: 5 Sep 04 Posts: 40 Credit: 20,877,975 RAC: 219 |
Nice one. I am curious how many of you have ever run into temperature problems... I use a high-airflow case and so far haven't really seen any problems, even with open-air-cooled GPUs stacked next to each other. In all the years of gaming and running BOINC on and off, I've never had a GPU shut down. I did have one card outright burn out in an HTPC case two years ago, but it was a busted capacitor. |
Zalster Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
I've replaced 7 cards over the years. Most of those were due to burnout and were before the hybrids came out. Now I have almost all hybrids, and that doesn't happen as often as it used to. |
petri33 Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
Hi, whenever I develop a new, improved version of the software I run into temperature problems. I'm running on air and flying in the vicinity of the upper limits of cooling. Sometimes it just happens that, even though my system has been running OK for some hours, a temperature catastrophe hits when I'm away from my computer. One of my GPUs goes south and does not recover. It will either run slow or rapidly destroy my work queue. I could take a look at your solution. -- Petri To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
I agree with Z, the hybrids are perfect for crunching. With them you can easily keep the temps within a safe range, even in hot and highly humid places like the one I live in. The only problem I have with the hybrids is their pumps: I have had 2 of them fail, about 1 per year. They are hard to find here, but once the pump is changed everything returns to working fine. |
petri33 Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
"I agree with Z, the hybrids are perfect for crunching. With them you could easily keep the temps within a safe range even on hot & high humidity places like the one I live." :) It is neither humid nor warm here. Still my GPUs go south. Or because of that. To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
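
For anyone curious what the kind of check described in the first post might look like, here is a minimal sketch. It is not the code from the linked repository. It assumes `nvidia-smi` and `boinccmd` are on the PATH, uses a made-up threshold (`TEMP_LIMIT_C`), and assumes the driver's message contains the text "GPU is lost"; all function names are hypothetical. Sending the text message and reporting to BoincTasks are left out.

```python
import subprocess

TEMP_LIMIT_C = 85  # hypothetical alert threshold, not taken from the original script


def parse_temperatures(csv_text):
    """Parse 'index, temperature' CSV lines into {gpu_index: temp_c}."""
    temps = {}
    for line in csv_text.strip().splitlines():
        fields = [f.strip() for f in line.split(",")]
        if len(fields) != 2:
            continue
        try:
            temps[int(fields[0])] = int(fields[1])
        except ValueError:
            continue  # skip rows that are not plain integers, e.g. "[N/A]"
    return temps


def overheated(temps, limit=TEMP_LIMIT_C):
    """Return the sorted GPU indices at or above the limit."""
    return sorted(i for i, t in temps.items() if t >= limit)


def read_gpu_temperatures():
    """Ask nvidia-smi for per-GPU temperatures in bare CSV form."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,temperature.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, timeout=30,
    )
    return result.stdout, result.stderr


def disable_boinc_gpu():
    """Tell the local BOINC client to stop using the GPU."""
    subprocess.run(["boinccmd", "--set_gpu_mode", "never"], check=False)


def check_once():
    """Run one check: stop GPU work if a GPU is lost, else list hot GPUs."""
    out, err = read_gpu_temperatures()
    if "GPU is lost" in out + err:  # assumed wording of the driver message
        disable_boinc_gpu()
        return "gpu_lost", []
    return "ok", overheated(parse_temperatures(out))
```

Calling `check_once()` from a loop with a sleep, started by a systemd unit, would roughly match the setup described above; the alerting step would go wherever the result is not `("ok", [])`.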