Core processor overheating
Author | Message |
---|---|
BobMiller Send message Joined: 24 Jul 08 Posts: 32 Credit: 11,041,077 RAC: 129 |
Running SETI causes overheating of some core processors resulting in emergency power off of my pc. I do not see any settings that I have selected that could be causing that problem. |
rob smith Send message Joined: 7 Mar 03 Posts: 22534 Credit: 416,307,556 RAC: 380 |
SETI places a lot more thermal stress on components than even the most serious gaming, so it is important to keep all heat sinks and radiators free from the inevitable build-up of dust. Also, many laptop cooling systems are "marginal" even when in prime condition and running normal software; give them the load of running SETI (or other similarly intensive computational software) and they tend to get very hot and need extra cooling. So two things to consider: clean all heat sinks etc. of accumulated dust, or, if the issue is with a laptop, get one of those external laptop cooling pads that has a couple of fans to blast cool air onto the base of the machine. There are other solutions, but they involve either additional software or more significant dismantling... Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Luc Send message Joined: 7 Jan 17 Posts: 4 Credit: 1,840,181 RAC: 5 |
You can also set Options/Computing Preferences/Computing to not use 100% of the available CPU time. Try setting it to 80% and work your way up. I have a laptop and use the cooling pad option mentioned here as well as setting CPU usage to 98%. |
Jeff_Kloek Send message Joined: 26 May 99 Posts: 14 Credit: 11,829,338 RAC: 0 |
I had the same problem. I have been running without incident for a couple of years since buying an 8-core system with an Nvidia 1070 GPU. Then in the last couple of weeks, the system began to shut itself down after only a few minutes of activity. I did try bumping it down to 80% for both CPU and CPU time, yet it still would shut down. Then I found an article about disabling CPU Parking and I took those steps, since I noticed multiple CPUs listed as parked in Task Manager -> Resource Monitor (Windows 7). It seems the system has stopped shutting down, and I am again back at 100% CPU and CPU Time in my computing preferences. Update: As soon as I enabled the task on the GPU, the system came down again. Checking now for any diagnostics I can run on it. |
Kissagogo27 Send message Joined: 6 Nov 99 Posts: 716 Credit: 8,032,827 RAC: 62 |
You can try TThrottle (https://efmer.com/) to limit temperature ^^ It suspends tasks from the BOINC project apps to do this. |
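For anyone curious how temperature-based suspending of this kind can work in principle, here is a minimal sketch. It is not TThrottle's actual algorithm; the function names, the 80 °C limit and the 75 °C resume point are all assumptions chosen for illustration. The gap between the two thresholds keeps tasks from flapping on and off when the temperature hovers near the limit.

```python
def next_state(temp_c, suspended, limit_c=80.0, resume_c=75.0):
    """Decide whether tasks should be suspended after a new temperature reading.

    limit_c and resume_c are hypothetical thresholds, not TThrottle defaults.
    """
    if not suspended and temp_c >= limit_c:
        return True          # too hot: suspend work
    if suspended and temp_c <= resume_c:
        return False         # cooled down enough: resume work
    return suspended         # otherwise keep the current state


def simulate(readings, limit_c=80.0, resume_c=75.0):
    """Run a list of temperature readings through the rule above."""
    suspended = False
    states = []
    for temp_c in readings:
        suspended = next_state(temp_c, suspended, limit_c, resume_c)
        states.append(suspended)
    return states


if __name__ == "__main__":
    # Made-up readings in degrees Celsius, purely for demonstration.
    for temp, state in zip([70, 79, 81, 78, 76, 74, 79],
                           simulate([70, 79, 81, 78, 76, 74, 79])):
        print(temp, "suspended" if state else "running")
```

In a real setup the readings would come from a hardware sensor and the suspend/resume decision would be passed on to the BOINC client; both parts are left out here.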
Jeff_Kloek Send message Joined: 26 May 99 Posts: 14 Credit: 11,829,338 RAC: 0 |
Thank you, I will try that. The interesting thing here is that when I enable all tasks but the one on the GPU, the system is fine, even at 100% / 100%. I found multiple mentions of the benchmark tool at this url: https://benchmark.unigine.com/heaven?lang=en ; and I will try both and post my results. |
Jeff_Kloek Send message Joined: 26 May 99 Posts: 14 Credit: 11,829,338 RAC: 0 |
The system finally stopped even making it to POST, so I've taken it to a local repair shop. I'll post an update once the root cause is identified / fixed. Thus far their diags have ruled out memory and the power supply. Thanks. |
Mr. Kevvy Send message Joined: 15 May 99 Posts: 3806 Credit: 1,114,826,392 RAC: 3,319 |
What make/model of video card and power supply do you have? From what I see it appears your power supply is underpowered and overheating when under load, and/or perhaps the 12V rail(s) are not able to push enough amperage. My rule of thumb is system power draw should be max. 65% of the PSU's rating for 24x7 operation. |
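The 65% rule of thumb above is simple arithmetic, and a small sketch makes it easy to check a given build. The function names are made up for illustration, and the 350 W draw in the example is a hypothetical figure, not a measurement from this thread; the real draw would have to come from a wall-socket power meter or similar.

```python
def max_sustained_draw(psu_rating_watts, headroom_fraction=0.65):
    """Highest continuous draw the rule of thumb allows for a given PSU rating."""
    return psu_rating_watts * headroom_fraction


def within_rule(estimated_draw_watts, psu_rating_watts, headroom_fraction=0.65):
    """True if the estimated system draw stays inside the rule of thumb."""
    return estimated_draw_watts <= max_sustained_draw(psu_rating_watts,
                                                      headroom_fraction)


if __name__ == "__main__":
    # Ratings are example inputs; 350 W is a hypothetical system draw.
    for rating in (600, 650):
        budget = max_sustained_draw(rating)
        print(f"{rating} W PSU: budget {budget:.1f} W, "
              f"350 W load ok: {within_rule(350, rating)}")
```

Note that this only covers the total rating; it says nothing about whether the 12V rail(s) can deliver the current, which is the other concern raised above.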
Jeff_Kloek Send message Joined: 26 May 99 Posts: 14 Credit: 11,829,338 RAC: 0 |
"What make/model of video card and power supply do you have? From what I see it appears your power supply is underpowered and overheating when under load, and/or perhaps the 12V rail(s) are not able to push enough amperage. My rule of thumb is system power draw should be max. 65% of the PSUs rating for 24x7 operation." Hi. The video card is an actual NVIDIA GeForce GTX 1070. The power supply is 650 watts, and this combination has been running for well over a year with no issue. I got the computer back and it turns out the SATA cable running from my drive to the motherboard had a cut in it. That being replaced, the system comes back up fine and, as before, will run fine with all 8 cores in use at 100% / 100% in the Computing Preferences options. However, as soon as I enable the packet running on the GPU, the system is down within a minute. I have disabled multiple packets running against the GPU to see if it was specific to a particular packet, and the result was the same. I am chasing this with Nvidia support currently, and will report back. Thanks very much for the response. |
Mr. Kevvy Send message Joined: 15 May 99 Posts: 3806 Credit: 1,114,826,392 RAC: 3,319 |
|
Jeff_Kloek Send message Joined: 26 May 99 Posts: 14 Credit: 11,829,338 RAC: 0 |
Thank you for pointing that out - I'm surprised they didn't ask me to do that since it is an Nvidia-specific tool. I'm downloading that now. Currently I am running a diagnostic they asked me to run for 60 minutes, available at http://freestone-group.com/download/VideoCardStabilityTestSetup.exe . The response was that if this crashes then the GPU is faulty. I will need to wait until this completes its one-hour run before taking any additional steps, so that I don't muddy the picture with symptoms from several tools at once. Thanks again. |
Jeff_Kloek Send message Joined: 26 May 99 Posts: 14 Credit: 11,829,338 RAC: 0 |
Over sixty minutes elapsed without incident. Now I am installing and executing the tool as directed by support at this url: https://www.techpowerup.com/download/techpowerup-gpu-z/ The directed steps are: a. Download GPUz from http://www.techpowerup.com/downloads/2490/techpowerup-gpu-z-v0-8-3/ (select the standard version). b. Go to the Sensors tab and check the box 'Log to File'. c. Save the file on the Desktop. Keep using the app for some time and check. d. Attach the log file to this support request. 3. Windows system utility file as below: a. Press Windows Logo Key + R. b. Type msinfo32 and press Enter. c. This will bring up the Microsoft System Information Utility; click File, then Save As. d. When the Save As window appears, choose Desktop and save to your hard drive. You may give it any name you choose, but with a '.nfo' file extension. e. Once the file has been saved on your hard drive, attach it to this support request so that we may review your system configuration. I will take the above steps next and report back when I get any results. |
Jord Send message Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
One 650W power supply isn't as good as the next 650W power supply. Generic PSUs are never as good as brand PSUs, but then some brand PSUs are absolute crap anyway. At least check yours against the PSU Tier list, see if it's on there and where it ends up. Top tiers are better, bottom ones worse. The foregoing list will change every now and then, and will of course depend on the tester's experience. But the ones at the bottom will generally be the same. |
Jeff_Kloek Send message Joined: 26 May 99 Posts: 14 Credit: 11,829,338 RAC: 0 |
Thanks for that information. So first, I had the wattage wrong; it's 600. The model is a Thermaltake TR2-600NL2NC and apparently that is on the "Tier 7 / Worst" list. I'll deal with that as soon as possible. Thanks again! |
Mr. Kevvy Send message Joined: 15 May 99 Posts: 3806 Credit: 1,114,826,392 RAC: 3,319 |
"Thank you for pointing that out - i'm surprised they didn't ask me to do that since it is an Nvidia specific tool." You're welcome, and me too; my initial thought was that it would have been the first thing that they would have had you do. I had a ThermalTake power supply once. Once. :^) They are not very good. The only three brands I will ever buy now are Corsair, eVGA and Seasonic. You will find these three almost universally as the go-to brands for everyone who has built many computers. |
Jeff_Kloek Send message Joined: 26 May 99 Posts: 14 Credit: 11,829,338 RAC: 0 |
Yes, and I've made note of those brand names and am going to get one of them because I prefer stability over low cost. Next update, and this one is really throwing me. They told me to set up a new administrative user and try with that new account. I have asked them to tell me what led them down that path, but it is actually working. I've processed 4 packets on the GPU in the last 10 minutes or so and the system is still up. I'd be interested to know how a user's environment could cause what has been happening. I will post their response. Using the old account isn't critical or even important, yet I'm curious and I like to learn. Thanks again. |
Jeff_Kloek Send message Joined: 26 May 99 Posts: 14 Credit: 11,829,338 RAC: 0 |
This is beginning to be a detailed exploration, with tasks that look more like what I'd do as a Unix admin. This was their response, and my result: Generally this issue would appear if there are any corrupt files in the OS or registry. From the details it seems more likely to be an issue with the user account. Please use the System File Checker tool to repair missing or corrupted system files: https://support.microsoft.com/en-in/help/929833/use-the-system-file-checker-tool-to-repair-missing-or-corrupted-system Since this was Windows 7, I used the below option. I didn't find any errors. Microsoft Windows [Version 6.1.7601] Copyright (c) 2009 Microsoft Corporation. All rights reserved. C:\windows\system32>sfc /scannow Beginning system scan. This process will take some time. Beginning verification phase of system scan. Verification 100% complete. Windows Resource Protection did not find any integrity violations. C:\windows\system32> |
Jeff_Kloek Send message Joined: 26 May 99 Posts: 14 Credit: 11,829,338 RAC: 0 |
So the system came down again, but it had processed a number of packets against the GPU. I fired it back up this morning and heard a series of beeps just before it came down, but too many had gone by for me to catch the count. Yet the system came up with another of those hardware-based reboot notices, again with no resolution. I'm back to monitoring with GPU-Z and an audio recorder set next to the PC to capture those beeps, which I'll review against the motherboard diagnostic notes. This has been quite the learning experience. Thanks to all who have responded. I'll continue to post results until this is resolved. |
Jeff_Kloek Send message Joined: 26 May 99 Posts: 14 Credit: 11,829,338 RAC: 0 |
Here's probably the final update. They suggested removing the VNC mirror driver (so I removed all the VNC products); followed by a reboot; after which the symptoms were the same. Then they suggested I re-install the chipset drivers for my motherboard. I did that last night and the system completed all the pending tasks against the GPU and is still going without any further issues. I re-installed VNC as well, and still no further crashes have occurred. Again, my thanks to all who responded with suggestions. The mobo is an ASUS ROG Crosshair V Formula-Z with AMD 8 core CPU (CPU: Socket 942 4000 Mhz 1375 Mv AMD FX-8370 microcode patch level 600822 8 core processor), with 32Gb ram (DDR3 1333Mhz Ram GMED-0005), for anyone who is interested; and the system's sole purpose is to process SETI tasks. |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.