Core processor overheating

Questions and Answers : GPU applications : Core processor overheating
BobMiller
Joined: 24 Jul 08
Posts: 32
Credit: 11,041,077
RAC: 129
United States
Message 1921680 - Posted: 28 Feb 2018, 16:07:32 UTC

Running SETI causes some of my processor cores to overheat, which triggers an emergency power-off of my PC. I do not see any settings I have selected that could be causing the problem.
rob smith (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22160
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1921849 - Posted: 1 Mar 2018, 7:45:37 UTC

SETI places far more thermal stress on components than even the most demanding gaming, so it is important to keep all heat sinks and radiators free of the inevitable build-up of dust. Also, many laptop cooling systems are "marginal" even in prime condition and running normal software; give them the load of SETI (or other similarly intensive computational software) and they tend to get very hot and need extra cooling.
So, two things to consider: clean the accumulated dust out of all heat sinks etc., or, if the issue is with a laptop, get one of those external laptop cooling pads with a couple of fans that blast cool air onto the base of the machine.

There are other solutions, but they involve either additional software or more significant dismantling...
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
Luc
Joined: 7 Jan 17
Posts: 4
Credit: 1,840,181
RAC: 5
Canada
Message 1925452 - Posted: 20 Mar 2018, 2:02:10 UTC - in response to Message 1921680.  

You can also set Options/Computing Preferences/Computing to use less than 100% of the available CPU time. Try setting it to 80% and work your way up. I have a laptop, and I use the cooling pad option mentioned here as well as setting CPU usage to 98%.
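As far as I understand it, the "use at most X% of CPU time" preference works by letting tasks compute for part of each short cycle and suspending them for the rest, while the separate "% of the CPUs" preference controls how many cores are used. The sketch below only does the duty-cycle arithmetic; the 4-second cycle length is an illustrative assumption, not a documented BOINC constant.

```python
def duty_cycle(limit_pct: float, period_s: float = 4.0) -> tuple[float, float]:
    """Split one throttle cycle into (compute, pause) seconds for a CPU-time limit.

    limit_pct is the "use at most X% of CPU time" value (0 < X <= 100).
    period_s is an ASSUMED cycle length chosen for illustration only.
    """
    if not 0 < limit_pct <= 100:
        raise ValueError("limit_pct must be in (0, 100]")
    compute = period_s * limit_pct / 100.0
    return compute, period_s - compute


if __name__ == "__main__":
    for pct in (80, 98, 100):
        run, pause = duty_cycle(pct)
        print(f"{pct:>3}%: compute {run:.2f} s, pause {pause:.2f} s per cycle")
```

At 98% the pause is only 2% of each cycle, so starting at 80%, as suggested above, gives the hardware ten times as much idle time per cycle.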
Jeff_Kloek
Joined: 26 May 99
Posts: 14
Credit: 11,829,338
RAC: 0
United States
Message 1930692 - Posted: 18 Apr 2018, 15:08:57 UTC - in response to Message 1925452.  
Last modified: 18 Apr 2018, 15:24:08 UTC

I had the same problem. I had been running without incident for a couple of years since buying an 8-core system with an NVIDIA 1070 GPU. Then, in the last couple of weeks, the system began shutting itself down after only a few minutes of activity. I tried bumping both the CPU and CPU time settings down to 80%, yet it still shut down. Then I found an article about disabling CPU core parking, and I took those steps, since I had noticed multiple CPUs listed as parked in Task Manager > Resource Monitor (Windows 7). The system seems to have stopped shutting down, and I am back at 100% CPU and CPU time in my computing preferences. Update: as soon as I enabled the task on the GPU, the system went down again. I'm now checking for any diagnostics I can run on it.
Kissagogo27 (Special Project $75 donor)
Joined: 6 Nov 99
Posts: 715
Credit: 8,032,827
RAC: 62
France
Message 1930698 - Posted: 18 Apr 2018, 15:21:35 UTC

You can try TThrottle (https://efmer.com/) to limit temperature ^^ It suspends the BOINC project application's tasks to do this.
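How TThrottle decides when to suspend isn't described here, so the following is only a generic sketch of temperature throttling with hysteresis: tasks are suspended once a reading reaches one threshold and resumed only after it falls to a lower one, so they don't flap on and off around a single limit. Both thresholds are hypothetical values.

```python
from dataclasses import dataclass


@dataclass
class HysteresisThrottle:
    """Decide whether compute tasks should run, given successive temperature readings.

    Generic sketch, NOT TThrottle's actual algorithm: suspend once the reading
    reaches `suspend_at`, and resume only after it drops to `resume_at` or below.
    """

    suspend_at: float = 80.0  # hypothetical threshold, °C
    resume_at: float = 70.0   # hypothetical threshold, °C
    running: bool = True

    def __post_init__(self) -> None:
        if self.resume_at >= self.suspend_at:
            raise ValueError("resume_at must be below suspend_at")

    def update(self, temp_c: float) -> bool:
        """Feed one reading; return True if tasks should be running afterwards."""
        if self.running and temp_c >= self.suspend_at:
            self.running = False  # too hot: suspend tasks
        elif not self.running and temp_c <= self.resume_at:
            self.running = True   # cooled down enough: resume tasks
        return self.running


if __name__ == "__main__":
    throttle = HysteresisThrottle()
    print([throttle.update(t) for t in (65, 75, 81, 78, 72, 69, 74)])
    # -> [True, True, False, False, False, True, True]
```

A real throttler also needs a sensor source and a way to suspend the BOINC tasks; both are left out of this sketch.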
Jeff_Kloek
Joined: 26 May 99
Posts: 14
Credit: 11,829,338
RAC: 0
United States
Message 1930702 - Posted: 18 Apr 2018, 15:44:52 UTC - in response to Message 1930698.  
Last modified: 18 Apr 2018, 15:45:17 UTC

Thank you, I will try that. Interestingly, when I enable all tasks except the one on the GPU, the system is fine, even at 100% / 100%. I found multiple mentions of the benchmark tool at this URL: https://benchmark.unigine.com/heaven?lang=en ; I will try both and post my results.
Jeff_Kloek
Joined: 26 May 99
Posts: 14
Credit: 11,829,338
RAC: 0
United States
Message 1930733 - Posted: 18 Apr 2018, 20:09:25 UTC - in response to Message 1930702.  

The system finally stopped even making it to POST, so I've taken it to a local repair shop. I'll post an update once the root cause is identified / fixed.
So far, their diagnostics have ruled out the memory and the power supply.
Thanks.
Mr. Kevvy (Crowdfunding Project Donor, Special Project $250 donor)
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3776
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 1930749 - Posted: 18 Apr 2018, 21:32:14 UTC
Last modified: 18 Apr 2018, 22:01:51 UTC

What make/model of video card and power supply do you have? From what I see, it appears your power supply is underpowered and overheating under load, and/or perhaps the 12V rail(s) cannot deliver enough amperage. My rule of thumb is that system power draw should be at most 65% of the PSU's rating for 24x7 operation.
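The 65% rule of thumb is simple arithmetic, and so is checking a rail, since power is volts times amps. A minimal sketch: the measured system draw and the rail amperage are inputs you would have to supply yourself (for example from a wall power meter and the PSU label), and the 40 A rail in the usage lines is hypothetical.

```python
def max_sustained_draw_w(psu_rating_w: float, headroom: float = 0.65) -> float:
    """Highest continuous draw allowed by the 'at most 65% of rating' rule of thumb."""
    if psu_rating_w <= 0 or not 0 < headroom <= 1:
        raise ValueError("rating must be positive and headroom in (0, 1]")
    return psu_rating_w * headroom


def rail_capacity_w(amps: float, volts: float = 12.0) -> float:
    """Power one rail can deliver, P = V * I (amps come from the PSU label)."""
    return volts * amps


def meets_rule(system_draw_w: float, psu_rating_w: float, headroom: float = 0.65) -> bool:
    """True if a measured 24x7 draw stays within the rule of thumb."""
    return system_draw_w <= max_sustained_draw_w(psu_rating_w, headroom)


if __name__ == "__main__":
    for rating in (600, 650):  # the two ratings that come up in this thread
        print(f"{rating} W PSU: keep 24x7 draw at or below {max_sustained_draw_w(rating):.1f} W")
    print(f"Hypothetical 40 A 12V rail: {rail_capacity_w(40):.0f} W")
```

For a 600 W unit this gives 390.0 W, and for 650 W it gives 422.5 W of sustained draw.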
Jeff_Kloek
Joined: 26 May 99
Posts: 14
Credit: 11,829,338
RAC: 0
United States
Message 1930974 - Posted: 20 Apr 2018, 1:42:53 UTC - in response to Message 1930749.  

Quoting Mr. Kevvy: "What make/model of video card and power supply do you have? From what I see it appears your power supply is underpowered and overheating when under load, and/or perhaps the 12V rail(s) are not able to push enough amperage. My rule of thumb is system power draw should be max. 65% of the PSUs rating for 24x7 operation."


Hi. The video card is an actual NVIDIA GeForce GTX 1070. The power supply is 650 watts, and this combination has been running for well over a year with no issues.
I got the computer back, and it turns out the SATA cable running from my drive to the motherboard had a cut in it. With that replaced, the system comes back up fine and, as before, runs fine with all 8 cores in use at 100% / 100% in the Computing Preferences options. However, as soon as I enable the packet running on the GPU, the system is down within a minute. I have disabled multiple packets running on the GPU to see whether the problem was specific to a particular packet, and the result was the same.
I am currently pursuing this with NVIDIA support and will report back.
Thanks very much for the response.
Mr. Kevvy (Crowdfunding Project Donor, Special Project $250 donor)
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3776
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 1930975 - Posted: 20 Apr 2018, 2:09:51 UTC - in response to Message 1930974.  

You're welcome. :^) I will await NVIDIA's determination, but in the meantime, if you haven't already, I would install nTune, SpeedFan or an equivalent to check the GPU's temperature when idle and when a task starts. I wonder if it's seriously overheating...
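nvidia-smi, which is installed with NVIDIA's drivers, can report the GPU temperature from the command line. A minimal logging sketch, assuming nvidia-smi is on PATH (on some Windows driver versions it sits in a separate NVSMI folder that may need adding); this is not how nTune or SpeedFan work internally. Start it at idle, then resume the GPU task and watch the readings.

```python
import shutil
import subprocess
import time

QUERY = ["nvidia-smi", "--query-gpu=temperature.gpu", "--format=csv,noheader,nounits"]


def parse_temps(output: str) -> list[int]:
    """One integer °C per GPU line; skips anything non-numeric such as '[N/A]'."""
    return [int(line.strip()) for line in output.splitlines() if line.strip().isdigit()]


def read_gpu_temps() -> list[int]:
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_temps(result.stdout)


def watch(samples: int = 120, interval_s: float = 1.0) -> None:
    """Print a timestamped reading every interval, e.g. while a GPU task starts."""
    for _ in range(samples):
        # flush=True so the last lines are more likely to reach a redirected
        # log file before an abrupt power-off
        print(time.strftime("%H:%M:%S"), read_gpu_temps(), flush=True)
        time.sleep(interval_s)


if __name__ == "__main__":
    if shutil.which("nvidia-smi"):
        watch()
    else:
        print("nvidia-smi not found on PATH")
```

Redirecting the output to a file keeps a record of the readings leading up to a shutdown, which a console window would lose.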
Jeff_Kloek
Joined: 26 May 99
Posts: 14
Credit: 11,829,338
RAC: 0
United States
Message 1930976 - Posted: 20 Apr 2018, 2:20:28 UTC - in response to Message 1930975.  

Thank you for pointing that out; I'm surprised they didn't ask me to do that, since it is an NVIDIA-specific tool. I'm downloading it now. Currently I am running a diagnostic they asked me to run for 60 minutes, available at http://freestone-group.com/download/VideoCardStabilityTestSetup.exe . Their response was that if this crashes, the GPU is faulty. I will wait until it completes its hour-long run before taking any additional steps, so that I don't end up with overlapping symptoms from multiple changes.
Thanks again.
Jeff_Kloek
Joined: 26 May 99
Posts: 14
Credit: 11,829,338
RAC: 0
United States
Message 1930977 - Posted: 20 Apr 2018, 3:03:21 UTC - in response to Message 1930976.  
Last modified: 20 Apr 2018, 3:04:32 UTC

Over sixty minutes elapsed without incident. Now I am installing and running the tool support directed me to, at this URL:
https://www.techpowerup.com/download/techpowerup-gpu-z/
The directed steps are:
a. Download GPU-Z from http://www.techpowerup.com/downloads/2490/techpowerup-gpu-z-v0-8-3/ (select the standard version).
b. Go to the Sensors tab and check the box 'Log to File'.
c. Save the file on the Desktop. Keep using the app for some time and check.
d. Attach the log file to this support request.

3. Windows system utility file as below:

a. Press Windows Logo Key + R.
b. Type msinfo32 and press Enter.
c. This will bring up the Microsoft System Information Utility, click File, then Save as.
d. When the Save As window appears, choose Desktop and save to your hard drive. You may give it any name you choose, but with a '.nfo' file extension.
e. Once the file has been saved on your hard drive, attach it to this support request so that we may review your system configuration.
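The same .nfo file can be produced without the GUI, because msinfo32 accepts a /nfo switch that saves the report to a given path. A minimal sketch; the output file name and the Desktop location are assumptions (a Desktop redirected to OneDrive, for instance, lives elsewhere).

```python
import subprocess
import sys
from pathlib import Path


def msinfo_command(out_path: Path) -> list[str]:
    """Build the command that saves a System Information report as an .nfo file.

    msinfo32's /nfo switch is the command-line counterpart of File > Save.
    """
    return ["msinfo32", "/nfo", str(out_path)]


if __name__ == "__main__" and sys.platform == "win32":
    # Assumed Desktop location and hypothetical file name; adjust as needed.
    target = Path.home() / "Desktop" / "system_info.nfo"
    subprocess.run(msinfo_command(target))
    print("Saved" if target.exists() else "No file written at", target)
```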

I will take the above steps next and report back when I have any results.
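The sensor log requested in steps b and c can also be searched for the hottest reading before a shutdown. A minimal sketch under an assumed layout: comma-separated values, a header row, the timestamp in the first column, and a header starting with "GPU Temperature". GPU-Z's exact column names and file encoding may differ between versions, so check them against your own log; matching only the ASCII prefix keeps the degree sign's encoding from mattering.

```python
import csv
import io


def peak_temperature(log_text: str, column_prefix: str = "GPU Temperature") -> tuple[str, float]:
    """Return (timestamp, reading) for the hottest row of a GPU-Z-style sensor log.

    ASSUMED layout: comma-separated, header row first, timestamp in the
    first column, and one header that starts with `column_prefix`.
    """
    rows = csv.reader(io.StringIO(log_text))
    header = [name.strip() for name in next(rows, [])]
    idx = next((i for i, name in enumerate(header) if name.startswith(column_prefix)), None)
    if idx is None:
        raise ValueError(f"no column starting with {column_prefix!r}")
    best = None
    for row in rows:
        if len(row) <= idx:
            continue  # blank or truncated line, e.g. the last one before a power-off
        try:
            temp = float(row[idx])
        except ValueError:
            continue  # empty or non-numeric cell
        if best is None or temp > best[1]:
            best = (row[0].strip(), temp)
    if best is None:
        raise ValueError("no temperature readings found")
    return best


if __name__ == "__main__":
    from pathlib import Path

    log = Path("GPU-Z Sensor Log.txt")  # assumed file name; use whatever you saved
    if log.exists():
        print(peak_temperature(log.read_text(encoding="utf-8", errors="replace")))
```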
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1930999 - Posted: 20 Apr 2018, 7:10:24 UTC - in response to Message 1930977.  

Not every 650 W power supply is as good as the next 650 W power supply. Generic PSUs are never as good as brand-name PSUs, but then some brand-name PSUs are absolute crap anyway. At least check yours against the PSU Tier list: see whether it's on there and where it ends up. Top tiers are better, bottom ones worse. The foregoing list will change every now and then and will, of course, depend on the tester's experience, but the ones at the bottom will generally stay the same.
Jeff_Kloek
Joined: 26 May 99
Posts: 14
Credit: 11,829,338
RAC: 0
United States
Message 1931008 - Posted: 20 Apr 2018, 8:15:02 UTC - in response to Message 1930999.  
Last modified: 20 Apr 2018, 8:17:29 UTC

Thanks for that information. First, I had the wattage wrong: it's 600 W. The model is a Thermaltake TR2-600NL2NC, and apparently that is on the "Tier 7 / Worst" list. I'll deal with that as soon as possible.
Thanks again!
Mr. Kevvy (Crowdfunding Project Donor, Special Project $250 donor)
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3776
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 1931015 - Posted: 20 Apr 2018, 10:02:24 UTC - in response to Message 1930976.  
Last modified: 20 Apr 2018, 10:04:31 UTC

Quoting Jeff_Kloek: "Thank you for pointing that out - i'm surprised they didn't ask me to do that since it is an Nvidia specific tool."


You're welcome, and me too; my initial thought was that it would have been the first thing they had you do.

I had a Thermaltake power supply once. Once. :^) They are not very good. The only three brands I will buy now are Corsair, EVGA and Seasonic. You will find these three are almost universally the go-to brands for anyone who has built many computers.
Jeff_Kloek
Joined: 26 May 99
Posts: 14
Credit: 11,829,338
RAC: 0
United States
Message 1931171 - Posted: 21 Apr 2018, 2:16:12 UTC - in response to Message 1931015.  

Yes, and I've made note of those brand names and am going to get one of them because I prefer stability over low cost.
Next update, and this one is really throwing me: they told me to set up a new administrative user and try with that new account.
I have asked them what led them down that path, but it is actually working. I've processed 4 packets on the GPU in the last 10 minutes or so, and the system is still up.
I'd be interested to know how a user's environment could cause what has been happening, and I will post their response. Using the old account isn't critical or even important, but I'm curious and I like to learn. Thanks again.
Jeff_Kloek
Joined: 26 May 99
Posts: 14
Credit: 11,829,338
RAC: 0
United States
Message 1931179 - Posted: 21 Apr 2018, 2:49:01 UTC - in response to Message 1931171.  

This is turning into a detailed exploration, with tasks that look more like what I'd do as a Unix admin.
This was their response, and my result:
Generally this issue would appear if there are any corrupt files in the OS or registry.

From the details it seems more likely to be an issue with the user account.

Please use the System File Checker tool to repair missing or corrupted system files:

https://support.microsoft.com/en-in/help/929833/use-the-system-file-checker-tool-to-repair-missing-or-corrupted-system


Since this was Windows 7, I used the option below. It didn't find any errors.

Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.

C:\windows\system32>sfc /scannow

Beginning system scan. This process will take some time.

Beginning verification phase of system scan.
Verification 100% complete.

Windows Resource Protection did not find any integrity violations.

C:\windows\system32>
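If the scan is run from a script and its output captured, the final verdict can be picked out by matching the result messages Windows Resource Protection prints. A minimal sketch: it does not run sfc itself (that needs an elevated prompt), and stripping NUL characters is a defensive assumption, because sfc's redirected output is often UTF-16, which shows up as NULs when decoded as a single-byte encoding.

```python
SFC_VERDICTS = {
    "did not find any integrity violations": "clean",
    "found corrupt files and successfully repaired them": "repaired",
    "found corrupt files but was unable to fix some of them": "not fully repaired",
    "could not perform the requested operation": "scan failed",
}


def classify_sfc(output: str) -> str:
    """Map captured `sfc /scannow` output to a short verdict."""
    # Drop NULs from mis-decoded UTF-16 and collapse whitespace so wrapped lines match.
    text = " ".join(output.replace("\x00", "").split())
    for phrase, verdict in SFC_VERDICTS.items():
        if phrase in text:
            return verdict
    return "unknown"


if __name__ == "__main__":
    sample = "Windows Resource Protection did not find any integrity violations."
    print(classify_sfc(sample))  # -> clean
```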
Jeff_Kloek
Joined: 26 May 99
Posts: 14
Credit: 11,829,338
RAC: 0
United States
Message 1931251 - Posted: 21 Apr 2018, 15:12:21 UTC - in response to Message 1931179.  

So the system came down again, but only after it had processed a number of packets on the GPU. I fired it back up this morning and heard a series of beeps just before it went down, but too many went by for me to catch the count. The system then came up with another of those hardware-based reboot notices, again with no resolution. I'm back to monitoring with GPU-Z, with an audio recorder next to the PC to capture those beeps, which I'll review against the motherboard's diagnostic notes. This has been quite the learning experience. Thanks to all who have responded. I'll continue to post results until this is resolved.
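Counting beeps by ear is error-prone, so a recording can be counted instead. A minimal sketch, assuming the recording is saved as a 16-bit PCM WAV file: it takes the peak amplitude of short windows and counts bursts above a threshold. The threshold and gap values are hypothetical and need tuning to the recording. BIOS beep codes often combine long and short beeps, so burst lengths may matter as well; this sketch only counts them.

```python
import array
import sys
import wave


def window_peaks(path: str, window_ms: int = 10) -> list[int]:
    """Peak absolute sample value in each `window_ms` slice of a 16-bit PCM WAV file.

    Channels are interleaved in the frame data, so stereo input simply
    contributes both channels to each window's peak.
    """
    peaks = []
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        frames_per_window = max(1, wav.getframerate() * window_ms // 1000)
        while True:
            chunk = wav.readframes(frames_per_window)
            if not chunk:
                break
            samples = array.array("h", chunk)
            if sys.byteorder == "big":
                samples.byteswap()  # WAV sample data is little-endian
            peaks.append(max(abs(s) for s in samples))
    return peaks


def count_bursts(levels, threshold: float, min_gap: int = 1) -> int:
    """Count runs of levels >= threshold; runs separated by fewer than
    `min_gap` quiet windows are merged into one burst."""
    count, in_burst, quiet = 0, False, 0
    for level in levels:
        if level >= threshold:
            if not in_burst:
                count += 1
            in_burst, quiet = True, 0
        else:
            quiet += 1
            if quiet >= min_gap:
                in_burst = False
    return count


if __name__ == "__main__" and len(sys.argv) > 1 and sys.argv[1].lower().endswith(".wav"):
    # Hypothetical threshold and gap; tune them to your recording.
    print(count_bursts(window_peaks(sys.argv[1]), threshold=8000, min_gap=5), "beeps")
```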
Jeff_Kloek
Joined: 26 May 99
Posts: 14
Credit: 11,829,338
RAC: 0
United States
Message 1931389 - Posted: 22 Apr 2018, 16:46:55 UTC - in response to Message 1931251.  
Last modified: 22 Apr 2018, 17:09:29 UTC

Here's probably the final update. They suggested removing the VNC mirror driver, so I removed all the VNC products and rebooted, but the symptoms were the same. Then they suggested I reinstall the chipset drivers for my motherboard. I did that last night, and the system completed all the pending tasks on the GPU and is still going without any further issues. I reinstalled VNC as well, and still no further crashes have occurred. Again, my thanks to all who responded with suggestions. For anyone who is interested: the mobo is an ASUS ROG Crosshair V Formula-Z with an AMD 8-core CPU (AMD FX-8370, Socket 942, 4000 MHz, 1375 mV, microcode patch level 600822) and 32 GB of RAM (DDR3 1333 MHz, GMED-0005), and the system's sole purpose is to process SETI tasks.


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.