New cruncher problems

Message boards : Number crunching : New cruncher problems

rob smith (Crowdfunding Project Donor*, Special Project $75 donor, Special Project $250 donor)
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22188
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1704196 - Posted: 23 Jul 2015, 7:00:31 UTC

Having been running AMD processors for many years, and having become familiar with their "little wrinkles" I decided my latest cruncher would be based on an i7.
The basic spec is:
Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
Gigabyte Z97X-SOC DDR3 ATX Motherboard
16GB RAM
Corsair H80i GT cooler
Windows 7 Pro (64bit)

Currently no external GPU.
And as far as I'm aware it is clocked at the stock 4GHz.

The problem is that the system runs OK when idle, but as soon as I try to run S@H on all 8 cores (hyper-threading on?) it runs for a few minutes, then stops and restarts (a hard reboot). It is currently running the stock applications; it just hasn't run long enough to grab the Lunatics offerings.

If I try to run the iGPU it stops even sooner.
I've currently dropped it down to using 50% of the cores to get the system to bed in a bit.
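
For reference, the same 50% limit can also be expressed as a local preferences override instead of through the BOINC Manager dialog. The following is a minimal sketch, assuming BOINC's `global_prefs_override.xml` file and its `max_ncpus_pct` element; the helper name `build_override` is made up for illustration, and the resulting text would still have to be saved into the BOINC data directory by hand.

```python
import xml.etree.ElementTree as ET


def build_override(max_cpu_pct: float) -> str:
    """Return an override body limiting the share of processors BOINC may use."""
    if not 0 < max_cpu_pct <= 100:
        raise ValueError("percentage must be in (0, 100]")
    root = ET.Element("global_preferences")
    # Assumption: max_ncpus_pct is the "use at most N% of the processors" setting.
    ET.SubElement(root, "max_ncpus_pct").text = f"{max_cpu_pct:.0f}"
    return ET.tostring(root, encoding="unicode")


# 50% of 8 logical cores means at most 4 CPU tasks at a time.
print(build_override(50))
```

After saving such a file, the client has to re-read its local preferences (or be restarted) before the limit takes effect.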

My hand says the cooler is "not that warm", and likewise everything else is "just over room temperature".


Ideas please
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1704196
woohoo
Volunteer tester

Joined: 30 Oct 13
Posts: 972
Credit: 165,671,404
RAC: 5
United States
Message 1704198 - Posted: 23 Jul 2015, 7:21:17 UTC

By default my top single-core multiplier is 44. If I have problems, I lower it to 43; in other words, keep lowering it until it's stable. For some stupid reason I think my Thermaltake air cooler outperforms my Corsair H100i. You could try the stock cooler just for fun, but keep the clock speed down.
ID: 1704198
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 1704200 - Posted: 23 Jul 2015, 7:25:26 UTC - in response to Message 1704196.  
Last modified: 23 Jul 2015, 7:32:58 UTC

My hand says the cooler is "not that warm", and likewise everything else is "just over room temperature".


I'd suggest downloading Core Temp so you can see just what the temperatures actually are.
What you're describing sounds like an incorrectly fitted heatsink. The other possibility is power supply issues, but I'd start by checking out the CPU temperatures.
If the heatsink isn't even getting warm, then that makes its fitting highly suspect IMHO.

Core Temp home page.


EDIT: ignore Core Temp.
The current version wants to hijack your home page & install some other crapware, and you can't install the programme unless you let it do that.


The other option is CPUID HWMonitor, or send me a Private Message, we can swap email addresses & I can send you an older version without the addon crapware.
Grant
Darwin NT
ID: 1704200
Raistmer
Volunteer developer
Volunteer tester

Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1704201 - Posted: 23 Jul 2015, 7:25:38 UTC - in response to Message 1704196.  

A second cause (besides overheating) could be insufficient power: the PSU may be struggling to supply the CPU at maximum power consumption.
ID: 1704201
Cruncher-American (Crowdfunding Project Donor*, Special Project $75 donor, Special Project $250 donor)

Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 1704202 - Posted: 23 Jul 2015, 7:26:51 UTC - in response to Message 1704196.  

My bet is that it IS overheating. Use coretemp or some other free s/w to check temps. My 4770k runs significantly hotter under HT than with HT turned off.
ID: 1704202
rob smith (Crowdfunding Project Donor*, Special Project $75 donor, Special Project $250 donor)
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22188
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1704205 - Posted: 23 Jul 2015, 7:38:52 UTC
Last modified: 23 Jul 2015, 7:39:53 UTC

Thanks guys - I'll have a look at "coretemp" when I get home.

It does appear to be running just now as work is returning and is being validated.

(The CPU didn't come with a cooler as I "traded" it for the H80i)


(And I thought it was only AMD FX CPUs that ran hot....)
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1704205
Brent Norman (Crowdfunding Project Donor*, Special Project $75 donor, Special Project $250 donor)
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1704206 - Posted: 23 Jul 2015, 7:40:45 UTC - in response to Message 1704196.  
Last modified: 23 Jul 2015, 7:49:15 UTC

I'm guessing you are seeing a "CPU busy" message.

My i5 did that when I was not set to 100% CPU usage.

When GPU tasks stop or start, it hits the CPU hard, and processing shuts down for 20 seconds, then restarts.

EDIT: If I recall correctly, it happened when an upload was in progress at the same time as a GPU task was starting.
ID: 1704206
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 1704207 - Posted: 23 Jul 2015, 7:41:53 UTC - in response to Message 1704205.  

(And I thought it was only AMD FX CPUs that ran hot....)

Any CPU will run hot without a correctly fitted heatsink.

With a closed water cooler on it, I would expect the water block to still become quite warm when the system is under full load, but nowhere near as hot as a conventional heatsink.
Grant
Darwin NT
ID: 1704207
den777
Volunteer tester

Joined: 29 Jun 13
Posts: 2
Credit: 3,006,789
RAC: 0
Russia
Message 1704216 - Posted: 23 Jul 2015, 8:26:58 UTC

Oh, Haswells are a real headache. I have an i5-4670.
1) Don't install Intel Extreme Tuning Utility or similar utilities from the motherboard vendor. They are buggy. If you have one, uninstall it and reset the BIOS to defaults.
2) Update the BIOS if needed.
3) Disable TurboBoost.
4) Check temperatures with CoreTemp or HWMonitor.
5) Check voltages. These CPUs are very sensitive to voltage. A bad PSU will cause instant shutdowns.
6) Install TThrottle and set it to 70-74 C.
7) If that does not help, disable HyperThreading or limit the number of BOINC tasks.
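
The temperature cap in the TThrottle step can be pictured as a simple hysteresis rule. This is only a sketch under assumptions: it is not TThrottle's actual algorithm, and the function name `next_state` and the sample readings are made up for illustration. Only the 70-74 C band comes from the list above.

```python
def next_state(throttled: bool, temp_c: float,
               low: float = 70.0, high: float = 74.0) -> bool:
    """Hysteresis: start throttling above `high`, stop only once below `low`."""
    if temp_c > high:
        return True
    if temp_c < low:
        return False
    return throttled  # inside the band: keep whatever we were doing


state = False
for reading in (65, 72, 75, 73, 71, 69):  # made-up readings in degrees C
    state = next_state(state, reading)
    print(reading, "throttle" if state else "full speed")
```

Using a band rather than a single threshold avoids flipping between throttled and unthrottled on every small temperature wobble.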
ID: 1704216
Richard Haselgrove (Project Donor)
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1704217 - Posted: 23 Jul 2015, 8:38:17 UTC - in response to Message 1704216.  

Also check memory and memory voltages. My i7 (an earlier Ivy Bridge 3770K), on a Gigabyte Z77X-D3H motherboard, wasn't picking up the automatic voltage settings for the Corsair RAM, with similar results (thanks to Jason for that diagnosis).
ID: 1704217
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 1704219 - Posted: 23 Jul 2015, 8:50:32 UTC - in response to Message 1704216.  

6) Install TThrottle and set it to 70-74 C.
7) If that does not help, disable HyperThreading or limit the number of BOINC tasks.

Only a workaround for the underlying issues.
Better to fix those issues.
Grant
Darwin NT
ID: 1704219
rob smith (Crowdfunding Project Donor*, Special Project $75 donor, Special Project $250 donor)
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22188
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1704225 - Posted: 23 Jul 2015, 10:05:14 UTC

It looks as if one of the many crashes has resulted in that cruncher having a number of tasks which have been corrupted during download, here's a typical one:

Name: 28my15aa.25249.18472.438086664199.12.69.vlar_1
Workunit: 1850791382
Created: 22 Jul 2015, 17:33:04 UTC
Sent: 22 Jul 2015, 21:44:38 UTC
Report deadline: 14 Sep 2015, 2:44:20 UTC
Received: 23 Jul 2015, 8:19:46 UTC
Server state: Over
Outcome: Computation error
Client state: Compute error
Exit status: -6 (0xfffffffffffffffa) Unknown error number
Computer ID: 7619589
Run time:
CPU time:
Validate state: Invalid
Credit: 0.00
Device peak FLOPS: 5.02 GFLOPS
Application version: SETI@home v7 v7.00
Stderr output:

<core_client_version>7.4.42</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -6 (0xfffffffa)
</message>
<stderr_txt>
SETI@home error -6 Bad workunit header
!swi.data_type || !found || !swi.nsamples
File: ../seti_header.cpp
Line: 204


</stderr_txt>
]]>
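
The exit status appears above in three forms: -6, 0xfffffffa and 0xfffffffffffffffa. They are the same value, shown as a signed integer and as its 32-bit and 64-bit two's-complement bit patterns. A small sketch of that arithmetic follows; the helper names are made up for illustration.

```python
def as_unsigned_hex(value: int, bits: int) -> str:
    """Render a signed code as the unsigned hex pattern a fixed-width field holds."""
    mask = (1 << bits) - 1  # e.g. 0xFFFFFFFF for 32 bits
    return f"0x{value & mask:x}"


def as_signed(raw: int, bits: int) -> int:
    """Inverse mapping: read an unsigned bit pattern as two's complement."""
    sign_bit = 1 << (bits - 1)
    return raw - (1 << bits) if raw & sign_bit else raw


print(as_unsigned_hex(-6, 32))    # 0xfffffffa
print(as_unsigned_hex(-6, 64))    # 0xfffffffffffffffa
print(as_signed(0xFFFFFFFA, 32))  # -6
```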

Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1704225
Raistmer
Volunteer developer
Volunteer tester

Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1704226 - Posted: 23 Jul 2015, 10:18:57 UTC - in response to Message 1704225.  

It can be the result of file system corruption from a previous unexpected reboot.
ID: 1704226
Zalster (Special Project $250 donor)
Volunteer tester

Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1704252 - Posted: 23 Jul 2015, 13:19:34 UTC - in response to Message 1704226.  

Late to this party.

Never ran a naked CPU before.

I find "warm to the touch" about as accurate as someone telling me they think they felt warm.

Nice to know something is wrong but what degree of wrong?

I always use closed loop radiators with any of my CPUs. I prefer overkill on cooling to undercooling.

I like SIV64x. It lets me see CPU temps, CPU volts, memory, PSU volts, fans, GPU temps and GPU fans. Lots of information.

the website is

http://rh-software.com/

Here's the direct link for the download

http://www.filecroco.com/download-system-information-viewer/download

I run both AMD and i7. The AMD may run hotter, but the i7 requires MUCH MUCH more voltage.

Good luck and hope you get it stable

Zalster
ID: 1704252
jason_gee
Volunteer developer
Volunteer tester

Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1704258 - Posted: 23 Jul 2015, 13:34:38 UTC - in response to Message 1704252.  

Nice bloke Ray too, awesome attention to detail and always wants to learn new things if something isn't right yet.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1704258
rob smith (Crowdfunding Project Donor*, Special Project $75 donor, Special Project $250 donor)
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22188
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1704259 - Posted: 23 Jul 2015, 13:43:55 UTC
Last modified: 23 Jul 2015, 13:45:30 UTC

It is NOT a "naked" CPU; it is fitted with a closed-loop cooler, a Corsair H80i. The radiator is warm, not hot. I'm used to how hot the radiators on my AMD systems run all the time, and this one is far cooler than those.

The system has only run for a few minutes at a time, so I've not had a chance to do any proper monitoring but, as I said, it appears to be running fairly well just now. This being the case, I'm going to try to get one of the lightweight monitoring apps running on it so I can see what is actually happening under the cooler.

Once I know what is happening, apart from the splat-crash-restart, I can begin to see what the real problem is. Both a poorly seated cooler and low voltage come to mind.


Raistmer - the problem work units are all timestamped around the time of one of the crashes, and a couple of "younger" tasks have run without the problem, so I'm putting it down to "crash while downloading" and living with the 20 or so errors for now :-(

As I said at the top - I know not a lot about these i7 CPUs, but I'm learning....
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1704259
Phil Burden

Joined: 26 Oct 00
Posts: 264
Credit: 22,303,899
RAC: 0
United Kingdom
Message 1704277 - Posted: 23 Jul 2015, 14:48:25 UTC - in response to Message 1704202.  

My bet is that it IS overheating. Use coretemp or some other free s/w to check temps. My 4770k runs significantly hotter under HT than with HT turned off.


My experience (though limited) is that an overheating CPU will shut down a PC rather than reboot it; otherwise it can't cool down ;-)

P.
ID: 1704277
HAL9000
Volunteer tester

Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1704292 - Posted: 23 Jul 2015, 15:49:18 UTC

While building my second i5-4670K system I had several issues with it freezing, restarting, or showing some other major fault. After 4 motherboards & 5 different sets of RAM I swapped the CPU, and then all was well. The odd thing is that the seemingly faulty CPU was fine in the other system.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1704292
rob smith (Crowdfunding Project Donor*, Special Project $75 donor, Special Project $250 donor)
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22188
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1704294 - Posted: 23 Jul 2015, 15:51:29 UTC

OK, so it is almost certainly a thermal issue.
I downloaded "CoreTemp" and installed it on one of my AMD systems and on the errant i7 system.
The AMD system is reporting temperatures around 40C
The i7 system is reporting temperatures around 90C

Now to dig for the cause.



(Hmm, that's strange: a few moments ago the i7 temperatures dropped to about 5C, then leapt back up to ~90.)

I wonder if the fans on the H80i are running properly?
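
If the brief dip to about 5C is a sensor or polling glitch rather than a real reading (an assumption; nothing here confirms it either way), a rolling median is one simple way to keep a single stray sample from distorting a temperature log. A minimal sketch, with illustrative values shaped like the sequence described above rather than logged data:

```python
from statistics import median


def rolling_median(readings, window=3):
    """Median of each full window; a lone outlier inside a window is ignored."""
    if window < 1 or window > len(readings):
        raise ValueError("window must be between 1 and len(readings)")
    return [median(readings[i:i + window])
            for i in range(len(readings) - window + 1)]


# Illustrative values only, following the ~90 -> ~5 -> ~90 pattern in the post.
samples = [90, 91, 5, 90, 89]
print(rolling_median(samples))  # [90, 90, 89]
```

A sustained drop would survive the filter, so it only hides isolated spikes, not a genuine change in temperature.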
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1704294
HAL9000
Volunteer tester

Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1704298 - Posted: 23 Jul 2015, 15:55:48 UTC - in response to Message 1704294.  

OK, so it is almost certainly a thermal issue.
I downloaded "CoreTemp" and installed it on one of my AMD systems and on the errant i7 system.
The AMD system is reporting temperatures around 40C
The i7 system is reporting temperatures around 90C

Now to dig for the cause.



(Hmm, that's strange: a few moments ago the i7 temperatures dropped to about 5C, then leapt back up to ~90.)

I wonder if the fans on the H80i are running properly?

If the CPU is at 90ºC I would suspect that the cooler isn't making good contact with the CPU.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1704298