boinc crashing the system

Questions and Answers : Windows : boinc crashing the system

Fizz (Project Donor)
Joined: 11 Jun 01
Posts: 19
Credit: 16,056,178
RAC: 17,534
Canada
Message 1865197 - Posted: 2 May 2017, 6:16:05 UTC

Hi all-

I have been a long-time runner of Seti@Home. I recently did a major upgrade of my computer system, using all new parts and software. I got Boinc installed and Seti@Home began running. And then it goes awry...

For whatever reason, within 5 minutes of seti@home starting to process, the system will simply shut down. It's literally as if someone pulls the plug from the power supply. The whole system just suddenly shuts off, and stays off.

The problem is only occurring when seti@home is processing. I can be running high-end graphical benchmark tests for hours, leave the system on overnight, and it's completely stable. But every time, within 5 minutes of seti@home starting, it crashes.

Thus i am certain this is not a heat or a hardware issue. On the Boinc boards, others have mentioned having this problem too (though it was years ago). Someone mentioned the GPU, so i tried disabling gpu, but the issue still occurs. If seti@home is disabled and i use a standard screensaver, there is no issue; it stays stable.

I am running Win10 Pro, Ryzen 5 1600x, Radeon RX470 graphics.

Does anyone have any clue as to what could cause this and how to fix? My best guess is a driver somewhere- if so, anyone know what stable driver to use? Thanks!

-Fizz
ID: 1865197

rob smith (Project Donor, Volunteer tester)
Joined: 7 Mar 03
Posts: 15199
Credit: 251,884,345
RAC: 326,575
United Kingdom
Message 1865202 - Posted: 2 May 2017, 6:36:51 UTC

Two things - first a number of people have reported issues with the latest version of Windows 10 conflicting with the SETI screen saver - does the same thing happen if you disable all screen savers and run SETI without one?
Second - what driver are you using for your graphics card - more particularly are you using one that Microsoft delivered or are you using one from AMD? There have been a number of issues with the MS supplied graphics drivers not having all the components needed for computation.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1865202

Ageless (Volunteer tester)
Joined: 9 Jun 99
Posts: 14242
Credit: 3,541,345
RAC: 2,008
Netherlands
Message 1865323 - Posted: 3 May 2017, 10:01:05 UTC
Last modified: 3 May 2017, 10:36:38 UTC

> I can be running high-end graphical benchmark tests for hours, leave the system on overnight, and it's completely stable.
Running benchmark tests, no matter how graphics intensive they are, cannot be compared to running Seti tasks on the GPU, because the load of the benchmark comes in bursts, whereas Seti's load is sustained.
The same applies to the CPU: whereas other programs run in bursts or use just part of the CPU core(s), Seti's tasks will use all of the CPU core(s) at full load. That exposes all kinds of instability in systems that aren't stable.
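
The difference between bursty and sustained load can be reproduced without BOINC. What follows is a minimal Python sketch, not anything that BOINC or SETI@home ships: it keeps every core busy for a fixed time so that temperatures can be watched under constant load. The names `burn` and `sustained_load`, and the arithmetic inside the loop, are illustrative assumptions.

```python
import multiprocessing as mp
import time


def burn(seconds):
    """Spin on floating-point arithmetic for `seconds`; return the iteration count."""
    deadline = time.perf_counter() + seconds
    iterations = 0
    x = 0.0001
    while time.perf_counter() < deadline:
        # Arbitrary busy-work; the result is irrelevant, only the load matters.
        x = (x * x + 0.25) % 1.0
        iterations += 1
    return iterations


def sustained_load(seconds, workers=None):
    """Run one `burn` worker per logical CPU (or `workers` of them) in parallel."""
    workers = workers or mp.cpu_count()
    with mp.Pool(workers) as pool:
        return pool.map(burn, [seconds] * workers)
```

On Windows, call `sustained_load(300)` from a script under an `if __name__ == "__main__":` guard, and keep a temperature monitor open while it runs. A simple loop like this will likely draw less power than real SETI@home tasks, so passing it does not prove the system is stable.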

> Thus i am certain this is not a heat or a hardware issue.
You can't say that, unless you measure temperatures and power output.
It could still be:
- a problem with the PSU not giving enough power out to all the hardware asking for it.
- a problem with a heat sink not aligned correctly.
- a problem with too much or too little thermal paste between the heat sink and the CPU.
- inadequate cooling on the CPU.
- a problem with the motherboard having a small short circuit that only shows at higher heat/power request.

BOINC and Seti by themselves will not shut down a computer as if someone pulls the plug from the power supply. They aren't capable of doing that.
But hardware that isn't powered or cooled correctly can do that.
Edit: Or a setting in the motherboard BIOS/UEFI to prevent the CPU from burning through if it's getting too hot.

If you want full help, a lot more information is needed, like:
1. did you build the system yourself, or was it a package?
If built yourself:
2. what brand and model motherboard is it?
3. how big is the power supply unit (PSU)?
4. crucially, what brand and model is the PSU? Especially in packaged computers they add a low-power/bad-brand PSU, as the store doesn't expect you to do power-hungry things like Seti on it. A PSU labeled 600W can easily fail to deliver more than 450W.
5. is the default cooling on the CPU, or some after-market cooling, and if so, what brand and model?
6. what kind of case is this all in?
7. does it have cable management?
8. how many case fans?
9. other electrical equipment on the same power outlet/extension cord as the computer?

You can go to a power supply calculator like this one and quickly calculate the minimum required power output the PSU must be able to handle. Just guessing at what's inside your computer, I come to a minimum of 529 Watt. The PSU must be able to output that. You may want to check on the PSU tier list what your PSU is rated at. If it is in the tier 5 list, replace it immediately, before it blows and takes your motherboard or CPU along with it.
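
The headroom check described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the 529 W requirement and the "600W that delivers 450W" example come from the post, while the function names and the idea of expressing a weak PSU as a single `derate` factor are assumptions made for the sketch.

```python
def deliverable_watts(label_watts, derate=1.0):
    """Power a PSU can actually sustain, as a fraction of its label rating."""
    return label_watts * derate


def has_headroom(label_watts, required_watts, derate=1.0):
    """True if the sustained output covers the calculated requirement."""
    return deliverable_watts(label_watts, derate) >= required_watts


# A good 600 W unit covers a 529 W requirement...
print(has_headroom(600, 529))               # True
# ...but one that only sustains 450 W (derate 450/600 = 0.75) does not.
print(has_headroom(600, 529, derate=0.75))  # False
```

Real PSUs are rated per rail and lose capacity with temperature and age, so a single factor is a simplification.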
Jord

Ancient Astronaut Theorists suggest that in many ways, you can be considered an alien conspiracy!
ID: 1865323

johnsmitthen
Joined: 2 May 17
Posts: 5
Credit: 0
RAC: 0
United Kingdom
Message 1865511 - Posted: 4 May 2017, 7:48:11 UTC - in response to Message 1865197.  

SETI@home puts a pretty constant 100% load on the CPU. Maybe there is fluff blocking airflow to the CPU or through the ventilation slots. It may be worth opening the case and cleaning out any fluff. Check that the fans or grilles are not obstructed.
ID: 1865511

Fizz (Project Donor)
Joined: 11 Jun 01
Posts: 19
Credit: 16,056,178
RAC: 17,534
Canada
Message 1865526 - Posted: 4 May 2017, 12:47:18 UTC - in response to Message 1865202.  

> Two things - first a number of people have reported issues with the latest version of Windows 10 conflicting with the SETI screen saver - does the same thing happen if you disable all screen savers and run SETI without one?

Yes, with the screensaver off, it will still crash shortly after it starts processing.

> Second - what driver are you using for your graphics card - more particularly are you using one that Microsoft delivered or are you using one from AMD? There have been a number of issues with the MS supplied graphics drivers not having all the components needed for computation.

It has occurred with both the Windows-supplied driver and the latest update from AMD.

At the moment, i have Boinc completely disabled, and the pc has remained stable for the last two days.


-Fizz
ID: 1865526

Fizz (Project Donor)
Joined: 11 Jun 01
Posts: 19
Credit: 16,056,178
RAC: 17,534
Canada
Message 1865529 - Posted: 4 May 2017, 13:08:49 UTC - in response to Message 1865323.  
Last modified: 4 May 2017, 13:12:30 UTC

> If you want full help, a lot more information is needed, like:
> 1. did you build the system yourself, or was it a package?
> If built yourself:
> 2. what brand and model motherboard is it?
> 3. how big is the power supply unit (PSU)?
> 4. crucially, what brand and model is the PSU? Especially in packaged computers they add a low-power/bad-brand PSU, as the store doesn't expect you to do power-hungry things like Seti on it. A PSU labeled 600W can easily fail to deliver more than 450W.
> 5. is the default cooling on the CPU, or some after-market cooling, and if so, what brand and model?
> 6. what kind of case is this all in?
> 7. does it have cable management?
> 8. how many case fans?
> 9. other electrical equipment on the same power outlet/extension cord as the computer?


Yes, it's my own build. Bit of a hobby of mine for the last 20 years.

Motherboard is an Asus Prime B350-Plus. No overclocking- all settings are standard except that i disabled the onboard serial port.

The Power Supply is an EVGA Supernova 650 G2. 650 W output. According to the Newegg power supply calculator, the minimum i need would be 560W (including a secondary hdd connected while i migrate all my stuff). I researched the psu quite extensively, and this one got excellent reviews for reliability and power stability. Per your link below, it is a Tier 1 power supply.

The cpu cooler is a Thermaltake Contac Silent 12.

The case is a Corsair 100R. No extra case fans at the moment, but i've been running with the case open. Everything is cool to the touch. There is some cable management, and the cables are kept pretty well out of the airflow. But the case is open at the moment anyways.

The only other things on the same extension cord as the computer are the monitor and speakers.

If it is a power supply issue, i could try disconnecting the hard drive and dvd-rom- that would save up a few watts, though that wouldn't be conclusive if it still crashed. Otherwise, i suppose i could just swap the power supply on the off-chance i got a defective one.


-Fizz
ID: 1865529

Fizz (Project Donor)
Joined: 11 Jun 01
Posts: 19
Credit: 16,056,178
RAC: 17,534
Canada
Message 1865530 - Posted: 4 May 2017, 13:09:47 UTC - in response to Message 1865511.  

> SETI@home puts a pretty constant 100% load on the CPU. Maybe there is fluff blocking airflow to the CPU or through the ventilation slots. It may be worth opening the case and cleaning out any fluff. Check that the fans or grilles are not obstructed.

This is an all-new build. There is no fluff anywhere in it. There is plenty of space for airflow. If there is a heat issue, then it's not from lack of air.

-Fizz
ID: 1865530

Fizz (Project Donor)
Joined: 11 Jun 01
Posts: 19
Credit: 16,056,178
RAC: 17,534
Canada
Message 1865531 - Posted: 4 May 2017, 13:16:23 UTC - in response to Message 1865323.  

> You can't say that, unless you measure temperatures and power output.
> It could still be:
> - a problem with the PSU not giving enough power out to all the hardware asking for it.
> - a problem with a heat sink not aligned correctly.
> - a problem with too much or too little thermal paste between the heat sink and the CPU.
> - inadequate cooling on the CPU.
> - a problem with the motherboard having a small short circuit that only shows at higher heat/power request.


In the years that i've been building, when i've had a heat issue, it has manifested itself with first video corruption and then the system freezing. I've never had a case where the system just turns off before. So i'm inclined to think it's more likely to be a power supply issue than heat.

That said, my only temperature gauge currently is in the bios, so i will get some software on and see how it's doing both at idle and load- see if it spikes during Seti@Home or anything.


-Fizz
ID: 1865531

Ageless (Volunteer tester)
Joined: 9 Jun 99
Posts: 14242
Credit: 3,541,345
RAC: 2,008
Netherlands
Message 1865544 - Posted: 4 May 2017, 13:45:18 UTC - in response to Message 1865531.  

> That said, my only temperature gauge currently is in the bios, so i will get some software on and see how it's doing both at idle and load- see if it spikes during Seti@Home or anything.
I prefer Speedfan, because it not only shows CPU and GPU temperatures, but also allows me to adjust all the fans: I can power them all the way down when the system is idle, and keep them from spinning unnecessarily at 100% when it is under load.

Another thing you may look into is the BIOS version. I see on Asus' website that it has had 7 BIOS updates already in the past 3 weeks, mostly to improve system stability: https://www.asus.com/Motherboards/PRIME-B350-PLUS/HelpDesk_Download/
Jord

Ancient Astronaut Theorists suggest that in many ways, you can be considered an alien conspiracy!
ID: 1865544

Fizz (Project Donor)
Joined: 11 Jun 01
Posts: 19
Credit: 16,056,178
RAC: 17,534
Canada
Message 1865553 - Posted: 4 May 2017, 15:19:55 UTC - in response to Message 1865544.  

> Another thing you may look into is the BIOS version. I see on Asus' website that it has had 7 BIOS updates already in the past 3 weeks, mostly to improve system stability:

Yep, updating the bios is one of the first things i did when i got it installed. Current version is 0609 dated April 19. Of course, Ryzen is so new the updates come fast and furious, so i should keep an eye on that.

I will check out Speedfan- like the idea of having fan control. In the meanwhile, i downloaded AMD System Monitor and CPUID HWMonitor. In addition, i took out the old fan/heatsink to doublecheck my install. I think it was fine- neither too much nor too little thermal paste.

So running since then, i have not had a crash (after about 10 minutes of watching it). It's weird, but when i watch the screen saver now, i swear it's slower than when i was on my old machine (Phenom II).

HWMonitor tells me that i'm peaking at temperatures of 101C (gah!) but only for brief moments. System Monitor is telling me that the cores are rarely all processing at the same time. As i type this, HWMonitor is telling me my cpu temperature is around 60C. Seems to me that's high for essentially idling. But this is supposedly a decent fan/heatsink combo. Hmmmm.


-Fizz
ID: 1865553

Fizz (Project Donor)
Joined: 11 Jun 01
Posts: 19
Credit: 16,056,178
RAC: 17,534
Canada
Message 1865568 - Posted: 4 May 2017, 17:10:27 UTC - in response to Message 1865553.  

Well i think i'm onto something. Basically, it's an immature mainboard.

While i had not touched the bios screen, for some reason the system defaulted to running the cpu at 0.3GHz above spec. Its base speed is 3.6GHz but this mainboard decided that 3.9GHz was a better value. And it auto-tweaked to have higher voltages as well. So i can see how all cores running flat out, producing temperatures of over 100C, might cause a crash.
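
A rough way to see why a small clock bump combined with higher voltage matters is the usual approximation that dynamic CPU power scales with frequency times voltage squared. The Python sketch below applies that approximation to the 3.6GHz and 3.9GHz figures from the post. The voltages are hypothetical placeholders, since the post does not give them, and the function name is made up for illustration.

```python
def relative_dynamic_power(f_old_ghz, f_new_ghz, v_old, v_new):
    """Ratio of new to old dynamic power under the P ~ f * V^2 approximation."""
    return (f_new_ghz / f_old_ghz) * (v_new / v_old) ** 2


# Clock change alone: 3.9 / 3.6, roughly 8% more power.
clock_only = relative_dynamic_power(3.6, 3.9, 1.0, 1.0)

# Clock change plus a HYPOTHETICAL 10% voltage increase (not from the post).
clock_and_voltage = relative_dynamic_power(3.6, 3.9, 1.0, 1.1)

print(f"clock only:        x{clock_only:.2f}")
print(f"clock and voltage: x{clock_and_voltage:.2f}")
```

Because voltage enters squared, even a modest automatic voltage increase adds more heat than the clock increase itself. The model ignores leakage power, which also rises with temperature.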

I have now tweaked the bios to force the cpu back to its base 3.6GHz. Now, when seti@home runs, all cores / threads are running flat out, and yet the temperature is hovering at just over 83C. Quite a drop. And i've not experienced a crash since then.

So those of you who suspected heat appear to be right. But what i didn't know is why the heat would be so high (because I had assumed everything was running at stock speeds).

Anyways, i am going to let it run all afternoon. Hopefully i will not see a crash, in which case i think this is resolved. If not, i will be reporting in again later. Thanks to everyone for your input!

-Fizz
ID: 1865568

Ageless (Volunteer tester)
Joined: 9 Jun 99
Posts: 14242
Credit: 3,541,345
RAC: 2,008
Netherlands
Message 1865572 - Posted: 4 May 2017, 18:04:05 UTC - in response to Message 1865568.  
Last modified: 4 May 2017, 18:24:25 UTC

83C? I would shut my system down if my CPU ran at those temps, and mine is an Intel.
The TjMax of the Ryzen CPU is 95C. Just below that you first get throttling; when you reach that temperature, it'll try to invoke thermal shutdown.
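
The behaviour described above can be written down as a small classifier. This Python sketch takes the 95C TjMax from the post. The 5C throttling margin is a hypothetical placeholder, because the post only says throttling starts "just below" TjMax, and the function and state names are made up for illustration.

```python
TJMAX_C = 95.0           # from the post
THROTTLE_MARGIN_C = 5.0  # HYPOTHETICAL: "just below" TjMax is not quantified


def thermal_state(temp_c, tjmax_c=TJMAX_C, margin_c=THROTTLE_MARGIN_C):
    """Classify a reported CPU temperature against TjMax."""
    if temp_c >= tjmax_c:
        return "shutdown risk"
    if temp_c >= tjmax_c - margin_c:
        return "throttling"
    return "normal"


# Readings quoted earlier in the thread.
for reading in (60, 83, 101):
    print(reading, thermal_state(reading))
```

The classification is only as good as the temperature fed into it; a sensor that misreports will produce a misleading state.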

Is the CPU fan running at maximum speeds?
Does the heat sink have one or two fans attached? Maybe two is better, one blowing into the heat sink block, the other sucking out of the heat sink block.
What kind of thermal paste did you use? The stuff coming with the cooler, or something else?

I used to have a Cooler Master Hyper 212 Evo on my CPU, but since last year switched to a Cooler Master Seidon 120V v2 water cooling and will in the future always add water cooling to my cases now. Near silent, and it's cooling a lot better than the big block HS + two fans ever did. Plus it all fits in my case, something that the Hyper Evo didn't do, I had to cut a hole in my case side to let the tops of the copper pipes stick through. ;-)

With air cooling, temperatures were 60C idle and high 70s under load, and it was loud!
Now it's 34C idle and I haven't seen any core go much over 50C yet when under load.
(i5-2500K)

So last year I bought a new case, specifically geared for water cooling, plus the new water cooling.
Earlier this year I found out that my PSU was a tier 5 one, so I switched over to a Seasonic 80 Plus Bronze model, which is also fully modular (as in, I can take ALL cables off of it). That's also something I'll take along to future builds. :)

I'm no longer running Seti on my CPU though, just do burst runs on my Sapphire RX 470. It runs the Multibeam tasks in just under 4 minutes apiece, the GUPPIs in around 8 minutes. Max GPU temp 73C, with the GPU fan at 900 RPM.
Jord

Ancient Astronaut Theorists suggest that in many ways, you can be considered an alien conspiracy!
ID: 1865572

Fizz (Project Donor)
Joined: 11 Jun 01
Posts: 19
Credit: 16,056,178
RAC: 17,534
Canada
Message 1865584 - Posted: 4 May 2017, 19:55:57 UTC - in response to Message 1865572.  
Last modified: 4 May 2017, 20:13:10 UTC

Well, 83C was the peak; most of the time it was in the mid-70s. After running for an hour just now it did peak at 89C. I agree with you- on the edge of being too hot. But at least it's in a safe range now. I do want to get those temperatures down, but i also want the thing to be relatively quiet.

It looks like the bios isn't doing its job- it has a fan controller that should spin up with temperature, but isn't going above a certain rpm. I need to look into that. When running seti, the fan is at 1700rpm. Max rpm is about 2000rpm, which sounds like a prop-plane taking off. :) And it doesn't make much difference in temperature from 1700. When idling, the fan is spinning at about 1300rpm, and my cpu temperature is jumping between 50C and 60C.
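
A fan controller that "spins up with temperature" usually follows a fan curve: a handful of temperature and duty-cycle points with interpolation in between. The Python sketch below shows the idea only. The curve points and the function name are hypothetical, and this is not how the Asus BIOS or Speedfan is actually implemented.

```python
def fan_duty(temp_c, curve):
    """Linearly interpolate fan duty (%) from sorted (temp_c, duty_percent) points."""
    if temp_c <= curve[0][0]:
        return float(curve[0][1])
    if temp_c >= curve[-1][0]:
        return float(curve[-1][1])
    for (t0, d0), (t1, d1) in zip(curve, curve[1:]):
        if t0 <= temp_c <= t1:
            return d0 + (d1 - d0) * (temp_c - t0) / (t1 - t0)


# HYPOTHETICAL curve: quiet at idle, full speed from 80C upward.
CURVE = [(40, 30), (60, 50), (80, 100)]

for temp in (35, 50, 70, 85):
    print(temp, fan_duty(temp, CURVE))
```

A controller whose top curve point is capped below 100% duty would behave like the one described here: the fan never goes above a certain rpm no matter how hot the CPU gets.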

The thermal paste i was using is from the cooler (Thermaltake). But it was in limited supply, so when i remounted the heatsink, i used some from another cooler, a Zalman. As far as i know, it's decent stuff. As it happens, my old Phenom II Zalman cooler fits and seems to work just as well as the Thermaltake. The fan is smaller than the Thermaltake, but the Zalman is entirely copper. (The Zalman is much easier to take on and off, so that's why i went with it for these tests.)

So maybe i need to look into a water cooler- have never done that before. Not too keen on taking the whole mainboard out of the case though. Are there any that can be attached via the clip retention mechanism?

Another question for you- in HWMonitor, under the mainboard section it lists the cpu temperature, and it's quite low (max of 57C). Under the cpu it lists the package temperature, which is the value i have been quoting in my other posts. Any idea of what's the difference?


-Fizz
ID: 1865584

Mr. Kevvy (Crowdfunding Project Donor, Volunteer moderator, Volunteer tester)
Joined: 15 May 99
Posts: 1717
Credit: 374,545,008
RAC: 508,733
Canada
Message 1865589 - Posted: 4 May 2017, 20:12:25 UTC
Last modified: 4 May 2017, 20:26:24 UTC

After having one leak and destroy a mainboard and CPU (voiding both warranties) I will never mess with water again...

I recommend Googling "Noctua NH-D15 SE-AM4"... Noctua makes very large, highly recommended dual-fan coolers that will fit your Ryzen (it's socket AM4) if you have a full tower. There are others as well. You will have to remove the board to get at the underside, but you would have to anyways with another cooler. Once installed they run as cool as water cooling (about 50C at full load) with zero maintenance required except for blowing the dust out once a month or so, no risk of water damage, and they last for many years running 24x7. Because the fans are so large they also don't have to turn fast, so it is very quiet.

There are lower-profile ones available for thinner cases as well. I have at least four of them in my farm and would not buy anything else.
“Never doubt that a small group of thoughtful, committed citizens can change the world; indeed, it's the only thing that ever has.”
--- Margaret Mead

ID: 1865589

Ageless (Volunteer tester)
Joined: 9 Jun 99
Posts: 14242
Credit: 3,541,345
RAC: 2,008
Netherlands
Message 1865593 - Posted: 4 May 2017, 20:45:10 UTC - in response to Message 1865584.  
Last modified: 4 May 2017, 20:45:25 UTC

Mr. Kevvy's advice is good about the Noctua cooler.
The problem with most of the coolers out there is that they make a lot of noise, and that's due to the size of the fans they have on them: the smaller the fan, the louder the noise, as the fan has to spin faster to cool everything well enough. Put on a larger fan and the speed will go down, as the area a larger fan draws air through is bigger, thus fewer RPMs, meaning less noise.

That Noctua can have two 140mm fans, which will cool things better at lower speeds, lower noise.
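
The trade-off between fan size and speed can be estimated with the fan affinity laws, under which airflow scales with rotational speed times diameter cubed for geometrically similar fans. The Python sketch below uses that idealized relation. Real fans differ in blade design and static pressure, so treat the numbers as a rough illustration. The function name and the example sizes are assumptions.

```python
def rpm_for_same_airflow(rpm_ref, diameter_ref_mm, diameter_new_mm):
    """RPM a fan of `diameter_new_mm` needs to match the airflow of the reference fan.

    Idealized fan affinity law: airflow ~ rpm * diameter**3.
    """
    return rpm_ref * (diameter_ref_mm / diameter_new_mm) ** 3


# A 120mm fan at 2000 rpm versus a 140mm fan moving the same air.
print(round(rpm_for_same_airflow(2000, 120, 140)))
```

Under this model the 140mm fan needs well under two thirds of the rpm of the 120mm fan for the same airflow.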
I see Thermaltake has similar packages, but none of them show if they're compatible with the AM4 socket. If you want to go that way, you'll have to ask them that.

> Another question for you- in HWMonitor, under the mainboard section it lists the cpu temperature, and it's quite low (max of 57C). Under the cpu it lists the package temperature, which is the value i have been quoting in my other posts. Any idea of what's the difference?
Had to install that, to see what you mean. ;)
And I still don't know what the difference is... searching around it confuses a lot of people, who then advise to install Core Temp instead (as does AMD by the way). See http://www.overclock.net/t/1128821/amd-temp-information-and-guide for what all the names mean.
Jord

Ancient Astronaut Theorists suggest that in many ways, you can be considered an alien conspiracy!
ID: 1865593

Fizz (Project Donor)
Joined: 11 Jun 01
Posts: 19
Credit: 16,056,178
RAC: 17,534
Canada
Message 1865610 - Posted: 4 May 2017, 21:42:30 UTC - in response to Message 1865593.  

Well, CoreTemp was a bust. It just returns a temperature of 0C for everything. It's probably not up to date with the latest cpu / mainboard.

Has anyone found a good answer to what temperature the Ryzens should be at? I'm sure my heatsink is already better than the stock coolers, which supposedly should keep them sufficiently under any critical temperature.

I will look into the Noctuas. It can't be taller than my Thermaltake though- it's already maxed out the inside of my case. Heh.

-Fizz
ID: 1865610

Ageless (Volunteer tester)
Joined: 9 Jun 99
Posts: 14242
Credit: 3,541,345
RAC: 2,008
Netherlands
Message 1865623 - Posted: 4 May 2017, 23:39:27 UTC - in response to Message 1865610.  

There are a couple of other people running Ryzens and posting about it in Number Crunching. Seeing how your RAC is adequate (;-)) I suggest you ask over there, in one of their threads. This guy for instance manages to get his temps down to 38C, but do a read through that entire thread. (That's the From FX to Ryzen thread).
Jord

Ancient Astronaut Theorists suggest that in many ways, you can be considered an alien conspiracy!
ID: 1865623

Fizz (Project Donor)
Joined: 11 Jun 01
Posts: 19
Credit: 16,056,178
RAC: 17,534
Canada
Message 1865714 - Posted: 5 May 2017, 14:09:41 UTC - in response to Message 1865623.  

Thanks for the links. Very interesting stuff. At least i'm not the only one seeing this temperature.

I happen to have a laser temperature gun (whatever they're called :) ), and took some direct measurements. The temperature on the gpu matches what HWMonitor tells me, but i can't find anything close to a match on the cpu. The heatsink is cool, the back of the cpu is cool. Even when i touch the heatsink (the part that makes contact with the cpu), it's cool.

I am assuming that HWMonitor can only measure what the mainboard tells it, so if the mainboard is misreporting, then that would explain a lot. My current guess is that things are in fact sufficiently cool, and that the crashing wasn't heat, but simply an instability at 3.9GHz when all cores were going flat out.

Hopefully there will be a bios update to fix / confirm that.

-Fizz
ID: 1865714

Ageless (Volunteer tester)
Joined: 9 Jun 99
Posts: 14242
Credit: 3,541,345
RAC: 2,008
Netherlands
Message 1865721 - Posted: 5 May 2017, 15:02:26 UTC - in response to Message 1865714.  

> I am assuming that HWMonitor can only measure what the mainboard tells it, so if the mainboard is misreporting, then that would explain a lot.
The problem with all these measuring programs is that they don't measure the surface temperature of the items; the temperature sensor is underneath the hardware, or built inside, like the one per core. So something can feel cool to the touch, but be blistering hot on the other side.

> My current guess is that things are in fact sufficiently cool, and that the crashing wasn't heat, but simply an instability at 3.9GHz when all cores were going flat out.
I suppose that the thermal shutdown does that. I'll keep my fingers crossed. :)

By the way, searching for a good temperature monitor capable of monitoring Ryzen, I found this: https://www.amd.com/en/technologies/ryzen-master
Probably wisest to use that. :)
Jord

Ancient Astronaut Theorists suggest that in many ways, you can be considered an alien conspiracy!
ID: 1865721

Fizz (Project Donor)
Joined: 11 Jun 01
Posts: 19
Credit: 16,056,178
RAC: 17,534
Canada
Message 1865746 - Posted: 5 May 2017, 17:53:31 UTC - in response to Message 1865721.  

> The problem with all these measuring programs is that they don't measure the surface temperature of the items; the temperature sensor is underneath the hardware, or built inside, like the one per core. So something can feel cool to the touch, but be blistering hot on the other side.

True, but i guess i've been primarily going on my own experience with this particular heatsink- when it was stressed it was noticeably warm. But you're right- hard to draw full conclusions on that.

> By the way, searching for a good temperature monitor capable of monitoring Ryzen, I found this: https://www.amd.com/en/technologies/ryzen-master Probably wisest to use that. :)

Excellent! I've got this up and running and just ran an hour long test. If this RyzenMaster is accurate then i'm in very good shape. After an hour of seti@home, all cores running flat out, the new software tells me that core temperatures maxed out at 64C. Idling they're around 45C. The fan spun up somewhat, but not annoyingly so. (You can hear airflow, but no buzzing sounds.)
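
One possible explanation for the gap between the earlier HWMonitor readings and Ryzen Master, which the thread itself does not establish, is the control-temperature offset AMD documented for the first-generation "X" Ryzen parts: the reported Tctl value sits 20C above the actual die temperature on the Ryzen 5 1600X. The Python sketch below shows the conversion. The lookup table is limited to this one CPU, the function name is made up, and whether HWMonitor was in fact reporting Tctl at the time is an assumption.

```python
# Tctl offsets in degrees C. Only the CPU from this thread is listed;
# any other model falls back to 0, which is an assumption of this sketch.
TCTL_OFFSET_C = {"Ryzen 5 1600X": 20.0}


def tctl_to_tdie(tctl_c, model):
    """Convert a reported control temperature (Tctl) to die temperature (Tdie)."""
    return tctl_c - TCTL_OFFSET_C.get(model, 0.0)


# An 84C Tctl reading on a 1600X corresponds to a 64C die temperature.
print(tctl_to_tdie(84.0, "Ryzen 5 1600X"))
```

If that assumption holds, the 83C to 89C peaks reported earlier would correspond to die temperatures in the 60s, in line with the 64C maximum that Ryzen Master showed.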

So i am feeling much better about things now. Thanks much for the input and resources.


-Fizz
ID: 1865746

 
©2017 University of California
 