Computer crashes a lot

Mike
Volunteer tester
Joined: 17 Feb 01
Posts: 32172
Credit: 79,922,639
RAC: 181
Germany
Message 1573417 - Posted: 17 Sep 2014, 20:32:32 UTC - in response to Message 1573322.  



Did you try running GPU only?

I'm trying this right now. Today the computer did not crash. But as I said, it didn't crash every day, so I have to try this for a few days before I can be sure.

I ran a couple of dual-core machines for a while with two GPU cards. Whenever I tried running 2 GPU APs and 1 CPU AP, everything was fine until the GPUs hit 2 highly blanked APs at the same time. Then the CPU spent a lot of time thrashing at 100%. If your system is the least bit unstable, it will probably crash. I didn't have any problem running 1 AP, 1 MB, and 1 CPU AP. I would suggest you try running just 1 GPU AP at a time. Your GTX 750 isn't going to gain anything by running 2 APs at a time anyway.

Here is what ReadMe_AstroPulse_OpenCL_NV.txt says:
For best performance it is important to free 2 CPU cores running multiple instances.
Freeing at least 1 CPU core is necessity to get enough GPU usage.*
Running multiple cards in a system requires freeing another CPU core...


Yes, there was a reason I added this to the ReadMes.
With each crime and every kindness we birth our future.
ID: 1573417
tbret
Volunteer tester
Joined: 28 May 99
Posts: 3380
Credit: 296,162,071
RAC: 90
United States
Message 1573339 - Posted: 17 Sep 2014, 18:10:04 UTC

I think everyone else has you covered and what I'm about to say almost certainly isn't your problem, but I thought it might be worth mentioning in case you continue to have trouble even after freeing CPU resources to run your GPUs.

I had a machine that kept BSODing on me in the middle of crunching and I couldn't figure out why. Nothing I did seemed to help. I ran memory diagnostics, I ran chkdsk for hours, I re-configured BOINC, I cleaned, I cooled, I replaced the power supply... Nothing helped.

Then I thought of something that I wouldn't say had the intellectual bandwidth of a "hunch" but was something closer to desperation: I pulled and reseated the RAM. (That hadn't been done in several years.)

My BSODs went away.
ID: 1573339
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 6,279
United States
Message 1573322 - Posted: 17 Sep 2014, 17:40:19 UTC - in response to Message 1573284.  
Last modified: 17 Sep 2014, 17:56:31 UTC



Did you try running GPU only?

I'm trying this right now. Today the computer did not crash. But as I said, it didn't crash every day, so I have to try this for a few days before I can be sure.

I ran a couple of dual-core machines for a while with two GPU cards. Whenever I tried running 2 GPU APs and 1 CPU AP, everything was fine until the GPUs hit 2 highly blanked APs at the same time. Then the CPU spent a lot of time thrashing at 100%. If your system is the least bit unstable, it will probably crash. I didn't have any problem running 1 AP, 1 MB, and 1 CPU AP. I would suggest you try running just 1 GPU AP at a time. Your GTX 750 isn't going to gain anything by running 2 APs at a time anyway.

Here is what ReadMe_AstroPulse_OpenCL_NV.txt says:
For best performance it is important to free 2 CPU cores running multiple instances.
Freeing at least 1 CPU core is necessity to get enough GPU usage.*
Running multiple cards in a system requires freeing another CPU core...
ID: 1573322
qbit
Volunteer tester
Joined: 19 Sep 04
Posts: 630
Credit: 6,868,528
RAC: 0
Austria
Message 1573284 - Posted: 17 Sep 2014, 16:54:02 UTC - in response to Message 1573185.  



Did you try running GPU only?

I'm trying this right now. Today the computer did not crash. But as I said, it didn't crash every day, so I have to try this for a few days before I can be sure.
ID: 1573284
Mike
Volunteer tester
Joined: 17 Feb 01
Posts: 32172
Credit: 79,922,639
RAC: 181
Germany
Message 1573185 - Posted: 17 Sep 2014, 12:49:40 UTC - in response to Message 1573076.  

Good morning, folks!

TBar, well, I'm not sure, but yes, there have been a lot of highly blanked APs lately. Could that be the reason, or at least part of it? I thought one free core was enough for doing AP on GPU.

BTW: Do you guys think my commandline is ok? I remember reading that the computer may crash if the numbers are too high.


It's possible, yes.

Your AP settings are at max for this GPU, but your CPU is slowing things down at times.

Did you try running GPU only?
With each crime and every kindness we birth our future.
ID: 1573185
qbit
Volunteer tester
Joined: 19 Sep 04
Posts: 630
Credit: 6,868,528
RAC: 0
Austria
Message 1573076 - Posted: 17 Sep 2014, 4:46:46 UTC

Good morning, folks!

TBar, well, I'm not sure, but yes, there have been a lot of highly blanked APs lately. Could that be the reason, or at least part of it? I thought one free core was enough for doing AP on GPU.

BTW: Do you guys think my commandline is ok? I remember reading that the computer may crash if the numbers are too high.
ID: 1573076
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 6,279
United States
Message 1572879 - Posted: 16 Sep 2014, 20:51:25 UTC - in response to Message 1572806.  

I usually run one task of vLHC on CPU and two AP tasks on the GPU....
Any ideas?

By my count that setup requires 3 CPU cores...You have 2.
Does it crash when the GPU is working 2 Blanked APs at once?
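
The core count above can be sketched as a small calculation. This is a rough illustration only: it assumes one CPU core per CPU task and one freed core per GPU AP instance, in line with the ReadMe quoted elsewhere in this thread. The function name `cores_needed` is hypothetical, and BOINC itself does not budget cores this way.

```python
def cores_needed(cpu_tasks: int, gpu_ap_tasks: int, cores_per_gpu_task: int = 1) -> int:
    """Estimate how many CPU cores a task mix wants.

    Assumption: each GPU AP instance keeps one CPU core busy feeding the GPU.
    """
    return cpu_tasks + gpu_ap_tasks * cores_per_gpu_task


available_cores = 2  # the host discussed here is a dual core
required = cores_needed(cpu_tasks=1, gpu_ap_tasks=2)  # 1 vLHC task + 2 GPU APs

print(f"required={required}, available={available_cores}")
if required > available_cores:
    print(f"oversubscribed by {required - available_cores} core(s)")
```

Under the same assumption, dropping to one GPU AP at a time brings the estimate down to 2 cores, which matches what the host has.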
ID: 1572879
qbit
Volunteer tester
Joined: 19 Sep 04
Posts: 630
Credit: 6,868,528
RAC: 0
Austria
Message 1572875 - Posted: 16 Sep 2014, 20:47:03 UTC

@Hal: I didn't even know that there is a Windows Event Log. I just checked it and it shows 29 errors in the last 24 hours, but I have no idea whether any of them (and if so, which) has to do with the crash. I also couldn't find anything like an AV scan, but to be honest, I don't really know how to read this log (yet).

@rob: When I'm at home and watching my computer, temperatures are always OK. Cores are usually below 50 degrees and sometimes go up to 53 or 54; the GPU is usually between 50 and 59 and I have never seen more than 64 (at least not since I started using -use_sleep in the command line). So I don't think that's the problem here.
ID: 1572875
rob smith
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 18644
Credit: 416,307,556
RAC: 863
United Kingdom
Message 1572856 - Posted: 16 Sep 2014, 20:24:38 UTC

The clue is at the end of the penultimate line: "caused by a thermal issue".

There are loads of potential causes of thermal issues: clogged fans and filters, excessive overclocking, a failed fan, a fan running too slow...

And more...
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1572856
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6533
Credit: 196,805,888
RAC: 130
United States
Message 1572851 - Posted: 16 Sep 2014, 20:22:42 UTC

As a workaround until this is sorted out, you could configure the machine for auto logon. That way, when it crashes it will log in and restart BOINC. Running control userpasswords2 is one way to set up the system for auto logon.

The information WhoCrashed is giving you is located in the Windows Event Log. You don't really need a 3rd party application to read it, but whichever way you prefer works.

That error doesn't point to anything specific. Kernel crashes can be tricky to pin down. If you look in the Windows Event Log, are there any events prior to the crash that might show what the system was doing? Perhaps an anti-virus scan or something along those lines was running?
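
One way to narrow down which logged events matter is to look only at what was recorded shortly before a crash. The sketch below assumes the events have already been exported from the Event Log into simple records with a timestamp, level, and source. The sample records, the source names, and the 10-minute window are all made up for illustration and are not taken from the log discussed here.

```python
from datetime import datetime, timedelta


def events_before(events, crash_time, window_minutes=10):
    """Return events logged within `window_minutes` before `crash_time`, oldest first."""
    start = crash_time - timedelta(minutes=window_minutes)
    hits = [e for e in events if start <= e["time"] < crash_time]
    return sorted(hits, key=lambda e: e["time"])


# Hypothetical records; a real export would have many more fields.
sample_events = [
    {"time": datetime(2014, 9, 16, 9, 5), "level": "Error", "source": "ExampleService"},
    {"time": datetime(2014, 9, 16, 13, 52), "level": "Warning", "source": "ExampleScanner"},
    {"time": datetime(2014, 9, 16, 13, 58), "level": "Error", "source": "ExampleDriver"},
]
crash = datetime(2014, 9, 16, 14, 0)  # hypothetical crash time

for event in events_before(sample_events, crash):
    print(event["time"], event["level"], event["source"])
```

Repeating this for several crashes and comparing the results may show whether the same source turns up each time.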
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group today!
ID: 1572851
qbit
Volunteer tester
Joined: 19 Sep 04
Posts: 630
Credit: 6,868,528
RAC: 0
Austria
Message 1572806 - Posted: 16 Sep 2014, 20:07:13 UTC
Last modified: 16 Sep 2014, 20:07:53 UTC

For a few weeks now my computer has been crashing pretty often. It doesn't happen every day, but 3-4 times a week. I usually let my computer crunch while I'm at work or at night, so I'm not there (or not awake) when the crashes happen. That means the computer sits at the logon screen for hours, which wastes electricity and also lets my RAC shrink.

This is the computer I'm talking about:
http://setiathome.berkeley.edu/show_host_detail.php?hostid=7327094

I usually run one task of vLHC on CPU and two AP tasks on the GPU. I use the following commandline for AP:

-unroll 10 -ffa_block 8192 -ffa_block_fetch 4096 -tune 1 64 4 1 -use_sleep
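
To make the individual settings easier to compare, the command line can be split into options and their values. This is only a parsing sketch: it assumes that every token starting with `-` followed by a letter is an option name and that all values are integers. It says nothing about which values are safe for a given GPU, and the function name `parse_ap_cmdline` is hypothetical.

```python
import shlex


def parse_ap_cmdline(cmdline: str) -> dict:
    """Map each option name to the list of integer values that follow it."""
    options = {}
    current = None
    for token in shlex.split(cmdline):
        if token.startswith("-") and token[1:2].isalpha():
            current = token[1:]      # new option, e.g. "ffa_block"
            options[current] = []
        elif current is not None:
            options[current].append(int(token))  # assumption: values are integers
    return options


cmdline = "-unroll 10 -ffa_block 8192 -ffa_block_fetch 4096 -tune 1 64 4 1 -use_sleep"
for name, values in parse_ap_cmdline(cmdline).items():
    print(name, values)
```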


As long as the system doesn't crash, everything is OK. Crunching speed is good and temperatures are rather low.
I have no idea what causes the crashes, so today, after coming home from work and seeing the Windows logon screen once again, I installed WhoCrashed to analyze the crash. This is what I got:




Any ideas?
ID: 1572806
 