Computer crashes a lot

Message boards : Number crunching : Computer crashes a lot

qbit
Volunteer tester
Joined: 19 Sep 04
Posts: 630
Credit: 6,868,528
RAC: 0
Austria
Message 1572806 - Posted: 16 Sep 2014, 20:07:13 UTC
Last modified: 16 Sep 2014, 20:07:53 UTC

For the past few weeks my computer has been crashing pretty often. It doesn't happen every day, but 3-4 times a week. I usually let my computer crunch while I'm at work or at night, so I'm not there (or not awake) when the crashes happen. That means the computer sits at the logon screen for hours, which wastes electricity and also shrinks my RAC.

This is the computer I'm talking about:
http://setiathome.berkeley.edu/show_host_detail.php?hostid=7327094

I usually run one task of vLHC on CPU and two AP tasks on the GPU. I use the following commandline for AP:

-unroll 10 -ffa_block 8192 -ffa_block_fetch 4096 -tune 1 64 4 1 -use_sleep


As long as the system doesn't crash everything is ok. Crunching speed is good and temperatures are rather low.
I have no idea what causes the crashes, so today, after coming home from work and seeing the Windows logon screen once again, I installed WhoCrashed to analyze the crash. This is what I got:

[WhoCrashed report not preserved]

Any ideas?
ID: 1572806 · Report as offensive
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1572851 - Posted: 16 Sep 2014, 20:22:42 UTC

As a workaround until this is sorted out, you could configure the machine for auto logon. That way, when it crashes, it will log in and restart BOINC. Running control userpasswords2 is one way to set up the system for auto logon.

The information WhoCrashed is giving you is located in the Windows Event log. You don't really need a 3rd party application to read it, but whichever way you prefer works.

That error doesn't point to anything specific. Kernel crashes can be tricky to pin down. If you look in the Windows event log, are there any events prior to the crash that may show what the system was doing? Perhaps an anti-virus scan or something along those lines was running?
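
If clicking through Event Viewer is a chore, the same System log can be queried from a script. This is only a minimal sketch, assuming Windows' built-in `wevtutil` tool is on the PATH; the helper names `build_query` and `recent_errors` and the default of 30 events are made up for illustration.

```python
import subprocess


def build_query(max_events=30):
    """Build a wevtutil command listing recent critical/error events.

    In the Windows event schema, Level 1 is Critical and Level 2 is Error.
    """
    xpath = "*[System[(Level=1 or Level=2)]]"
    return [
        "wevtutil", "qe", "System",
        f"/q:{xpath}",        # XPath filter on the event level
        f"/c:{max_events}",   # maximum number of events to return
        "/rd:true",           # newest events first
        "/f:text",            # human-readable output
    ]


def recent_errors(max_events=30):
    """Run the query (Windows only) and return the raw text output."""
    result = subprocess.run(build_query(max_events),
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Print the command so it can also be pasted into a console.
    print(" ".join(build_query()))
```

Comparing the timestamps in that output with the time of the unexpected reboot is one way to narrow down which entries matter.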
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1572851 · Report as offensive
rob smith (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22199
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1572856 - Posted: 16 Sep 2014, 20:24:38 UTC

Clue is at the end of the penultimate line "caused by a thermal issue"

Loads of potential causes of thermal issues ranging from clogged fans and filters, excessive overclocking, failed fan, fan running too slow......

And more......
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1572856 · Report as offensive
qbit
Volunteer tester
Joined: 19 Sep 04
Posts: 630
Credit: 6,868,528
RAC: 0
Austria
Message 1572875 - Posted: 16 Sep 2014, 20:47:03 UTC

@Hal: I didn't even know that there is a Windows event log. I just checked it and it shows 29 errors in the last 24 hours, but I have no idea which of those (if any) has to do with the crash. I also couldn't find anything like an AV scan, but to be honest, I don't really know how to read this log (yet).

@rob: When I'm at home and I watch my computer, temperatures are always ok. Cores are usually below 50 degrees and sometimes go up to 53 or 54; the GPU is usually between 50 and 59, and I have never seen more than 64 (at least not since I started using -use_sleep in the commandline). So I don't think that's the problem here.
ID: 1572875 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1572879 - Posted: 16 Sep 2014, 20:51:25 UTC - in response to Message 1572806.  

I usually run one task of vLHC on CPU and two AP tasks on the GPU....
Any ideas?

By my count that setup requires 3 CPU cores...You have 2.
Does it crash when the GPU is working 2 Blanked APs at once?
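
The core count above can be written out as a small check. This is a rough sketch that assumes every GPU task wants one full CPU core to feed it (the same assumption behind the count of 3); the function names are hypothetical.

```python
def cores_needed(cpu_tasks, gpu_tasks, cores_per_gpu_task=1.0):
    """CPU cores the workload wants, assuming each GPU task needs a feeder core."""
    return cpu_tasks + gpu_tasks * cores_per_gpu_task


def is_oversubscribed(cpu_tasks, gpu_tasks, physical_cores, cores_per_gpu_task=1.0):
    """True when the tasks want more cores than the machine has."""
    return cores_needed(cpu_tasks, gpu_tasks, cores_per_gpu_task) > physical_cores


if __name__ == "__main__":
    # The setup from the first post: 1 vLHC task on CPU, 2 AP tasks on GPU, 2 cores.
    print(cores_needed(1, 2))            # 3.0
    print(is_oversubscribed(1, 2, 2))    # True
    # One AP task at a time fits the two cores exactly.
    print(is_oversubscribed(1, 1, 2))    # False
```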
ID: 1572879 · Report as offensive
qbit
Volunteer tester
Joined: 19 Sep 04
Posts: 630
Credit: 6,868,528
RAC: 0
Austria
Message 1573076 - Posted: 17 Sep 2014, 4:46:46 UTC

Good morning, folks!

TBar, well, I'm not sure, but yes, there have been a lot of highly blanked APs lately. Could that be the reason, or at least part of it? I thought one free core was enough for doing AP on the GPU.

BTW: Do you guys think my commandline is ok? I remember reading that the computer may crash if the numbers are too high.
ID: 1573076 · Report as offensive
Mike (Special Project $75 donor)
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1573185 - Posted: 17 Sep 2014, 12:49:40 UTC - in response to Message 1573076.  

Good morning, folks!

TBar, well, I'm not sure, but yes, lot of highly blanked APs lately. Could that be the reason or at least part of it? I thought one free core is enough for doing AP on GPU.

BTW: Do you guys think my commandline is ok? I remember reading that the computer may crash if the numbers are too high.


It's possible, yes.

Your AP settings are at max for this GPU, but your CPU is slowing things down at times.

Did you try running GPU only?


With each crime and every kindness we birth our future.
ID: 1573185 · Report as offensive
qbit
Volunteer tester
Joined: 19 Sep 04
Posts: 630
Credit: 6,868,528
RAC: 0
Austria
Message 1573284 - Posted: 17 Sep 2014, 16:54:02 UTC - in response to Message 1573185.  



Did you try to run GPU only ?

I'm trying this right now. Today the computer did not crash. But as I said, it didn't crash every day, so I have to try this for a few days before I can be sure.
ID: 1573284 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1573322 - Posted: 17 Sep 2014, 17:40:19 UTC - in response to Message 1573284.  
Last modified: 17 Sep 2014, 17:56:31 UTC



Did you try to run GPU only ?

I try this right now. Today the computer did not crash. But as I said, it didn't crash every day, so I have to try this for a few days before I can be sure.

I ran a couple Dual core machines for a while with two GPU cards. Whenever I tried running 2 GPU APs and 1 CPU AP everything was fine until the GPUs hit 2 Highly Blanked APs at the same time. Then the CPU spent a lot of time thrashing at 100%. If your system is the least bit unstable it will probably crash. I didn't have any problem running 1 AP, 1 MB, and 1 CPU AP. I would suggest you try running just 1 GPU AP at a time. Your GTX 750 isn't going to gain anything by running 2 APs at a time anyway.

Here is the ReadMe_AstroPulse_OpenCL_NV.txt;
For best performance it is important to free 2 CPU cores running multiple instances.
Freeing at least 1 CPU core is necessity to get enough GPU usage.*
Running multiple cards in a system requires freeing another CPU core...
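
For anyone wanting to try one GPU task at a time: in BOINC this is normally set with `gpu_usage` in an `app_config.xml` file in the project directory (1.0 means one task per GPU, 0.5 means two). Below is a sketch that only generates such a file's contents. The app name `astropulse_v6` is an assumption and has to match whatever name your client actually reports, and hosts running under an app_info.xml may set the count there instead.

```python
import xml.etree.ElementTree as ET


def make_app_config(app_name, gpu_usage, cpu_usage):
    """Return app_config.xml text for one app.

    gpu_usage: fraction of a GPU each task claims (1.0 = one task per GPU).
    cpu_usage: CPU cores BOINC budgets for each GPU task.
    """
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    versions = ET.SubElement(app, "gpu_versions")
    ET.SubElement(versions, "gpu_usage").text = str(gpu_usage)
    ET.SubElement(versions, "cpu_usage").text = str(cpu_usage)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # One AP task per GPU with one full core budgeted for it (app name assumed).
    print(make_app_config("astropulse_v6", 1.0, 1.0))
```

After saving the output as app_config.xml, the client has to re-read its config files (or be restarted) before the change takes effect.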
ID: 1573322 · Report as offensive
tbret
Volunteer tester
Joined: 28 May 99
Posts: 3380
Credit: 296,162,071
RAC: 40
United States
Message 1573339 - Posted: 17 Sep 2014, 18:10:04 UTC

I think everyone else has you covered and what I'm about to say almost certainly isn't your problem, but I thought it might be worth mentioning in case you continue to have trouble even after freeing CPU resources to run your GPUs.

I had a machine that kept BSODing on me in the middle of crunching and I couldn't figure out why. Nothing I did seemed to help. I ran memory diagnostics, chkdsk for hours, I re-configured BOINC, I cleaned, I cooled, I replaced the power supply... Nothing helped.

Then I thought of something that I wouldn't say had the intellectual bandwidth of a "hunch" but was something closer to desperation: I pulled and reseated the RAM. (that hadn't been done in several years)

My BSODs went away.
ID: 1573339 · Report as offensive
Mike (Special Project $75 donor)
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1573417 - Posted: 17 Sep 2014, 20:32:32 UTC - in response to Message 1573322.  



Did you try to run GPU only ?

I try this right now. Today the computer did not crash. But as I said, it didn't crash every day, so I have to try this for a few days before I can be sure.

I ran a couple Dual core machines for a while with two GPU cards. Whenever I tried running 2 GPU APs and 1 CPU AP everything was fine until the GPUs hit 2 Highly Blanked APs at the same time. Then the CPU spent a lot of time thrashing at 100%. If your system is the least bit unstable it will probably crash. I didn't have any problem running 1 AP, 1 MB, and 1 CPU AP. I would suggest you try running just 1 GPU AP at a time. Your GTX 750 isn't going to gain anything by running 2 APs at a time anyway.

Here is the ReadMe_AstroPulse_OpenCL_NV.txt;
For best performance it is important to free 2 CPU cores running multiple instances.
Freeing at least 1 CPU core is necessity to get enough GPU usage.*
Running multiple cards in a system requires freeing another CPU core...


Yes, there were reasons why I added this to the readmes.


With each crime and every kindness we birth our future.
ID: 1573417 · Report as offensive
qbit
Volunteer tester
Joined: 19 Sep 04
Posts: 630
Credit: 6,868,528
RAC: 0
Austria
Message 1573895 - Posted: 18 Sep 2014, 17:18:41 UTC - in response to Message 1573417.  



Did you try to run GPU only ?

I try this right now. Today the computer did not crash. But as I said, it didn't crash every day, so I have to try this for a few days before I can be sure.

I ran a couple Dual core machines for a while with two GPU cards. Whenever I tried running 2 GPU APs and 1 CPU AP everything was fine until the GPUs hit 2 Highly Blanked APs at the same time. Then the CPU spent a lot of time thrashing at 100%. If your system is the least bit unstable it will probably crash. I didn't have any problem running 1 AP, 1 MB, and 1 CPU AP. I would suggest you try running just 1 GPU AP at a time. Your GTX 750 isn't going to gain anything by running 2 APs at a time anyway.

Here is the ReadMe_AstroPulse_OpenCL_NV.txt;
For best performance it is important to free 2 CPU cores running multiple instances.
Freeing at least 1 CPU core is necessity to get enough GPU usage.*
Running multiple cards in a system requires freeing another CPU core...


Yes it had its reasons i`ve added this to the read me`s.

I gave this a try today and so far it has worked, running fine for 11 hours without crashing. But will running just 1 AP task at a time on the GPU really give the same RAC as running two tasks? I know that with MB, running two tasks at a time is definitely faster. So it's different on AP? It would be great if that's really so, because then I could do Seti and vLHC without crashing. I would prefer this to running 2 tasks on GPU and nothing on CPU, because I really like CERN and would like to contribute there a bit as well.

Anyway, I have to test for a few days now to see whether this setup is really stable and what the RAC looks like. Thanks to everybody for helping me here!
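
The one-versus-two question comes down to throughput: two tasks at once only pay off if each one takes less than twice as long as a single task run alone. A minimal sketch of that comparison follows; the run times in the example are invented placeholders, not measurements from this host.

```python
def tasks_per_day(instances, hours_per_task):
    """Completed tasks per day when `instances` tasks run side by side."""
    return instances * 24.0 / hours_per_task


def two_at_once_wins(hours_single, hours_each_when_paired):
    """True if running two tasks together finishes more tasks per day."""
    return tasks_per_day(2, hours_each_when_paired) > tasks_per_day(1, hours_single)


if __name__ == "__main__":
    # Hypothetical timings: 1.0 h alone versus 1.8 h each when paired.
    print(two_at_once_wins(1.0, 1.8))   # True: pairing is ahead
    # Hypothetical timings: 1.0 h alone versus 2.2 h each when paired.
    print(two_at_once_wins(1.0, 2.2))   # False: single task is ahead
```

Plugging in averaged run times from a few days of each setup gives a quicker answer than waiting for RAC to settle.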
ID: 1573895 · Report as offensive
Mike (Special Project $75 donor)
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1573972 - Posted: 18 Sep 2014, 20:02:30 UTC
Last modified: 18 Sep 2014, 20:03:04 UTC

Of course running 2 instances of AP on GPU is better for RAC.
Also more efficient.


With each crime and every kindness we birth our future.
ID: 1573972 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1573987 - Posted: 18 Sep 2014, 20:17:00 UTC - in response to Message 1573972.  

Of course running 2 instances of AP on GPU is better for RAC.
Also more efficient.

The only advantage with a low end card is when running Blanked APs. Soon, the problem with Blanked APs will be history. I would like to see a comparison with a low end card running AP_v7.
ID: 1573987 · Report as offensive
qbit
Volunteer tester
Joined: 19 Sep 04
Posts: 630
Credit: 6,868,528
RAC: 0
Austria
Message 1573994 - Posted: 18 Sep 2014, 20:30:28 UTC

Your GTX 750 isn't going to gain anything by running 2 APs at a time anyway.


Of course running 2 instances of AP on GPU is better for RAC.
Also more efficient.


The only advantage with a low end card is when running Blanked APs. Soon, the problem with Blanked APs will be history. I would like to see a comparison with a low end card running AP_v7.


I'm a bit confused now.....
ID: 1573994 · Report as offensive
Mike (Special Project $75 donor)
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1573996 - Posted: 18 Sep 2014, 20:32:09 UTC - in response to Message 1573987.  

Of course running 2 instances of AP on GPU is better for RAC.
Also more efficient.

The only advantage with a low end card is when running Blanked APs. Soon, the problem with Blanked APs will be history. I would like to see a comparison with a low end card running AP_v7.


Not true, it depends on the CPU not GPU.


With each crime and every kindness we birth our future.
ID: 1573996 · Report as offensive
Mike (Special Project $75 donor)
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1574001 - Posted: 18 Sep 2014, 20:34:50 UTC - in response to Message 1573994.  

Your GTX 750 isn't going to gain anything by running 2 APs at a time anyway.


Of course running 2 instances of AP on GPU is better for RAC.
Also more efficient.


The only advantage with a low end card is when running Blanked APs. Soon, the problem with Blanked APs will be history. I would like to see a comparison with a low end card running AP_v7.


I'm a bit confused now.....


The 750 will benefit running 2 instances for sure.
No doubt.


With each crime and every kindness we birth our future.
ID: 1574001 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1574002 - Posted: 18 Sep 2014, 20:36:04 UTC - in response to Message 1573996.  
Last modified: 18 Sep 2014, 20:38:21 UTC

Of course running 2 instances of AP on GPU is better for RAC.
Also more efficient.

The only advantage with a low end card is when running Blanked APs. Soon, the problem with Blanked APs will be history. I would like to see a comparison with a low end card running AP_v7.


Not true, it depends on the CPU not GPU.

Have you tested multiple instances on a low end card with AP_v7? With the Single instance tuned to run at 98% load?
ID: 1574002 · Report as offensive
rob smith (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22199
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1574003 - Posted: 18 Sep 2014, 20:36:05 UTC

The proportion of processing done by the GPU is inversely proportional to the amount of blanking. So for a very highly blanked task very little use is made of the GPU and lots of use is made of the CPU - thus a high performance GPU is little better (in terms of overall processing time) than a low end one.

(For low blanked AP tasks most of the work is done on the GPU, and thus a high-end GPU will show a marked improvement in overall processing time)
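
To put rough numbers on that, here is an Amdahl's-law-style toy model. It assumes (purely for illustration) that the CPU share of the work equals the blanking fraction and the rest runs on the GPU; the real split inside the AstroPulse application is not known here.

```python
def overall_speedup(blanking, gpu_speedup):
    """Overall speedup from a GPU that is `gpu_speedup` times faster.

    blanking: fraction of the task that is blanked, between 0.0 and 1.0.
    Toy assumption: that fraction runs on the CPU and is not sped up.
    """
    if not 0.0 <= blanking <= 1.0:
        raise ValueError("blanking must be between 0 and 1")
    cpu_fraction = blanking
    gpu_fraction = 1.0 - blanking
    return 1.0 / (cpu_fraction + gpu_fraction / gpu_speedup)


if __name__ == "__main__":
    # A GPU 4x faster on a 90% blanked task: about 1.08x overall.
    print(round(overall_speedup(0.9, 4.0), 2))
    # The same GPU on a 10% blanked task: about 3.08x overall.
    print(round(overall_speedup(0.1, 4.0), 2))
```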
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1574003 · Report as offensive
Mike (Special Project $75 donor)
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1574004 - Posted: 18 Sep 2014, 20:37:02 UTC - in response to Message 1574002.  

Of course running 2 instances of AP on GPU is better for RAC.
Also more efficient.

The only advantage with a low end card is when running Blanked APs. Soon, the problem with Blanked APs will be history. I would like to see a comparison with a low end card running AP_v7.


Not true, it depends on the CPU not GPU.

Have you tested multiple instances on a low end card with AP_v7?


Yes, sure.
I tested 8 different NV GPUs last week with AP7.


With each crime and every kindness we birth our future.
ID: 1574004 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.